Neerinfo Solutions

AWS Data Engineer - ETL


Job Location

Bangalore, India

Job Description

Key Responsibilities :

AWS Data Infrastructure Development :
- Design, develop, and maintain AWS-based data pipelines using services such as AWS Glue, Amazon Redshift, Amazon S3, and AWS Lambda.
- Build ETL (Extract, Transform, Load) processes, integrating batch and near real-time data from various sources into Amazon Redshift, S3, or other cloud-based storage solutions.
- Develop data transformation scripts using Python and PySpark to process large datasets in the cloud.

Big Data Technologies :
- Work with Apache Spark and the Hadoop ecosystem to manage large-scale data processing workloads.
- Optimize Apache Spark and Hadoop jobs for performance and scalability, ensuring data pipelines run efficiently at scale.

SQL and Database Optimization :
- Write and optimize complex SQL queries for data manipulation and aggregation in cloud data warehouses.
- Work with Amazon Redshift for OLAP workloads, Hive for big data processing, or similar data warehouse solutions.

Cloud and Data Security :
- Implement security measures to ensure the protection and privacy of sensitive data within the AWS ecosystem, following industry best practices.
- Work closely with data security teams to ensure compliance with data governance and regulatory requirements.

Scheduling and Automation :
- Use scheduling tools such as Apache Airflow for workflow automation and pipeline orchestration.
- Set up and maintain automated pipelines, monitor job performance, and manage job failures to ensure the continuity of data workflows.

Documentation and Best Practices :
- Produce readable, maintainable documentation of the data engineering components being developed, supporting transparency, knowledge sharing, and onboarding.
- Follow coding best practices and standards for Python and the other technologies used to develop data pipelines.

Collaboration and Cross-Functional Teams :
- Collaborate with data scientists, analysts, and other engineers to understand data requirements and deliver solutions that enhance business decision-making.
- Participate in agile development processes, contributing to sprint planning and progress tracking.

Required Qualifications and Skills :

Experience :
- 4 to 8 years of experience in data engineering, with significant hands-on experience in AWS services and data pipeline development.
- Strong experience with AWS services including Redshift, Glue, EMR, Lambda, and S3.
- In-depth experience with Apache Spark and the Hadoop ecosystem for distributed data processing and analysis.
- Proficiency in Python and PySpark for data engineering tasks, data transformation, and automation.

Technical Expertise :
- Strong understanding of SQL for data manipulation, performance tuning, and data retrieval from cloud-based data warehouses (Amazon Redshift, Hive).
- Expertise in designing and developing ETL pipelines (batch and near real-time) for integrating data across different systems and platforms.
- Exposure to data scheduling tools such as Apache Airflow for orchestrating and automating workflows.
- Hands-on experience with data security practices and the ability to implement security measures to protect data in AWS.

Cloud & Big Data Technologies :
- Advanced experience with AWS data services (Redshift, Glue, S3, Lambda, EMR) for big data analytics and storage solutions.
- Familiarity with Hadoop, Spark, and other big data processing frameworks.

Tools & Frameworks :
- Knowledge of version control systems such as Git for managing codebases and collaborating with teams.
- Experience working in agile development environments, contributing to sprint planning and continuous improvement.

Problem Solving & Troubleshooting :
- Excellent problem-solving and debugging skills, with the ability to resolve data pipeline issues and improve system performance.
- Ability to troubleshoot and resolve issues related to data pipelines, databases, and AWS infrastructure.

Soft Skills :
- Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams.
- Ability to document and communicate technical requirements and solutions clearly and concisely.

Preferred Skills :
- Experience with data lake architecture and the integration of data across multiple sources.
- AWS certifications (e.g., AWS Certified Big Data - Specialty, AWS Certified Solutions Architect).
- Experience with containerization technologies such as Docker and orchestration tools such as Kubernetes.
- Familiarity with data governance and data quality practices.

Education :
- Bachelor's or Master's degree in Computer Science, Information Technology, Data Engineering, or a related field.

(ref:hirist.tech)

Location: Bangalore, IN

Posted Date: 1/19/2025

Contact Information

Contact Human Resources
Neerinfo Solutions

Posted

January 19, 2025
UID: 4977389355
