Neerinfo Solutions
AWS Data Engineer - ETL
Job Location
Bangalore, India
Job Description
Key Responsibilities:

AWS Data Infrastructure Development:
- Design, develop, and maintain AWS-based data pipelines using services such as AWS Glue, Amazon Redshift, Amazon S3, and AWS Lambda.
- Build ETL (Extract, Transform, Load) processes that integrate batch and near real-time data from various sources into Amazon Redshift, S3, or other cloud-based storage solutions.
- Develop data transformation scripts using Python and PySpark to process large datasets in the cloud.

Big Data Technologies:
- Work with Apache Spark and the Hadoop ecosystem to manage large-scale data processing workloads.
- Optimize Apache Spark and Hadoop jobs for performance and scalability, ensuring data pipelines run efficiently at scale.

SQL and Database Optimization:
- Write and optimize complex SQL queries for data manipulation and aggregation in cloud data warehouses.
- Work with AWS Redshift for OLAP workloads, Hive for big data processing, or similar data warehouse solutions.

Cloud and Data Security:
- Implement security measures to protect sensitive data within the AWS ecosystem, following industry best practices.
- Work closely with data security teams to ensure compliance with data governance and regulatory requirements.

Scheduling and Automation:
- Use scheduling tools such as Apache Airflow for workflow automation and pipeline orchestration.
- Set up and maintain automated pipelines, monitor job performance, and manage job failures to ensure the continuity of data workflows.

Documentation and Best Practices:
- Maintain readable, up-to-date documentation of data engineering components for transparency, knowledge sharing, and onboarding.
- Follow coding best practices and standards for Python and the other technologies used to develop data pipelines.

Collaboration and Cross-Functional Teams:
- Collaborate with data scientists, analysts, and other engineers to understand data requirements and deliver solutions that enhance business decision-making.
- Participate in agile development processes, contributing to sprint planning and progress tracking.

Required Qualifications and Skills:

Experience:
- 4 to 8 years of experience in data engineering, with significant hands-on experience in AWS services and data pipeline development.
- Strong experience with AWS services including Redshift, Glue, EMR, Lambda, and S3.
- In-depth experience with Apache Spark and the Hadoop ecosystem for distributed data processing and analysis.
- Proficiency in Python and PySpark for data engineering tasks, data transformation, and automation.

Technical Expertise:
- Strong understanding of SQL for data manipulation, performance tuning, and data retrieval from cloud-based data warehouses (AWS Redshift, Hive).
- Expertise in designing and developing ETL pipelines (batch and near real-time) for integrating data across different systems and platforms.
- Exposure to scheduling tools such as Apache Airflow for orchestrating and automating workflows.
- Hands-on experience with data security practices and the ability to implement security measures to protect data in AWS.

Cloud & Big Data Technologies:
- Advanced experience with AWS data services (Redshift, Glue, S3, Lambda, EMR) for big data analytics and storage solutions.
- Familiarity with Hadoop, Spark, and other big data processing frameworks.

Tools & Frameworks:
- Knowledge of version control systems such as Git for managing codebases and collaborating with teams.
- Experience working in agile development environments, contributing to sprint planning and continuous delivery.

Problem-Solving & Troubleshooting:
- Excellent problem-solving and debugging skills, with the ability to resolve data pipeline issues and improve system performance.
- Ability to troubleshoot and resolve issues related to data pipelines, databases, and AWS infrastructure.

Soft Skills:
- Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams.
- Ability to document and communicate technical requirements and solutions clearly and concisely.

Preferred Skills:
- Experience with data lake architecture and integration of data across multiple sources.
- AWS certifications (e.g., AWS Certified Big Data - Specialty, AWS Certified Solutions Architect).
- Experience with containerization technologies such as Docker and orchestration tools like Kubernetes.
- Familiarity with Data Governance and Data Quality practices.

Education:
- Bachelor's or Master's degree in Computer Science, Information Technology, Data Engineering, or a related field. (ref:hirist.tech)
Location: Bangalore, IN
Posted Date: 1/19/2025
Contact Information
Contact: Human Resources, Neerinfo Solutions