Talentonlease Pvt Ltd
Big Data Developer - Apache Spark/Flink
Job Location
India
Job Description
Key Responsibilities:

Data System Development & Optimization:
- Design, develop, and implement efficient data pipelines and data storage solutions using Big Data technologies.
- Build robust, scalable, and high-performance data processing systems using distributed frameworks such as Apache Spark (Core, Streaming, SQL), Flink, or Storm.
- Optimize and troubleshoot data processing jobs to ensure high throughput, low latency, and efficient resource use.
- Work with ClickHouse or similar OLAP (Online Analytical Processing) systems for high-performance analytics on large datasets.

Cloud Data Engineering:
- Develop and manage data solutions using AWS services such as S3, Redshift, Glue, Kinesis, EMR, and other cloud-based data services.
- Integrate cloud data solutions with on-premises data infrastructure, ensuring seamless data movement and access.
- Implement and optimize cloud-based ETL (Extract, Transform, Load) processes in AWS.
- Ensure data security, integrity, and compliance with industry standards in cloud environments.

Programming & Development:
- Use Java, Scala, and Python to develop data processing applications, algorithms, and custom solutions.
- Build reusable code and libraries, ensuring modularity and scalability of data solutions.
- Write complex queries and data transformation logic to meet business and analytical needs.

Collaboration & Solution Design:
- Collaborate with cross-functional teams, including Data Scientists, Data Analysts, and Business Analysts, to understand requirements and design data architectures.
- Work with product and engineering teams to define and implement data solutions aligned with business goals.
- Provide expertise in integrating data systems with other business applications.

Performance & Scalability:
- Monitor and improve the performance of data systems and queries, ensuring solutions scale to handle growing data volumes.
- Conduct performance tuning and troubleshooting of large-scale distributed data processing jobs to optimize speed and cost-effectiveness.
- Leverage cloud-native tools and frameworks to ensure efficient data storage, processing, and access.

Documentation & Reporting:
- Create and maintain detailed technical documentation for data systems, including architecture, processes, and workflows.
- Report on the status of data processing projects, providing insight into data trends, system performance, and areas for improvement.

Technical Qualifications:

Core Skills:

Big Data Technologies:
- Proficient in Apache Spark (Core, Streaming, SQL), Flink, Storm, or other distributed data processing frameworks.
- Strong knowledge of ClickHouse or other OLAP systems for large-scale data analytics and querying.
- Experience designing and building high-performance data pipelines.

Cloud Data Services (AWS):
- Expertise in AWS services: S3, Redshift, EMR, Glue, Kinesis, Lambda, and CloudFormation.
- Experience with cloud-based ETL processes and data storage/management in AWS.

Programming Languages:
- Expertise in Java, Scala, and Python for developing data-driven applications and algorithms.
- Proficient in building distributed systems and data processing frameworks in these languages.

Data Storage & Management:
- Familiarity with HDFS, NoSQL databases (e.g., Cassandra, HBase), and data warehouses such as Redshift and BigQuery.
- Knowledge of data lake architectures and cloud storage solutions.
Data Integration & Transformation:
- Skilled in ETL processes and frameworks for ingesting, transforming, and storing large volumes of data.
- Proficient in SQL and NoSQL query languages for data retrieval and processing.

Performance & Optimization:
- Strong understanding of performance tuning and optimization techniques for large-scale distributed systems.
- Experience with data partitioning, sharding, and optimizing Big Data storage and access.

Version Control & Automation:
- Familiar with Git and CI/CD pipelines for version control and automated deployment.

Preferred Qualifications:
- Experience with additional Big Data technologies such as Kafka, Hive, Presto, or Druid.
- Familiarity with containerization tools like Docker and orchestration platforms like Kubernetes.
- Experience working with data lakes and machine learning models in production environments.
- Familiarity with DevOps practices and cloud-native development tools.

Education & Work Experience:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
- 8 years of IT experience, with at least 5 years in data-related technologies.
- Minimum 3 years of experience with AWS cloud services for data engineering.
- Proven experience working with distributed systems and Big Data technologies such as Apache Spark, ClickHouse, or similar.

Soft Skills:
- Strong problem-solving skills and the ability to work under pressure.
- Ability to work independently and as part of a collaborative team.
- Strong communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.
- Highly organized, detail-oriented, and results-driven.

(ref: hirist.tech)
Location: India
Posted Date: 1/11/2025
Contact Information
Contact: Human Resources, Talentonlease Pvt Ltd