Talentonlease Pvt Ltd

Big Data Developer - Apache Spark/Flink

Job Location

India

Job Description

Key Responsibilities:

Data System Development & Optimization:
- Design, develop, and implement efficient data pipelines and data storage solutions using Big Data technologies (a brief illustrative sketch follows this list).
- Build robust, scalable, high-performance data processing systems using distributed frameworks such as Apache Spark (Core, Streaming, SQL), Flink, or Storm.
- Optimize and troubleshoot data processing jobs to ensure high throughput, low latency, and resource efficiency.
- Work with ClickHouse or similar OLAP (Online Analytical Processing) systems for high-performance analytics on large datasets.

Cloud Data Engineering:
- Develop and manage data solutions using AWS services such as S3, Redshift, Glue, Kinesis, EMR, and other cloud-based data services.
- Integrate cloud data solutions with on-premises data infrastructure, ensuring seamless data movement and access.
- Implement and optimize cloud-based ETL (Extract, Transform, Load) processes in AWS.
- Ensure data security, integrity, and compliance with industry standards in cloud environments.

Programming & Development:
- Use Java, Scala, and Python to develop data processing applications, algorithms, and custom solutions.
- Build reusable code and libraries, ensuring modularity and scalability of data solutions.
- Write complex queries and data transformation logic to meet business and analytical needs.

Collaboration & Solution Design:
- Collaborate with cross-functional teams, including Data Scientists, Data Analysts, and Business Analysts, to understand requirements and design data architectures.
- Work with product and engineering teams to define and implement data solutions aligned with business goals.
- Provide expertise in integrating data systems with other business applications.

Performance & Scalability:
- Monitor and improve the performance of data systems and queries, ensuring solutions scale with growing data volumes.
- Tune and troubleshoot large-scale distributed data processing jobs to optimize speed and cost-effectiveness.
- Leverage cloud-native tools and frameworks for efficient data storage, processing, and access.

Documentation & Reporting:
- Create and maintain detailed technical documentation for data systems, including architecture, processes, and workflows.
- Report on the status of data processing projects, providing insights into data trends, system performance, and areas for improvement.
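By way of illustration only (not part of the posting's requirements), here is a minimal PySpark sketch of the kind of S3-to-S3 batch pipeline the responsibilities above describe. Python is one of the languages the role lists; all bucket names, paths, and column names are hypothetical placeholders.

```python
# Minimal batch ETL sketch: read raw JSON events from S3, aggregate,
# and write partitioned Parquet back to S3. Paths/columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-events-etl").getOrCreate()

# Extract: raw events landed in an S3 prefix (illustrative path)
events = spark.read.json("s3a://example-raw-bucket/events/2025/01/11/")

# Transform: drop invalid rows, derive a date column, count events per user
daily_counts = (
    events
    .filter(F.col("user_id").isNotNull())
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "user_id")
    .agg(F.count("*").alias("event_count"))
)

# Load: date-partitioned Parquet for efficient downstream analytics
(
    daily_counts.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3a://example-curated-bucket/daily_user_counts/")
)

spark.stop()
```

On AWS, a job of this shape would typically run on EMR or as a Glue Spark job, with the same DataFrame logic.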
Technical Qualifications:

Core Skills:

Big Data Technologies:
- Proficient in Apache Spark (Core, Streaming, SQL), Flink, Storm, or other distributed data processing frameworks.
- Strong knowledge of ClickHouse or other OLAP systems for large-scale data analytics and querying.
- Experience designing and building high-performance data pipelines.

Cloud Data Services (AWS):
- Expertise in AWS services: S3, Redshift, EMR, Glue, Kinesis, Lambda, and CloudFormation.
- Experience with cloud-based ETL processes and data storage/management in AWS.

Programming Languages:
- Expertise in Java, Scala, and Python for developing data-driven applications and algorithms.
- Proficient in building distributed systems and data processing frameworks in these languages.

Data Storage & Management:
- Familiarity with HDFS, NoSQL databases (e.g., Cassandra, HBase), and data warehouses such as Redshift and BigQuery.
- Knowledge of data lake architectures and cloud storage solutions.

Data Integration & Transformation:
- Skilled in ETL processes and frameworks for ingesting, transforming, and storing large volumes of data.
- Proficient in SQL and NoSQL query languages for data retrieval and processing.

Performance & Optimization:
- Strong understanding of performance tuning and optimization techniques for large-scale distributed systems.
- Experience with data partitioning, sharding, and optimizing big data storage and access (see the sketch after this list).

Version Control & Automation:
- Familiarity with Git and CI/CD pipelines for version control and automated deployment.

Preferred Qualifications:
- Experience with additional Big Data technologies such as Kafka, Hive, Presto, or Druid.
- Familiarity with containerization tools like Docker and orchestration platforms like Kubernetes.
- Experience working with Data Lakes and Machine Learning models in production environments.
- Familiarity with DevOps practices and cloud-native development tools.

Education & Work Experience:
- Bachelor's/Master's degree in Computer Science, Engineering, or a related technical field.
- 8 years of IT experience, with at least 5 years in data-related technologies.
- Minimum 3 years of experience with AWS Cloud services for data engineering.
- Proven experience with distributed systems and Big Data technologies such as Apache Spark, ClickHouse, or similar.

Soft Skills:
- Strong problem-solving skills and the ability to work under pressure.
- Ability to work independently and as part of a collaborative team.
- Strong communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.
- Highly organized, detail-oriented, and results-driven.

(ref:hirist.tech)
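Likewise, a hedged sketch of the partitioning and performance-tuning skills listed above; the table, column, and partition-count values are hypothetical and would come from profiling the real workload.

```python
# Tuning sketch: set shuffle parallelism explicitly and pre-partition on
# the aggregation key. Because repartition(n, col) hash-partitions the data,
# the subsequent groupBy on the same key needs no additional shuffle.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    .config("spark.sql.shuffle.partitions", "400")  # sized to data volume
    .getOrCreate()
)

orders = spark.read.parquet("s3a://example-curated-bucket/orders/")

totals = (
    orders
    .repartition(400, "customer_id")   # hash-partition on the grouping key
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("total_amount"))
)

totals.write.mode("overwrite").parquet("s3a://example-curated-bucket/customer_totals/")
spark.stop()
```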

Contact Information

Contact Human Resources
Talentonlease Pvt Ltd

Posted

January 11, 2025
UID: 4991380944
