Design, develop, and maintain ETL pipelines using Cloudera tools such as Apache NiFi, Apache Flume, and Apache Spark.
Create and maintain comprehensive documentation for data pipelines, configurations, and processes.
Data Integration and Processing
Integrate and process data from diverse sources including relational databases, NoSQL databases, and external APIs.
Performance Optimization
Optimize the performance and scalability of Hadoop components (HDFS, YARN, MapReduce, Hive, Spark) to ensure efficient data processing.
Identify and resolve issues related to data pipelines, system performance, and data integrity.
Data Quality and Transformation
Implement data quality checks and manage data transformation processes to ensure accuracy and consistency.
Must Have
Proficiency in Cloudera Data Platform (CDP), particularly Cloudera Data Engineering
Knowledge of data lakehouse architectures and their implementation
Hands-on experience with Apache Spark and Apache Airflow within the Cloudera ecosystem
Proficiency in languages such as Python, Java, Scala, and shell scripting
Exposure to containerization and related technologies (e.g. Docker, Kubernetes)
System-level understanding of data structures, algorithms, and distributed storage and compute
Good To Have
Experience with other CDP services such as DataFlow and Stream Processing
Familiarity with cloud environments such as AWS, Azure, or Google Cloud Platform
Understanding of data governance and data quality principles
CCP Data Engineer certification
Qualifications:
5+ years of experience in Cloudera/Hadoop/Big Data engineering or related roles
Proven track record of successful data lake implementations and pipeline development
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
Qualities:
Can influence and implement change; demonstrates confidence, strength of conviction, and sound decision-making.
Tackles problems head-on with a logical, systematic approach; is persistent and patient; can resolve problems independently; stays practical rather than over-critical about the factors that led to a problem; follows up with developers on related issues.
Able to consult, write, and present persuasively.
Able to work in a self-organized and cross-functional team.
Able to iterate based on new information, peer reviews, and feedback.
Able to work seamlessly with clients across multiple geographies.
Research-focused mindset.
Proficiency in English (reading, writing, and speaking) and in email communication.
Excellent analytical, presentation, reporting, documentation, and interpersonal skills.