Paddlelift

Data Engineer - Python/PySpark

Job Location

India

Job Description

As a Data Engineer, you will play a critical role in the data lifecycle, from acquisition and storage to processing and delivery. You will design, develop, and implement scalable data solutions, ensuring data quality, performance, and security. Your expertise in SQL, MongoDB, and Python, and ideally in big data technologies such as Kafka and PySpark, will be essential in building and optimizing our data infrastructure.

Responsibilities:

- Design, build, and maintain scalable and efficient data pipelines using various technologies.
- Extract, transform, and load (ETL) data from diverse sources into our data warehouse and data lake.
- Ensure data quality, integrity, and consistency throughout the data pipelines.
- Work extensively with SQL databases (e.g., MySQL, PostgreSQL) for data storage and retrieval.
- Utilize NoSQL databases such as MongoDB for flexible data modeling and storage.
- Optimize database performance and ensure data security.
- Leverage knowledge of Kafka for real-time data streaming and processing.
- Utilize PySpark for large-scale data processing and analysis on distributed systems.
- Write clean, efficient, and well-documented Python code for data manipulation, automation, and pipeline development.
- Collaborate closely with data scientists and analysts to understand their data requirements and provide them with clean, well-structured data.
- Assist in data exploration and preparation for analytical modeling.
- Implement data quality checks and monitoring processes.
- Adhere to data governance policies and standards.
- Identify and resolve performance bottlenecks in data pipelines and database systems.
- Optimize data processing workflows for efficiency and scalability.

Documentation:

- Create and maintain technical documentation for data pipelines, data models, and ETL processes.

Requirements:

Technical Skills:

- Strong proficiency in SQL for querying and manipulating relational databases.
- Hands-on experience with MongoDB or other NoSQL databases.
- Excellent programming skills in Python for data engineering tasks.
- Good knowledge of statistics and its application in data analysis.

Experience:

- 2-3 years of relevant work experience as a Data Engineer.

Education:

- A bachelor's degree in a relevant field such as computer science, statistics, mathematics, or engineering from a Tier-1 institute.

(ref:hirist.tech)

Contact Information

Contact Human Resources
Paddlelift

Posted

May 1, 2025
UID: 5114729701
