Mastech Digital
Databricks Engineer - PySpark/Informatica
Job Location
India
Job Description
Company : Mastech Digital
Title : Databricks Engineer
Location : 100% Remote (India)
Work Timings : 6 PM - 3 AM IST (US Shift)

Job Summary :
Mastech Digital is seeking a skilled Databricks Engineer for a 6-month remote contract. The ideal candidate will have extensive experience with Apache Spark and Databricks, particularly in building scalable ETL pipelines using Delta Lake, PySpark, and Databricks SQL. This role also requires strong knowledge of Informatica PowerCenter and IICS, as well as experience with Azure cloud data engineering. The candidate must be proficient in SQL and Python/Scala for data transformations and job orchestration.

Key Responsibilities :

Databricks & Spark Development :
- Design, develop, and maintain scalable ETL pipelines using Apache Spark and Databricks.
- Utilize Delta Lake for building reliable and performant data lakes.
- Develop and optimize data transformations using PySpark and Databricks SQL.

Informatica Migration :
- Understand and analyze existing ETL mappings and workflows in Informatica PowerCenter and IICS.
- Migrate ETL processes from Informatica to Databricks, ensuring data integrity and performance.

Azure Cloud Data Engineering :
- Develop and manage data pipelines on Azure Databricks, Azure Data Lake Storage (ADLS), and Azure Synapse Analytics.
- Optimize data storage and processing in the Azure cloud environment.

SQL Optimization :
- Write and optimize complex SQL queries for data transformations and analysis.
- Debug and resolve performance bottlenecks in Databricks SQL and compare performance against Informatica mappings.

Python/Scala Development :
- Develop data transformation scripts and orchestrate jobs using Python (PySpark, Pandas) or Scala.
- Implement custom business logic and data processing routines during migration.

Performance Tuning :
- Monitor and optimize Databricks performance to ensure efficient data processing.
- Identify and resolve performance issues related to data pipelines and queries.
Collaboration :
- Collaborate with data engineers, data scientists, and other stakeholders to deliver data solutions.
- Communicate effectively with team members and clients.

Required Skills and Experience :

Databricks & Spark :
- Proficiency in Apache Spark and Databricks workflows.
- Experience with Delta Lake, PySpark, and Databricks SQL.

Informatica :
- Strong understanding of Informatica PowerCenter and IICS.
- Experience in migrating ETL processes from Informatica to Databricks.

Cloud Data Engineering :
- Experience with Azure Databricks, ADLS, and Synapse Analytics.

SQL :
- Advanced SQL skills for query optimization and data transformations.

Python/Scala :
- Proficiency in Python (PySpark, Pandas) or Scala for ETL development.

Technical Skills :

Databricks & Spark :
- Apache Spark (Core, SQL, Streaming).
- Databricks Runtime.
- Delta Lake.
- PySpark.
- Databricks SQL.

Informatica :
- Informatica PowerCenter.
- Informatica Intelligent Cloud Services (IICS).

Cloud Platforms :
- Azure Databricks.
- Azure Data Lake Storage (ADLS).
- Azure Synapse Analytics.

Programming Languages :
- Python (PySpark, Pandas).
- Scala.

Database :
- SQL (Advanced).

Data Engineering :
- ETL/ELT Processes.
- Data Warehousing.
- Data Modeling.

(ref:hirist.tech)
Location: India
Posted Date: 5/9/2025
Contact Information
Contact : Human Resources, Mastech Digital