We are seeking a skilled and motivated Azure Databricks Senior Data Engineer to join our dynamic team. The ideal candidate will have strong experience in Python and Spark programming and expertise in building and optimizing data pipelines in Azure Databricks. You will play a pivotal role in leveraging Databricks Workflows, Databricks Asset Bundles, and CI/CD pipelines using GitHub to deliver high-performance data solutions. A solid understanding of Data Warehouse and Data Mart architecture in Databricks is critical for success in this role. If you're passionate about data engineering, cloud technologies, and scalable data architecture, we'd love to hear from you!
Key Responsibilities:
Python and Spark Programming:
Develop and maintain scalable data pipelines using Python and Apache Spark within Azure Databricks.
Write optimized, high-performance Spark jobs to process large volumes of data efficiently.
Utilize PySpark for distributed data processing, transformation, and aggregation tasks (a short illustrative sketch follows this list).
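To make this responsibility concrete, here is a minimal PySpark sketch of a transformation-and-aggregation step; the table and column names (raw.orders, order_ts, amount, region, analytics.daily_sales) are illustrative assumptions, not specifics of this role.

```python
# Minimal PySpark sketch: transform and aggregate a hypothetical orders table.
# Table and column names (raw.orders, order_ts, amount, region) are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already provided on Databricks clusters

orders = spark.table("raw.orders")  # hypothetical source table

daily_sales = (
    orders
    .withColumn("order_date", F.to_date("order_ts"))   # derive a date column
    .filter(F.col("amount") > 0)                        # drop refunds / bad rows
    .groupBy("order_date", "region")                    # aggregate per day and region
    .agg(
        F.sum("amount").alias("total_sales"),
        F.countDistinct("order_id").alias("order_count"),
    )
)

# Persist the result as a Delta table for downstream consumers (name is illustrative).
daily_sales.write.format("delta").mode("overwrite").saveAsTable("analytics.daily_sales")
```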
Databricks Workflows:
Design and implement Databricks Workflows to automate data pipeline execution, orchestrating complex workflows and batch jobs.
Set up task dependencies, triggers, and notifications to ensure smooth and reliable execution.
Monitor, troubleshoot, and tune Databricks Workflows for reliable, high-performance execution with minimal failures (an illustrative sketch of a programmatic job definition follows this list).
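As a hedged illustration of workflow orchestration with task dependencies, a schedule, and a failure notification, the sketch below uses the Databricks Python SDK (databricks-sdk); the notebook paths, cluster id, and e-mail address are placeholders, and in practice such workflows may equally be defined in the Databricks UI or in bundle configuration.

```python
# Hedged sketch: defining a two-task Databricks Workflow with a dependency,
# a schedule, and a failure notification via the Databricks Python SDK.
# Notebook paths, the cluster id, and the e-mail address are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # reads workspace credentials from the environment / config

created = w.jobs.create(
    name="daily-sales-pipeline",
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/data/ingest"),
            existing_cluster_id="<cluster-id>",
        ),
        jobs.Task(
            task_key="transform",
            depends_on=[jobs.TaskDependency(task_key="ingest")],  # runs after ingest succeeds
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/data/transform"),
            existing_cluster_id="<cluster-id>",
        ),
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",  # 02:00 daily
        timezone_id="UTC",
    ),
    email_notifications=jobs.JobEmailNotifications(on_failure=["data-team@example.com"]),
)
print(f"Created job {created.job_id}")
```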
Databricks Asset Bundles:
Create and manage Databricks Asset Bundles to package reusable components such as notebooks, libraries, and models.
Share and reuse asset bundles across teams to increase efficiency and ensure consistency in development.
CI/CD for Databricks Artifacts using GitHub:
Implement CI/CD pipelines using GitHub Actions for the continuous integration and deployment of Databricks notebooks, jobs, and libraries.
Automate the testing, building, and deployment processes to ensure smooth, consistent code delivery across environments (a sample test sketch follows this list).
Collaborate with teams to implement version control practices and code reviews using GitHub.
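One hedged example of the automated-testing step: a pytest unit test such as the sketch below can run in a GitHub Actions job on every pull request, before any deployment to Databricks; the transformation under test is invented purely for illustration.

```python
# Hedged CI sketch: a pytest unit test for a small PySpark transformation.
# In a GitHub Actions workflow this would run on every pull request before
# any Databricks deployment step; the transformation here is illustrative.
import pytest
from pyspark.sql import SparkSession, functions as F


def add_order_date(df):
    """Illustrative transformation: derive order_date from an order_ts column."""
    return df.withColumn("order_date", F.to_date("order_ts"))


@pytest.fixture(scope="session")
def spark():
    # Local Spark session for CI runners (no Databricks cluster required).
    return SparkSession.builder.master("local[2]").appName("ci-tests").getOrCreate()


def test_add_order_date(spark):
    df = spark.createDataFrame([("2025-05-16 10:30:00",)], ["order_ts"])
    result = add_order_date(df).collect()[0]
    assert str(result["order_date"]) == "2025-05-16"
```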
Data Warehousing & Data Mart Design:
Design and implement Data Warehousing and Data Mart solutions using Databricks, ensuring high-performance storage and retrieval of structured data.
Integrate data from multiple sources into a central data warehouse using Spark-based transformations, ensuring efficient schema design and query performance.
Implement dimensional models, including star and snowflake schemas, for data marts in Azure Databricks to support business intelligence and reporting (a star-schema sketch follows this list).
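The sketch below gives a hedged picture of a simple star-schema load in Databricks, writing one dimension and one fact table as Delta tables; all source and target table names and columns are assumptions made for illustration.

```python
# Hedged sketch of a star-schema load in Databricks: one dimension and one fact
# table written as Delta tables. Source and target names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

customers = spark.table("staging.customers")
orders = spark.table("staging.orders")

# Dimension: one row per customer, with a surrogate key for fact-table joins.
dim_customer = (
    customers
    .select("customer_id", "customer_name", "country")
    .dropDuplicates(["customer_id"])
    .withColumn("customer_sk", F.monotonically_increasing_id())
)
dim_customer.write.format("delta").mode("overwrite").saveAsTable("mart.dim_customer")

# Fact: one row per order, carrying the surrogate key plus additive measures.
fact_orders = (
    orders
    .join(dim_customer.select("customer_id", "customer_sk"), "customer_id")
    .select("order_id", "customer_sk", "order_ts", "amount")
)
fact_orders.write.format("delta").mode("overwrite").saveAsTable("mart.fact_orders")
```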
Data Pipeline Optimization and Management:
Continuously monitor and optimize Databricks-based data pipelines for performance, scalability, and cost efficiency.
Implement best practices for data partitioning, caching, and query optimization within the Databricks platform (illustrated in the sketch after this list).
Troubleshoot and resolve issues related to data integrity, performance, and workflow execution.
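As a hedged illustration of these optimization practices, the sketch below shows partitioned Delta writes, caching of a reused DataFrame, and file compaction with OPTIMIZE/ZORDER; the table names and the chosen partition and Z-ORDER columns are illustrative.

```python
# Hedged sketch of common Databricks/Delta optimization techniques:
# partitioned writes, caching a reused DataFrame, and OPTIMIZE/ZORDER.
# Table names and the partition/Z-ORDER columns are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

events = spark.table("raw.events").withColumn("event_date", F.to_date("event_ts"))

# Partition by date so downstream queries can prune whole partitions.
(events.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .saveAsTable("analytics.events"))

# Cache a DataFrame that several downstream aggregations would reuse.
recent = spark.table("analytics.events").filter(F.col("event_date") >= "2025-05-01").cache()
recent.count()  # materialize the cache

# Compact small files and co-locate data on a frequently filtered column.
spark.sql("OPTIMIZE analytics.events ZORDER BY (user_id)")
```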
Collaboration and Stakeholder Communication:
Work closely with data scientists, analysts, and other teams to understand requirements and build data solutions that meet business needs.
Communicate technical concepts effectively to both technical and non-technical stakeholders.
Provide mentorship and guidance to junior data engineers on Databricks best practices, data architecture, and efficient coding techniques.
Your Profile
Python and Spark Programming:
Minimum of 3 years of experience in Python programming, especially in data engineering, ETL processes, and distributed computing.
Solid experience using Apache Spark (PySpark) for large-scale data processing and transformation within Databricks.
Proficiency in writing and optimizing Spark-based jobs for high performance on large datasets.
Databricks Workflows:
Strong hands-on experience with Databricks Workflows for orchestrating data pipelines and batch processes.
Ability to design and optimize multi-step workflows with task dependencies, retries, and monitoring.
Databricks Asset Bundles:
Experience in creating and managing Databricks Asset Bundles to promote reusability and modularization of notebooks, libraries, and models.
CI/CD for Databricks Artifacts using GitHub:
Experience implementing CI/CD pipelines for Databricks using GitHub and GitHub Actions to automate the deployment of notebooks, jobs, and libraries.
Expertise in version control practices and integrating Databricks with external Git repositories for collaborative development.
Data Warehousing & Data Mart Experience:
Strong experience in designing and implementing Data Warehouses and Data Marts using Databricks and Spark.
Understanding of dimensional modeling (star and snowflake schemas) and the ability to create optimized data structures for reporting and analytics.
Hands-on experience integrating data from multiple sources and managing the ETL process within a Data Warehouse or Data Mart environment.
Cloud Experience:
Solid experience working with the Azure ecosystem, including Azure Data Lake, Azure Blob Storage, and Azure SQL Database.
Experience working in cloud environments and leveraging cloud-based tools for building and managing data pipelines.
Data Engineering Best Practices:
Knowledge of best practices for designing and managing scalable, efficient, and cost-effective data pipelines.
Experience in performance tuning and query optimization within Databricks and Spark.
Collaboration and Communication:
Excellent teamwork and communication skills, with the ability to collaborate effectively across cross-functional teams.
Ability to document technical processes and communicate progress and results to stakeholders.
Preferred Qualifications:
Cloud Certifications:
Azure or Databricks certifications, particularly in Data Engineering or Cloud Solutions, are a plus.
Big Data Technologies:
Familiarity with other big data tools such as Kafka, Hadoop, or Flink for streaming and real-time data processing is a plus.
Data Science/ML Experience:
Exposure to machine learning workflows and model management within Databricks (e.g., using MLflow) is beneficial.
To apply, please click on APPLY TO THIS POSITION.
Job Post Date: 05/16/25