Back End Engineer
Experience Level: Mid Level
Full Job Description
About Adani Group
Adani Group is a diversified Indian conglomerate comprising 10 publicly traded companies, renowned for its world-class logistics and utility infrastructure portfolio with a nationwide footprint. Headquartered in Ahmedabad, Gujarat, Adani Group excels in large-scale infrastructure development across India, adhering to global standards in operations and maintenance. It stands as the only Indian infrastructure company with four Investment Grade-rated businesses.
Job Purpose
As a Data Engineer for our AI Labs, you will be instrumental in designing, developing, and maintaining robust, scalable data pipelines and infrastructure to power advanced AI applications. This role demands expertise in implementing efficient data integration, processing, and transformation solutions, leveraging Python, PySpark, and leading cloud platforms like Azure and GCP. You will collaborate closely with AI, ML, and DevOps teams to ensure seamless data flow for AI/ML model training, deployment, and ongoing operations (MLOps), prioritizing optimized data architecture, security, and compliance.
Responsibilities
Data Pipeline Development & Optimization
- Design, implement, and optimize ETL/ELT data pipelines for AI and machine learning workloads to ensure efficient data processing.
- Enhance data flow and transformation processes using Python, PySpark, and cloud-based data engineering tools such as Azure Data Factory, Google Dataflow, and Databricks.
- Improve data ingestion and integration capabilities by using Kafka, Pub/Sub, and other message queues for both real-time and batch processing.
- Implement distributed computing frameworks and optimize data storage architectures to guarantee scalability and high performance.
- Design and implement data lakes, data warehouses, and real-time streaming architectures on Azure and GCP to enhance AI data readiness.
- Structure, clean, and transform data to meet the specific needs of ML model training and inferencing, thereby optimizing AI model performance.
- Implement data governance, security policies, and access controls to ensure secure, well-governed data access for AI teams.
- Optimize big data storage and processing strategies to significantly reduce AI model training times.
- Implement CI/CD pipelines for ML model deployment following MLOps best practices to enable AI model lifecycle automation.
- Integrate Docker, Kubernetes, and cloud-based AI services to ensure seamless AI model serving.
- Utilize tools like MLflow or DVC for data tracking and experiment logging to enhance AI/ML data versioning.
- Set up real-time monitoring, logging, and alerting for AI/ML data pipelines to improve AI observability.
- Implement data encryption, masking, and anonymization techniques to ensure compliance with data privacy regulations such as GDPR and HIPAA.
- Enforce role-based access control (RBAC) and identity & access management (IAM) policies to strengthen data security.
- Implement data validation, schema enforcement, and audit logging mechanisms to ensure data integrity.
- Collaborate effectively with AI, DevOps, and business teams to align data infrastructure with evolving AI and analytics requirements.
- Evaluate and adopt emerging cloud, AI, and big data technologies to drive innovation in data engineering.
- Identify and implement best practices for automation, cost reduction, and performance tuning to optimize data engineering efficiency.
Key Stakeholders - Internal
- AI & Data Science Teams
- DevOps & Cloud Teams
- Business Intelligence & Analytics Teams
- IT Security & Compliance Teams
Key Stakeholders - External
- Cloud & Data Service Providers
- Third-party AI Model Vendors
- Regulatory Bodies & Compliance Authorities
Qualifications
Educational Qualification:
Bachelor's or Master's degree in Computer Science, Data Engineering, Information Technology, or a related field.
Certifications:
- Microsoft Azure Data Engineer Associate, Google Professional Data Engineer, or AWS Certified Data Analytics Specialty.
- Big Data & Apache Spark Certification (e.g., from Cloudera, Databricks, Coursera, Udemy).
- Certified Kubernetes Administrator (CKA) for data pipeline orchestration.
Work Experience:
1-10 years of experience in data engineering, cloud data platforms, and AI/ML data management.
- Expertise in data pipeline development, ETL/ELT processes, and cloud-based big data solutions.
- Hands-on experience with Python, PySpark, SQL, and cloud-native data services.
- Experience with AI/ML deployment, MLOps, and real-time data streaming architectures.
Company
Adani Enterprises Limited
Adani Enterprises Limited is a leading Indian company with a strong focus on engineering, power, and infrastructure development. The company operates a diverse portfolio of 10 publicly traded entities...