
Data Engineer
Experience Level: Mid Level
Data Engineer - Pune, India
Mahindra is seeking a skilled Data Engineer to join our Data Engineering & Infrastructure team in Pune, India. This critical role involves designing, building, and maintaining robust data pipelines and infrastructure on the Google Cloud Platform (GCP). Your work will ensure the reliable flow of data across the organization, powering real-time analytics and data-driven decision-making. Without this expertise, the company risks data silos, compromised data quality, and an inability to scale its data operations.
Key Responsibilities and Deliverables
Data Pipeline Development
Design, develop, and maintain scalable ETL/ELT pipelines using GCP services such as Cloud Dataflow, Cloud Composer (Apache Airflow), and Cloud Functions. Build both real-time and batch data processing solutions capable of handling diverse data sources and formats.
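In practice these pipelines would run on Cloud Dataflow or be orchestrated by Cloud Composer; purely as a library-free sketch of the ETL pattern itself, the extract/transform/load stages can be illustrated with plain Python (all names and the sample data here are illustrative, not part of any GCP SDK):

```python
import csv
import io
import json

def extract(raw_csv: str) -> list[dict]:
    """Extract: parse raw CSV records into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list[dict]) -> list[dict]:
    """Transform: normalize types and drop malformed rows."""
    out = []
    for row in rows:
        try:
            out.append({"id": int(row["id"]), "amount": float(row["amount"])})
        except (KeyError, ValueError):
            continue  # a real pipeline would route these to a dead-letter sink
    return out

def load(rows: list[dict]) -> str:
    """Load: serialize as newline-delimited JSON, the format BigQuery accepts for loads."""
    return "\n".join(json.dumps(r) for r in rows)

raw = "id,amount\n1,9.99\n2,bad\n3,5.00"
print(load(transform(extract(raw))))  # row 2 is dropped as malformed
```

The same three-stage shape carries over directly to a Beam pipeline, where each stage becomes a `PTransform`.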
Cloud Infrastructure Management
Architect and implement data infrastructure on Google Cloud Platform, leveraging services like BigQuery, Cloud Storage, Cloud SQL, Cloud Spanner, and Bigtable. Focus on optimizing the performance, cost, and reliability of these cloud resources.
Data Integration & Orchestration
Integrate data from a variety of sources, including APIs, databases, IoT devices, and third-party systems. Implement data orchestration workflows using Cloud Composer to ensure seamless data flow across different systems.
Data Quality & Governance
Implement robust data quality checks, validation rules, and monitoring systems. Ensure adherence to data governance policies and security standards by utilizing GCP security services like Cloud DLP and Cloud IAM.
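The shape of such validation rules, independent of any particular tool, can be sketched as a rule table applied per record (the field names and rules below are hypothetical examples for an orders feed):

```python
def validate(row: dict, rules: dict) -> list[str]:
    """Return a list of human-readable rule violations for one record."""
    errors = []
    for field, check in rules.items():
        if field not in row or row[field] is None:
            errors.append(f"{field}: missing")
        elif not check(row[field]):
            errors.append(f"{field}: failed validation")
    return errors

# Hypothetical rules for an orders feed.
rules = {
    "order_id": lambda v: isinstance(v, int) and v > 0,
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
    "currency": lambda v: v in {"INR", "USD", "EUR"},
}

good = {"order_id": 1, "amount": 10.0, "currency": "INR"}
bad = {"order_id": -5, "amount": None, "currency": "GBP"}
print(validate(good, rules))  # []
print(validate(bad, rules))   # three violations
```

In a production pipeline these checks would typically run inside the pipeline itself, with violations counted in Cloud Monitoring metrics and offending rows diverted for review.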
Real-time Data Processing
Build streaming data pipelines using Cloud Pub/Sub, Cloud Dataflow, and BigQuery streaming inserts. Develop solutions that support real-time analytics and event-driven architectures.
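On GCP this is typically expressed as Beam windowing over a Pub/Sub subscription; the core idea of grouping events into fixed event-time windows can be sketched with the standard library alone (timestamps and the 10-second window size are illustrative):

```python
from collections import defaultdict

def tumbling_windows(events, window_secs):
    """Group (timestamp, value) events into fixed event-time windows
    and sum the values per window, as a streaming aggregation would."""
    windows = defaultdict(float)
    for ts, value in events:
        window_start = (ts // window_secs) * window_secs
        windows[window_start] += value
    return dict(windows)

events = [(0, 1.0), (5, 2.0), (12, 3.0), (19, 4.0), (25, 5.0)]
print(tumbling_windows(events, 10))  # {0: 3.0, 10: 7.0, 20: 5.0}
```

A real streaming job adds what this sketch omits: watermarks and triggers for late data, which Beam's windowing model handles for you.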
Performance Optimization
Optimize query performance within BigQuery, employing partitioning and clustering strategies. Monitor and enhance pipeline performance through the use of Cloud Monitoring and Cloud Logging tools.
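In BigQuery, partitioning is declared in DDL (e.g. `PARTITION BY` a date column, optionally `CLUSTER BY` frequently filtered columns), and queries that filter on the partition column scan only the matching partitions. As a toy stdlib model of why that cuts cost (class and field names are illustrative):

```python
from collections import defaultdict
from datetime import date

class PartitionedTable:
    """Toy model of a date-partitioned table: a query filtering on the
    partition column scans only matching partitions, mirroring how
    BigQuery prunes partitions to reduce bytes billed."""

    def __init__(self):
        self.partitions = defaultdict(list)
        self.rows_scanned = 0  # stand-in for bytes billed

    def insert(self, event_date: date, row: dict):
        self.partitions[event_date].append(row)

    def query(self, start: date, end: date):
        hits = []
        for part_date, rows in self.partitions.items():
            if start <= part_date <= end:  # everything else is pruned unscanned
                self.rows_scanned += len(rows)
                hits.extend(rows)
        return hits

t = PartitionedTable()
t.insert(date(2024, 1, 1), {"amount": 10})
t.insert(date(2024, 1, 2), {"amount": 20})
t.insert(date(2024, 2, 1), {"amount": 30})
result = t.query(date(2024, 1, 1), date(2024, 1, 31))
print(len(result), t.rows_scanned)  # 2 2 -> the February partition was never scanned
```

Pruning only happens when the filter references the partition column directly, which is why partition-column filters belong in the `WHERE` clause rather than buried in expressions.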
SAP Integration (Preferred)
Design and implement data integration solutions for SAP systems, including SAP ECC, S/4HANA, and BW/4HANA. Develop connectors and pipelines to extract data from SAP modules for advanced analytics and reporting.
Experience
We require 3-4 years of hands-on experience as a Data Engineer, with a strong emphasis on Google Cloud Platform. Proven experience in building and maintaining production-grade data pipelines and infrastructure on GCP is essential.
Qualifications
A Bachelor's or Master's degree in Statistics or Applied Statistics is required.
Primary Skill Requirements
Google Cloud Platform Expertise
- Advanced proficiency in BigQuery, including SQL, DML/DDL, and optimization techniques.
- Experience with Cloud Dataflow for both batch and streaming data processing.
- Hands-on experience with Cloud Composer/Apache Airflow for workflow orchestration.
- Implementation knowledge of Cloud Storage, Cloud SQL, Cloud Spanner, and Bigtable.
- Experience with Cloud Pub/Sub for building event-driven architectures.
- Familiarity with Cloud Functions and Cloud Run for serverless computing.
- Experience with Dataproc for managed Spark/Hadoop workloads.
Programming & Tools
- Strong programming skills in Python, Java, or Scala.
- Proficiency in both SQL and NoSQL databases.
- Experience with the Apache Beam SDK for data processing.
- Experience with Infrastructure as Code tools like Terraform or Cloud Deployment Manager.
- Proficiency in version control using Git and experience with CI/CD pipelines.
Data Engineering Concepts
- Deep understanding of ETL/ELT design patterns and best practices.
- Experience with data modeling techniques (dimensional, normalized, denormalized).
- Knowledge of data warehousing and data lake architectures.
- Familiarity with stream processing and real-time analytics concepts.
- Expertise in data partitioning, sharding, and optimization strategies.
Security & Governance
- Knowledge of GCP IAM, VPC, and security best practices.
- Experience implementing data encryption and privacy measures.
- Understanding of compliance frameworks such as GDPR and HIPAA.
Secondary Skill Requirements
SAP Knowledge (Preferred)
- Understanding of SAP architecture and data models.
- Experience with SAP HANA, BW/4HANA, or S/4HANA.
- Experience with SAP data extraction methods like ODP, BAPI, or RFC.
- Knowledge of SAP integration tools and connectors.
Additional Nice-to-Have Skills
- Experience with other cloud platforms (AWS, Azure).
- Knowledge of containerization technologies (Docker, Kubernetes/GKE).
- Understanding of Machine Learning/AI pipelines on GCP (Vertex AI, ML Engine).
- Experience with data visualization tools (Looker, Tableau, Data Studio).
Behavioral Competencies
- Strong problem-solving and analytical thinking skills.
- Excellent communication skills with both technical and non-technical stakeholders.
- Ability to collaborate effectively within cross-functional teams.
- Proactive approach to identifying and resolving data challenges.
- A continuous learning mindset to adapt to evolving cloud technologies.
- Meticulous attention to detail and a strong commitment to data quality.
- Capacity to manage multiple projects and prioritize tasks effectively.