MLOps Platform Engineer
Full Job Description
About the Role
Join Capgemini Engineering, the global leader in engineering services, as an MLOps Platform Engineer. You will build robust platforms that accelerate Machine Learning innovation by automating the entire ML lifecycle, from training and hyperparameter optimization on GPU clusters to deployment and continuous monitoring.
Key Responsibilities:
- CI/CD for ML: Develop pipelines using Jenkins, GitLab CI, or GitHub Actions; ensure reproducibility with version control (Git) and infrastructure as code (Terraform).
- Automated Training & Optimization: Design scalable workflows for data processing, model sweeps, and cost monitoring on cloud/on-premise resources.
- Model Serving & Deployment: Implement production strategies using Kubernetes, BentoML, or MLServer to ensure efficient serving at scale.
- MLOps Best Practices: Integrate platforms like Kubeflow, SageMaker, or Vertex AI; utilize tools such as MLflow and Comet ML for governance and tracking.
- Monitoring & Integration: Establish dashboards for model health/performance and ensure seamless integration with existing data pipelines (Databricks/PySpark).
Technical Requirements:
- Languages/Frameworks: Python, TensorFlow, PyTorch, Scikit-learn.
- Data & Distributed Training: SQL/PostgreSQL, NoSQL databases, and distributed training frameworks.
- Cloud & DevOps: AWS/Azure/GCP experience; proficiency in Docker/Kubernetes and IaC (Terraform).
Seniority Level
This is a mid-level to senior role requiring more than 5 years of relevant experience. Candidates should be able to work with minimal supervision, plan their own work over medium-term horizons, and collaborate effectively within diverse global teams.
Company
Capgemini
Capgemini, a global business and technology transformation partner, leverages nearly 60 years of heritage to deliver tangible value through AI, cloud, data engineering, and industry expertise. With ov...