MLOps Engineer at Codvo.ai in Pune, Maharashtra
Company Overview
At Codvo, software and people transformations go hand-in-hand. We are a global empathy-led technology services company. Product innovation and mature software engineering are part of our core DNA. Respect, Fairness, Growth, Agility, and Inclusiveness are the core values that we aspire to live by each day. We continue to expand our digital strategy, design, architecture, and product management capabilities to offer expertise, outside-the-box thinking, and measurable results.
Job Summary
As an MLOps Engineer, you will be instrumental in designing, deploying, and maintaining production-grade Machine Learning workflows across AWS and Azure. Leveraging container orchestration, Infrastructure as Code (IaC), and robust CI/CD pipelines, you will serve as a vital link between DevOps and ML teams. Your primary focus will be to automate critical aspects of the ML lifecycle, including model training, deployment, monitoring, and the creation of edge API services, thereby ensuring the delivery of reliable and scalable AI solutions.
Key Responsibilities
- Architect and implement comprehensive MLOps pipelines for ML model training, versioning, deployment, and monitoring using industry-leading tools such as MLflow, Kubeflow, SageMaker Pipelines, or Azure ML.
- Develop and manage Infrastructure as Code (IaC) solutions using Terraform to provision and manage resources across multi-cloud environments, including AWS EKS/ECS/Lambda and Azure AKS/Azure Functions.
- Design and build containerized applications using Docker and orchestrate them effectively on Kubernetes clusters (EKS/AKS) to support high-availability ML inference and edge services.
- Develop and maintain CI/CD pipelines using Azure DevOps (ADO) Pipelines, GitHub Actions, or AWS CodePipeline to automate the deployment of Python/FastAPI microservices and Node.js backends.
- Create and optimize edge API applications, such as FastAPI-based services, to ensure low-latency inference performance on platforms like AWS Lambda@Edge, Azure Functions, or ECS Fargate.
- Implement comprehensive observability solutions utilizing tools like Prometheus, Grafana, CloudWatch, and Azure Monitor, including setting up alerts for ML model drift, performance anomalies, and infrastructure health.
- Collaborate closely with data scientists and DevOps teams to successfully productionize AI solutions, troubleshoot complex issues, and scale systems to handle high workloads.
- Write clean, efficient, and production-ready code in Python, Node.js, and Bash for automation scripts, ETL processes, and API gateways.
Required Qualifications
- Bachelor's degree in Computer Science, Engineering, or a closely related field.
- A minimum of 4 years of hands-on experience in DevOps or MLOps roles, with a demonstrable track record of successful deployments on both AWS and Azure cloud platforms.
- Demonstrated expertise in:
  - Cloud Platforms: AWS (specifically EKS, ECS, Lambda, SageMaker, ECR) and Azure (specifically AKS, Azure ML, Functions).
  - Infrastructure as Code & Orchestration: Terraform, Docker, and Kubernetes (EKS/AKS).
  - CI/CD Pipelines: Azure DevOps (ADO), Jenkins, GitLab CI, or native AWS/Azure pipeline tools.
  - Programming Languages: Python (including libraries and frameworks such as FastAPI, Pandas, and Scikit-learn) and Node.js.
  - MLOps Practices: Model deployment, versioning, and monitoring using tools like Seldon or KServe.
- Hands-on experience in building and deploying edge services and API applications designed for real-time inference.
- Strong problem-solving capabilities, particularly within complex multi-cloud environments.
Preferred Skills
- Relevant certifications such as AWS Certified Machine Learning – Specialty, Azure AI Engineer Associate, Certified Kubernetes Administrator (CKA)/Certified Kubernetes Application Developer (CKAD), or Terraform Associate.
- Experience with vector databases (e.g., Pinecone, FAISS), serverless ML deployments, or Generative AI fine-tuning.
- Familiarity with React.js for developing dashboards to visualize ML metrics.