Senior AI/ML Engineer
Full Job Description
Optum is seeking a highly skilled Senior AI/ML Engineer to serve as a scale anchor for our AI/ML initiatives in Bengaluru, Karnataka. This role is critical in transforming AI/ML experiments into reliable, observable, and scalable production systems that meet real-world constraints. You will own deployment, monitoring, reliability, and scalability, ensuring our AI/ML solutions operate predictably, observably, and cost-effectively.
Primary Focus:
- Productionizing AI/ML systems, with an emphasis on the Generative AI stack, from experimentation through deployment.
- Designing and operating robust data pipelines and learning workflows for AI/ML systems.
- Deploying models across various patterns including batch, streaming, and real-time serving.
- Managing model artifacts, registries, and feature/representation stores.
- Monitoring model performance, detecting data drift, and managing retraining workflows.
- Establishing and maintaining Dev, Stage, and Production environments specifically for AI/ML systems.
Primary Responsibilities:
- Design and operate highly reliable, available, and scalable AI/ML systems.
- Build and maintain CI/CD pipelines tailored for AI/ML workflows, encompassing code, data, and models.
- Implement comprehensive data validation, quality checks, and freshness guarantees.
- Monitor system health, data drift, and model drift, establishing effective alerting and remediation strategies.
- Optimize AI/ML systems for cost-efficiency, low latency, and operational excellence.
- Provide post-deployment support, lead incident response, and drive continuous improvement efforts.
- Adhere strictly to employment contract terms, company policies, and directives, including those related to work location, team changes, work shifts, benefits flexibility, and work environment adjustments in response to evolving business needs. The company reserves the right to modify or rescind policies and directives at its sole discretion.
Required Qualifications:
- Undergraduate degree or equivalent practical experience.
- Core Requirements:
  - Proven experience building and operating production systems under real-world constraints.
  - Strong software engineering fundamentals with a deep understanding of AI/ML lifecycles.
  - Excellent collaboration skills, particularly when working with AI/ML engineers and platform teams.
  - Demonstrated ability to design systems that operate reliably within defined SLAs and failure modes.
- Technical Skills:
  - Programming & Engineering: Proficiency in Python, with solid software design and debugging capabilities.
  - AI/ML Systems: Experience with model deployment patterns (batch, streaming, online) and RAG pipelines.
  - Data & Infrastructure: Familiarity with data pipelines, distributed systems, and vector and NoSQL databases.
  - Platforms & Tooling: Experience with cloud platforms, containerization (Docker/Podman), Kubernetes, orchestration tools, and MLflow.
  - CI/CD & AI/MLOps: Expertise in CI/CD pipelines, GitHub Actions workflows, workflow orchestration, and model/version management.
  - Observability: Skilled in implementing logging, metrics, and alerts for AI/ML systems in production environments.
Preferred Qualifications:
- Experience with Python, Kubernetes, Docker, Splunk, MongoDB, Grafana, MLflow, LangGraph, Azure AI/ML Studio, JFrog, and n8n.
At UnitedHealth Group, our mission is to empower healthier lives and create a more effective health system for all. We are committed to dismantling barriers to good health, especially for communities disproportionately affected by health disparities. Our efforts to mitigate environmental impact and deliver equitable care are central to our mission and enterprise priorities.
Company
Optum
Optum is a global leader in health services and innovation, dedicated to improving health outcomes for millions worldwide through technology-driven solutions. We connect individuals with the essential...