DevOps Engineer
Experience Level: Mid Level
Full Job Description
Level AI is seeking a skilled DevOps Engineer for our Noida location. You will be instrumental in designing, building, and enhancing state-of-the-art machine learning system infrastructure, both on cloud and on-premise. Your role will involve architecting platforms for creating, training, and deploying ML models, and building operating dashboards to track system performance and errors, enabling root cause analysis.
You will identify gaps, evaluate tools, and leverage open-source and cloud technologies to improve processes and systems. Collaboration with our AI team to drive ML projects from conception to production monitoring will be key.
Responsibilities:
- Design, build, and enhance the core components of state-of-the-art machine learning system infrastructure (cloud and on-premise), and architect platforms to create, train, and deploy ML models.
- Build operating dashboards and charts that track system errors and performance and enable root cause analysis.
- Identify gaps and evaluate relevant tools and technologies as needed to improve processes and systems, leveraging open-source and cloud computing technologies to build effective solutions.
- Collaborate with the AI team to drive ML projects from conception to completion and production monitoring.
Requirements:
- Bachelor's degree or higher, with a strong academic background.
- 2-4 years of meaningful work experience in DevOps handling complex services.
- Strong troubleshooting skills to ensure high service availability.
- Expertise with Google Cloud Platform (GCP), Docker, Kubernetes, CI/CD, and Jenkins.
- Extensive experience in designing, implementing, and maintaining infrastructure as code, preferably using Terraform.
- Ability to create and maintain deployment manifest files for microservices using Helm.
- Experience with LLMOps or MLOps is a significant advantage.
- Strong expertise in deploying at scale on Kubernetes clusters using the Horizontal Pod Autoscaler (HPA).
- Broad technical background with experience in architecture, design, and operations of cloud solutions, including meeting security compliance requirements.
- Experience monitoring system health and ensuring security, scalability, and reliability.
- Proficiency in designing, implementing, and maintaining observability, monitoring, logging, and alerting using tools like Prometheus, Grafana, Promtail, Loki, and Datadog.
We offer market-leading compensation based on your skills and aptitude.
Company
Level AI
Level AI, headquartered in Mountain View, California, is a Series C Enterprise SaaS startup founded in 2019. We are revolutionizing customer engagement in contact centers by transforming them into str...