KLA · 1h ago
Foundit

MLOps Site Reliability Engineer (SRE)

Chennai, India
Full Time
Mid Level

Full Job Description

KLA is seeking a highly skilled and motivated MLOps Site Reliability Engineer (SRE) to join our team in Chennai, India. This role is crucial for ensuring the reliability, scalability, and performance of our machine learning infrastructure, supporting our cutting-edge ML workflows.

Responsibilities:

  • Design, implement, and maintain scalable and reliable machine learning infrastructure.
  • Collaborate with data scientists and machine learning engineers to deploy and manage machine learning models in production environments.
  • Develop and maintain robust CI/CD pipelines tailored for machine learning workflows.
  • Monitor, analyze, and optimize the performance of machine learning systems and underlying infrastructure.
  • Implement and manage automated testing and validation processes for machine learning models.
  • Ensure the security and compliance of machine learning systems and associated data.
  • Troubleshoot and resolve complex issues related to machine learning infrastructure and workflows.
  • Document processes, procedures, and best practices for machine learning operations.
  • Stay current with the latest advancements in MLOps and related technologies.

Required Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or a related technical field.
  • Demonstrated experience as a Site Reliability Engineer (SRE) or in a similar infrastructure-focused role.
  • Solid understanding of machine learning concepts, workflows, and the ML lifecycle.
  • Proficiency in programming languages such as Python, Java, or Go.
  • Experience working with major cloud platforms including AWS, Azure, or Google Cloud.
  • Familiarity with containerization technologies such as Docker and orchestration tools like Kubernetes.
  • Hands-on experience with CI/CD tools like Jenkins, GitLab CI, or CircleCI.
  • Strong analytical and problem-solving skills with a proven ability to troubleshoot complex technical issues.
  • Excellent communication and interpersonal skills for effective collaboration.

Preferred Qualifications:

  • Master's degree in Computer Science, Engineering, or a related field.
  • Experience with popular machine learning frameworks such as TensorFlow, PyTorch, or Scikit-learn.
  • Knowledge of data engineering principles and tools like Apache Spark, Apache Kafka, or Airflow.
  • Experience with monitoring and logging solutions such as Prometheus, Grafana, or the ELK stack.
  • Familiarity with Infrastructure as Code (IaC) tools including Terraform or Ansible.
  • Experience implementing automated testing frameworks specifically for machine learning models.
  • Understanding of security best practices pertinent to machine learning systems and data management.

This permanent position in Chennai offers an exciting opportunity to contribute to a leading-edge technology company and impact the success of our machine learning initiatives.

Company

KLA

KLA is a global leader providing advanced technologies and solutions for the semiconductor manufacturing ecosystem. Our innovations are essential for the production of virtually all electronic devices...
