
AI/ML Platform Engineer
Experience Level: Senior Level
Full Job Description
Welocalize is seeking a Senior Scalability Engineer in Noida, India, to design and optimize platforms that support the significant growth of AI/ML workloads. This role focuses on ensuring the scalability, reliability, and efficiency of AI/ML infrastructure while developing robust, high-performance systems. The successful candidate will collaborate with cross-functional teams to build resilient infrastructure and implement solutions for seamless model deployment, monitoring, and lifecycle management at scale.
Key responsibilities include designing and implementing scalable solutions for AI/ML infrastructure that enable horizontal scaling, efficient resource utilization, and fault tolerance. The role involves applying best practices for platform stability, high availability, and disaster recovery, and building advanced observability frameworks for monitoring, logging, and tracing with tools like Datadog. Automating infrastructure provisioning, deployment, and operational workflows is crucial. The engineer is expected to collaborate with data science, product, and engineering teams to align infrastructure with organizational goals. Developing cost optimization strategies for cloud resources and handling incident response, including post-mortems and root cause analyses, are also key. The role encourages continuous improvement by staying current with industry trends in cloud infrastructure, distributed systems, and observability.
Qualifications include a Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, and 5+ years of experience in AI/ML platform engineering, infrastructure, or operations. A proven track record in designing, scaling, and maintaining large distributed systems is required. Technical expertise should include cloud infrastructure (AWS, GCP, Azure), infrastructure-as-code tools (Terraform, CloudFormation), and strong programming skills in Python and Node.js. A deep understanding of observability practices is essential. The candidate should be able to design scalable architectures, implement automated failover and disaster recovery, and optimize performance and resource utilization. Strong communication, collaboration, and problem-solving skills are necessary, along with experience in cloud cost management strategies.
Company
Welocalize
Welocalize is a global transformation partner that empowers brands to reach, engage, and grow international audiences. The company specializes in delivering multilingual content transformation service...