
Walmart Global Tech India • 11d ago
InstaHyre
Senior Software Engineer
Bangalore
Full Time
Senior Level
Full Job Description
Senior Software Engineer, AI/ML Development - Unified Observability Platform
Join Walmart Global Tech Bengaluru to build a sophisticated unified observability platform designed for 360-degree visibility across distributed systems. This platform aims to minimize instrumentation overhead, integrate seamlessly with existing environments, and leverage AI-driven insights for real-time issue detection, prediction, and resolution. Our vision is to create self-healing systems powered by AI agents that can autonomously diagnose and remediate issues, reducing the need for human intervention.
Responsibilities:
- Design, develop, and deploy scalable AI/ML models for anomaly detection, forecasting, and root-cause analysis.
- Build and optimize real-time inference APIs and services, integrating ML pipelines into production environments.
- Develop robust data pipelines for large-scale telemetry, logs, metrics, and traces using event-driven architectures.
- Automate the end-to-end ML lifecycle, including model training, evaluation, and deployment (MLOps).
- Continuously monitor and optimize model performance for accuracy, latency, and cost-efficiency.
- Collaborate closely with platform and SRE teams to integrate AI-powered automation and observability workflows.
- Build high-performance backend systems using Golang and modern design principles.
- Architect distributed, fault-tolerant systems with a strong foundation in concurrency, scalability, and resilience.
- Design multi-cloud applications utilizing Kubernetes, Docker, and infrastructure-as-code tools.
- Implement critical infrastructure components such as service discovery, load balancing, and failure recovery mechanisms.
- Contribute to CI/CD, observability, and automation frameworks for production systems.
- Design data flows using event streaming platforms like Kafka or Pub/Sub.
- Work with SQL (PostgreSQL/MySQL), NoSQL (MongoDB, Cassandra), and columnar analytical (ClickHouse) databases for managing structured and unstructured data.
- Implement efficient data serialization, compression, and query optimization strategies for large-scale datasets.
- Partner with SRE, DevOps, and Product teams to embed AI/ML capabilities within observability workflows.
- Produce clear design documents, architecture diagrams, and technical proposals.
- Contribute to the definition of long-term technical strategy and roadmap.
- Mentor junior engineers, fostering best practices in backend development, ML systems, and distributed computing.
Requirements:
- 5-10 years of overall software engineering experience, with 2-4 years specifically in AI/ML engineering.
- Demonstrated experience in deploying ML models across the entire lifecycle: data ingestion, training, inference, and monitoring.
- Strong coding proficiency in Go (Golang), or in Python with a commitment to learning Go.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
- Solid understanding of algorithms, data structures, and system design principles.
- Experience with ML frameworks such as TensorFlow, PyTorch, or Scikit-learn.
- Hands-on experience with time-series modeling, anomaly detection, or forecasting techniques.
- Exposure to Large Language Models (LLMs), Retrieval Augmented Generation (RAG) pipelines, or agentic workflows for automation.
- Familiarity with MLOps tools like Kubeflow, MLflow, Vertex AI, or SageMaker.
- Proficiency with distributed messaging systems such as Kafka, Pub/Sub, or similar.
- Hands-on experience with SQL/NoSQL databases and designing performant schemas at scale.
- Expertise in designing RESTful or gRPC APIs and scalable microservices.
- Strong emphasis on testing, CI/CD integration, and ensuring production readiness.
- Familiarity with observability stacks (e.g., Prometheus, Grafana, OpenTelemetry).
- Experience in real-time observability, AIOps, or incident management platforms is a plus.
- Knowledge of distributed consensus algorithms (Raft, Paxos) and event sourcing patterns.
- Contributions to open-source projects in ML, observability, or infrastructure domains are highly valued.
- Familiarity with LLM orchestration frameworks such as LangChain, Haystack, or Semantic Kernel.
Company
Walmart Global Tech India
Walmart Global Tech Bengaluru: Innovating at Scale
Walmart Global Tech Bengaluru is at the forefront of digital transformation, impacting millions of customers globally. With a reach of 260 million cus...
Posted on InstaHyre