Senior Software Engineer
Experience Level: Senior Level
Full Job Description
AION is seeking a Senior Software Engineer to join its Inference Platform team in Bengaluru, Karnataka, India. You will be instrumental in building and scaling high-performance inference systems for AI/ML workloads, addressing the complexities of serving models at scale: latency optimization, resource orchestration, autoscaling, and production reliability. Experience designing distributed systems that handle thousands of requests per second with sub-second response times, while remaining cost-efficient, is essential.
We are looking for candidates with a strong understanding of inference at scale. Experience with Golang is highly preferred, with bonus points for familiarity with inference engines (vLLM, TGI, TensorRT), containerization, and distributed systems. You should be comfortable taking ownership of platform-level decisions, strategically balancing performance and cost, and contributing to a platform used by thousands of developers globally.
As a product-minded engineer, you will understand the impact of your technical decisions on the end-user experience. This role requires a team player comfortable with diverse responsibilities: optimizing inference latency, managing infrastructure, engaging with customers to understand their challenges, and contributing to UI/UX, customer success, documentation, and product operations.
Responsibilities
Inference Platform Architecture & Core Services
- Design and build AION's inference service platform, the core of its large-scale AI model serving.
- Architect and own key platform components: AI Gateway, Resource Orchestrator, Runtime Engines, and Autoscaler.
- Develop highly modular, scalable, and extensible low-level designs for inference infrastructure.
- Lead high-level design discussions, establish architectural patterns, and drive technical decisions for the inference stack.
Model Deployment & Lifecycle Management
- Optimize model deployment, version upgrades, and rollback strategies.
- Build robust pipelines for zero-downtime model updates.
- Design intelligent routing for multi-model serving, A/B testing, and canary deployments.
- Implement efficient GPU utilization and model cold-start optimization strategies.
Performance & Distributed Systems
- Develop highly performant software for low-latency, high-throughput inference serving.
- Build and debug production-grade distributed systems for real-time AI workloads.
- Optimize inference pipelines for latency, throughput, batching, and resource utilization.
- Design fault-tolerant systems with graceful degradation and auto-recovery.
Observability & Engineering Excellence
- Build a high-performance telemetry and observability stack for inference metrics, performance tracking, and debugging.
- Implement comprehensive monitoring for model latency, throughput, errors, GPU utilization, and cost.
- Conduct thorough code reviews to ensure code quality, performance, and architectural consistency.
- Establish engineering best practices for testing, documentation, and production readiness.
Requirements
- 4+ years of experience building and scaling backend systems, distributed platforms, or inference infrastructure.
- Strong understanding of AI/ML inference systems and experience with inference engines (vLLM, TGI, TensorRT-LLM, or similar).
- Deep knowledge of distributed systems design, microservices architecture, and API gateway patterns.
- Proficiency in Go (Golang) strongly preferred; experience with Python, Rust, or C++ for performance-critical components is a plus.
- Experience with container orchestration (Kubernetes, Docker) and infrastructure-as-code.
- Solid understanding of autoscaling strategies, load balancing, and resource scheduling algorithms.
- Experience building high-throughput, low-latency systems (sub-100ms response times).
- Familiarity with message queues (Kafka, RabbitMQ), databases (PostgreSQL, Redis), and event-driven architectures.
- Knowledge of GPU computing, model serving optimizations (batching, quantization, multi-tenancy), and resource allocation.
- Experience with observability tools (Prometheus, Grafana, OpenTelemetry) and distributed tracing.
- Understanding of API design, rate limiting, authentication/authorization, and security best practices.
- Exposure to AI model deployment workflows and model lifecycle management is highly desirable.
Bonus / Good to Have
- HPC & Cluster Management: Experience with large-scale HPC clusters (Kubernetes, Slurm) for job scheduling and resource orchestration.
- Data Engineering: Expertise in data pipelines, ETL systems, and large-scale data processing frameworks.
- Systems-Level Programming: Experience with low-level systems programming (storage, Kubernetes operators, OS-level software, daemon services).
- ML Platform Engineering: Experience productionizing ML pipelines, batch job orchestration, model fine-tuning, and Jupyter notebook orchestration.
- Enterprise Deployment: Experience packaging software for on-premises or VPC deployments, focusing on security and compliance.
Preferred Attributes
- High ownership, self-driven, and a bias for action.
- Strong strategic thinking and ability to link technical decisions to business impact.
- Excellent communication and mentoring skills.
- Ability to thrive in ambiguous, fast-paced startup environments.
Why Join AION?
- Work directly with founders shaping technical and product strategy.
- Build infrastructure for the future of AI compute.
- Significant ownership and impact with competitive equity.
- Competitive compensation, flexible work options, and wellness benefits.
Apply Now:
If you are a strong engineer ready to lead architecture and scale next-generation AI infrastructure, we encourage you to apply. Please include:
- Your resume highlighting relevant projects and leadership experience.
- Links to products, code, or demos you have built.
- A brief note on why AION's mission excites you.
Company
AION
AION is developing a decentralized AI cloud platform designed to power high-performance computing (HPC). This platform aims to democratize compute access and offer managed services, functioning as an ...