
TrueFoundry•11d ago
InstaHyre
Staff Engineer
Bangalore
Full Time
Senior Level
Qualifications & Requirements
Experience Level: Senior Level
Full Job Description
TrueFoundry is seeking an experienced engineer who is passionate about scaling deep learning workloads, optimizing multi-GPU training, and delivering robust, production-grade solutions. The role involves tackling complex engineering challenges and contributing significantly to the evolution of our AI platform.
Responsibilities:
- Resolve some of the most intricate engineering problems, working within a skilled engineering team.
- Develop a comprehensive, end-to-end understanding of the TrueFoundry platform and actively shape its product vision and implementation.
- Collaborate closely with our CTO and the engineering team to guide system design, architecture, and the development of complex products.
- Lead technical design initiatives, the resolution of critical customer issues, and platform scalability efforts from inception to completion.
- Cultivate deep expertise across the entire TrueFoundry platform stack, including infrastructure, deployment systems, LLM/ML orchestration, observability, and cost optimization.
- Drive the system architecture and design for sophisticated, distributed, cloud-native systems.
- Lead and actively participate in design reviews, code reviews, and critical incident response processes.
- Work in close partnership with the CTO on architectural decisions, scaling strategies, and technical roadmap prioritization.
- Identify and spearhead efforts to reduce technical debt, enhance performance, and improve resilience across the platform.
- Apply a product engineering mindset, ensuring that customer needs and feedback are translated into scalable, effective engineering solutions.
Requirements:
- A minimum of 6 years of substantial backend/systems engineering experience, preferably gained at leading technology companies or innovative startups.
- Deep expertise in distributed systems, cloud-native architectures, and scalable system design principles.
- Strong practical knowledge of Kubernetes, containerized workloads, and infrastructure engineering practices.
- Hands-on experience in building or deploying ML/GenAI applications, or significant collaboration with ML/Data Science teams.
- Proficiency in programming languages such as Python, Go, or TypeScript.
- A solid grasp of system observability, resilient design patterns, and Site Reliability Engineering (SRE) methodologies.
- Exceptional technical leadership and communication skills, with the ability to engage effectively with both customers and engineering teams.
- Demonstrated ability to think strategically while also executing hands-on technical tasks as needed.
Company
TrueFoundry
TrueFoundry provides an Enterprise Platform as a Service (PaaS) designed for building, deploying, and governing Agentic AI applications securely and at scale. Our core offerings include an AI Gateway ...
Posted on InstaHyre