
IBM
Naukri
Site Reliability Engineer
Bengaluru
Mid Level
Site Reliability Engineer at IBM Bengaluru
IBM is seeking a talented Site Reliability Engineer to join our dynamic team in Bengaluru. This role is crucial for ensuring the high availability, resilience, and scalability of our cutting-edge IBM Quantum platforms and services.
Responsibilities:
- Lead incident response efforts, participate in critical war room activities, and drive comprehensive post-incident reviews and corrective actions to prevent recurrence.
- Collaborate closely with development teams to effectively debug, deploy, and maintain quantum workloads and backend services, ensuring seamless operation.
- Establish, refine, and rigorously maintain observability across all logs, metrics, traces, and alerting systems for proactive issue detection and resolution.
- Design and build innovative internal tools, robust automations, and efficient operational workflows to significantly improve team efficiency and minimize manual toil.
- Champion a culture of operational ownership, ensuring every quantum job runs reliably with complete traceability from inception to completion.
- Drive significant platform-wide improvements by leveraging operational insights, lessons learned from incidents, and established reliability patterns.
Required Qualifications:
- Bachelor's Degree.
- 2–5 years of proven professional experience as a Site Reliability Engineer.
- Strong systems-thinking ability to correlate complex data points, including logs, traces, metrics, and code, across distributed workloads.
- Hands-on experience with incident management, production operations, and on-call responsibilities in a demanding environment.
- Proficiency with modern observability tools such as Grafana, Sysdig, Jaeger, and similar solutions.
- Familiarity with container orchestration technologies such as Kubernetes, a deep understanding of Linux internals, and programming proficiency in Python or Go.
- Demonstrated ability to collaborate effectively across development, infrastructure, and platform teams.
- Proven ability to transform incident learnings into actionable automation, robust fixes, or significant architectural improvements.
- Solid understanding of SLI/SLO/SLA frameworks and key reliability metrics.
Preferred Qualifications:
- Experience with IBM Cloud services.
- Familiarity with Qiskit or foundational quantum computing concepts.
- Master's Degree.