What is the salary for this Site Reliability Engineer position?

Salary information for this Site Reliability Engineer position is available upon application.

What experience is required for this Site Reliability Engineer role?

This Site Reliability Engineer position requires mid-level experience.

Where is this Site Reliability Engineer job located?

This Site Reliability Engineer position is located in Bengaluru, Pune.

How do I apply for this Site Reliability Engineer position at Red Hat?

You can apply for this Site Reliability Engineer position by clicking the 'Apply Now' button on this page, which will direct you to the official application portal.

Site Reliability Engineer at Red Hat | Bengaluru, Pune | Apply Now | MindMyJob

About the Role

Red Hat is looking for a passionate Site Reliability Engineer (SRE) to join our dynamic team, focusing on maintaining highly reliable cloud-based services. In this impactful position, you will be instrumental in supporting Red Hat's software manufacturing services on our hybrid cloud infrastructure. You will collaborate closely with development, quality engineering, and release engineering colleagues to ensure the health and performance of the infrastructure that hosts these critical services.

Your daily responsibilities will include creating and maintaining service monitoring systems, improving automation, upholding security best practices, and responding to service incidents. You will actively participate in communities of practice to influence and coordinate the design and evolution of our hybrid cloud platform. You will also share responsibility for defining Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for the services your team supports, and for executing remediation plans when SLOs are not met.

We expect you to respond promptly during service outages and engage in continuous learning initiatives to bolster the resilience of our services. Join us and become part of a team dedicated to Red Hat's mission of producing world-class open-source software.

Key Responsibilities:

Contribute to a globally distributed team that provides 24x7 support, using a service model that spreads coverage across time zones and includes regular on-call rotations.
Resolve service incidents by adhering to established operating procedures, investigating outage root causes, and coordinating incident resolution across various service teams.
Participate in incident retrospective reviews and implement corrective actions.
Configure and maintain essential service infrastructure.
Proactively identify and eliminate toil by automating manual, repetitive, and error-prone processes.
Coordinate efforts with other Red Hat teams such as IT Platforms, Infrastructure, Storage, and Network to ensure our services' cloud deployments meet high-quality expectations.
Implement comprehensive monitoring, alerting, and escalation plans to address infrastructure outages or performance issues.
Collaborate with service owners to define and implement SLIs and SLOs for supported services, ensure these targets are met, and execute remediation plans as needed.

What You Will Bring:

Proven experience with OpenShift administration.
Strong Linux administration expertise.
General knowledge of AWS technologies.
Experience with CI/CD platforms such as Tekton and Pipelines as Code; experience with GitHub Actions or Jenkins is optional.
Hands-on experience with automation tools like Ansible or Terraform.
Familiarity with open-source monitoring technologies including Grafana, Prometheus, and OpenTelemetry.
Excellent written and verbal communication skills in English, essential for effective collaboration within a globally distributed team.

Additional Qualifications (Plus):

Previous experience within an SRE model.
Experience with software development in Python or Go.

Company

Red Hat