Red Hat
Posted 2h ago
Naukri

Site Reliability Engineer

Bengaluru, Pune
Full Time
Mid Level

Full Job Description

About the Role

Red Hat is looking for a passionate Site Reliability Engineer (SRE) to join our dynamic team, focusing on maintaining highly reliable cloud-based services. In this impactful position, you will be instrumental in supporting Red Hat's software manufacturing services on our hybrid cloud infrastructure. You will collaborate closely with development, quality engineering, and release engineering colleagues to ensure the health and performance of the infrastructure that hosts these critical services.

Your daily responsibilities will include creating and maintaining robust service monitoring systems, enhancing automation processes, upholding stringent security best practices, and effectively responding to various service situations. You will actively participate in communities of practice to influence and coordinate the design and evolution of our hybrid cloud platform. A key aspect of this role involves co-responsibility for defining Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for the services your team supports, and executing remediation plans when SLOs are not met.

We expect you to respond promptly during service outages and engage in continuous learning initiatives to bolster the resilience of our services. Join us and become part of a team dedicated to Red Hat's mission of producing world-class open-source software.

Key Responsibilities:

  • Contribute to a globally distributed team that provides 24x7 support by using different time zones for extended coverage; this includes regular on-call rotations.
  • Resolve service incidents by adhering to established operating procedures, investigating outage root causes, and coordinating incident resolution across various service teams.
  • Participate in incident retrospective reviews and implement corrective actions.
  • Configure and maintain essential service infrastructure.
  • Proactively identify and eliminate toil by automating manual, repetitive, and error-prone processes.
  • Coordinate efforts with other Red Hat teams such as IT Platforms, Infrastructure, Storage, and Network to ensure our services' cloud deployments meet high-quality expectations.
  • Implement comprehensive monitoring, alerting, and escalation plans to address infrastructure outages or performance issues.
  • Collaborate with service owners to define and implement SLIs and SLOs for supported services, ensure these targets are met, and execute remediation plans as needed.

What You Will Bring:

  • Proven experience with OpenShift administration.
  • Strong Linux administration expertise.
  • General knowledge of AWS technologies.
  • Experience with CI/CD platforms such as Tekton and Pipelines as Code; experience with GitHub Actions or Jenkins is optional.
  • Hands-on experience with automation tools like Ansible or Terraform.
  • Familiarity with open-source monitoring technologies including Grafana, Prometheus, and OpenTelemetry.
  • Excellent written and verbal communication skills in English, essential for effective collaboration within a globally distributed team.

Additional Qualifications (Plus):

  • Previous experience within an SRE model.
  • Software development experience in Python or Go.

Company

Red Hat

Red Hat is a leading provider of open-source solutions, known for its commitment to innovation and collaboration in the technology sector. With a focus on enterprise-level software and hybrid cloud in...

Posted on Naukri