Saviynt
Posted 2h ago
Career Pages

Senior Site Reliability Engineer

Bengaluru
Full Time
Senior Level

Full Job Description

Join Saviynt's SaaS Operations team as a Senior Site Reliability Engineer in Bengaluru. Our Monitoring and Alerting team blends operational excellence with development expertise to deliver highly available, resilient services at scale through automation and Infrastructure as Code. We build reliability in by applying best practices in resiliency engineering, automation, observability, and chaos testing.

We are seeking a Principal Engineer with a systems-thinking mindset, a background in software or systems engineering, and a passion for scaling teams through production insights, operational automation, observability programs, developer guidance, and real-time metrics.

As a Senior Site Reliability Engineer on our Product SRE Engineering team, reporting to the Senior Director of Site Reliability Engineering, you will be instrumental in:

  • Developing and maintaining infrastructure and tools to guarantee service reliability and improve customer experience.
  • Collaborating with teams to enhance observability, automation, deployment processes, and overall system reliability.
  • Designing, deploying, and managing scalable, robust infrastructure solutions that power global cloud services.
  • Partnering with product, operations, and security teams for seamless implementation of features, tools, and updates across the platform.
  • Creating and deploying AI-powered tools to increase operational efficiency and drive engineering excellence.

Minimum Qualifications:

  • Experience implementing comprehensive observability for microservices and Kubernetes clusters using tools like OpenTelemetry.
  • Experience building and managing automation tools that streamline deployment, patching, scaling, and infrastructure management.
  • Experience developing scalable portals for SRE dashboards, SLI/SLO/SLA tracking, error budgets, and executive metrics to support data-driven decisions.
  • Proficiency in programming and scripting languages such as Java, Python, Go, or Shell.
  • Experience with OpenStack cloud, Linux, Kafka, RabbitMQ, Prometheus, Terraform, Kubernetes, Ansible, MLOps, Generative AI, PostgreSQL, and analytics databases.
  • Familiarity with AWS solutions; Azure experience is also valued.
  • Experience with containerized workloads (Helm preferred; related: AKS & EKS, other K8s distributions, Docker, JFrog).
  • Proficiency in logging and monitoring tools (Prometheus, Grafana, Datadog preferred; related: ELK/OpenSearch, AWS CloudWatch, Azure Monitor, Log Analytics, Fluentd).
  • Knowledge of Network Security concepts (e.g., AWS Policy, Azure Policy, VPN, Active Directory/RBAC, ACLs, NSG rules, private endpoints).
  • Demonstrated hands-on experience implementing advanced observability practices at scale with tools such as Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, or Datadog.

Preferred Qualifications:

  • Bachelor's degree in Computer Science or a related field, or equivalent experience, with 4+ years in Cloud-SRE, DevOps, or Systems Engineering.
  • Strong problem-solving abilities, excellent collaboration and communication skills, and a proactive approach to teamwork.
  • Familiarity with testing tools and frameworks.

Company

Saviynt

Saviynt is a leader in identity security, offering an AI-powered identity platform that governs access to applications, data, and business processes for human and non-human entities. Organizations tru...