ValGenesis, 10d ago
Indeed

Senior Site Reliability Engineer

Chennai, Tamil Nadu
Full Time
Senior Level

Qualifications & Requirements

Experience Level: Senior Level

Full Job Description

ValGenesis is seeking a Senior Site Reliability Engineer (SRE) to join their SaaS Operations team in Chennai, Tamil Nadu. The role is central to the reliability and performance of ValGenesis's digital validation platform, which serves the life sciences industry. The SRE will define and implement SRE best practices, establish and maintain service level agreements, indicators, and objectives (SLAs, SLIs, SLOs), and manage incident response. Key responsibilities include designing high-availability and disaster recovery strategies, automating manual processes, optimizing performance, and bridging the gap between development and IT operations. The engineer will also strengthen system resiliency across Azure and on-premises deployments in a hybrid environment, ensuring strong tenant isolation and consistent performance within a DB-per-tenant architecture. The role requires leading incident response, conducting thorough root cause analysis (RCA) and blameless postmortems, and developing operational runbooks. A significant part of the role involves designing and maintaining a comprehensive observability framework using tools such as Azure Monitor, Application Insights, Log Analytics, Prometheus, and Grafana.

Responsibilities:

  • Define and embed SRE best practices across the SaaS platform.
  • Establish and maintain meaningful SLAs, SLIs, SLOs, and error budgets.
  • Design and continuously improve high-availability and disaster recovery strategies.
  • Automate manual processes, manage incident response, and optimize performance.
  • Bridge the gap between development and IT operations.
  • Ensure strong tenant isolation and consistent performance.
  • Strengthen system resiliency across Azure and on-prem deployments.
  • Lead incident response efforts with structured troubleshooting and clear communication.
  • Drive thorough root cause analysis (RCA) and conduct blameless postmortems.
  • Translate incidents into systemic fixes.
  • Develop and maintain operational runbooks.
  • Design and maintain a comprehensive observability framework.

Requirements:

  • 6+ years of hands-on experience in Site Reliability Engineering (SRE) supporting production-grade, cloud-native enterprise software platforms and applications.
  • Prior experience as a DevOps engineer, cloud system administrator, or software developer.
  • Strong proficiency in scripting languages such as Python and PowerShell.
  • Deep hands-on experience with Microsoft Azure in production environments.
  • Solid understanding of Terraform, Ansible, and Kubernetes internals (networking, scheduling, scaling, resource management).
  • Proven experience in PostgreSQL performance tuning and optimization in production systems.
  • Hands-on experience with Azure Monitor, Application Insights, and Log Analytics.
  • Experience implementing and managing Prometheus and Grafana for Kubernetes and on-prem monitoring.
  • Ability to translate metrics, logs, and traces into actionable insights.
  • Experience troubleshooting and improving CI/CD pipelines.
  • Understanding and application of GitOps principles.

ValGenesis is committed to disrupting the life sciences industry with its digital validation lifecycle management system (VLMS). They are expanding their portfolio beyond validation to an end-to-end digital transformation platform. The company fosters a collaborative, innovative, and customer-centric work environment, aiming to be the number one intelligent validation platform in the market.

This is an onsite position requiring 5 days per week in their Chennai, Hyderabad, or Bangalore offices. ValGenesis is an equal-opportunity employer. They may use AI tools to support the hiring process, but final decisions are made by humans.

Company

ValGenesis

ValGenesis is a leading provider of digital validation platform solutions for life sciences companies. Their suite of products is utilized by 30 of the top 50 global pharmaceutical and biotech compani...