Clearwater Analytics
Posted 5h ago
Foundit

Site Reliability Engineer

Noida, India
Full Time
Mid Level

Full Job Description

About the Role

We are seeking a skilled Site Reliability Engineer to join our dynamic team in Noida, India. In this permanent role, you will play a crucial part in building and maintaining highly available and scalable infrastructure, ensuring the reliability and performance of our critical services.

Key Responsibilities

  • Construct and manage robust observability stacks utilizing Prometheus and Grafana, with a focus on defining and monitoring Service Level Objectives (SLOs), Service Level Indicators (SLIs), Service Level Agreements (SLAs), and error budgets.
  • Take ownership of incident response, including participation in on-call rotations, efficient triage, effective mitigation strategies, and conducting blameless post-mortems to foster continuous improvement.
  • Automate repetitive operational tasks and proactively eliminate toil through the development of scripts and tooling using Python, Bash, and Go.
  • Design, deploy, and maintain highly available infrastructure on Amazon Web Services (AWS) leveraging Terraform and Ansible for comprehensive infrastructure-as-code workflows.
  • Manage and optimize Kubernetes clusters (specifically EKS) and Docker-based containerized workloads that support our microservices architecture.
  • Actively collaborate with engineering teams during design reviews to embed critical reliability and scalability requirements into new features and systems.
  • Monitor capacity and performance trends, proactively identifying and resolving potential bottlenecks to ensure optimal system performance.
  • Maintain and continuously improve our CI/CD pipelines and deployment automation processes.

Required Qualifications

  • 2 to 8 years of professional experience in Site Reliability Engineering, DevOps, or a closely related discipline.
  • Working knowledge of monitoring and logging tools such as Prometheus, Grafana, Dynatrace or Datadog, OpenSearch, and VictoriaMetrics.
  • Proven ability to track and monitor SLAs for all critical services.
  • Solid experience with Linux systems administration.
  • Hands-on experience deploying and managing Kubernetes and Docker in production environments.
  • Proficiency with core AWS services including EC2, EKS, RDS, S3, VPC, IAM, and CloudWatch.
  • Experience utilizing Infrastructure-as-Code tools like Terraform or Ansible.
  • Strong scripting capabilities in Python or Bash.
  • Familiarity with CI/CD tools such as GitHub Actions, Jenkins, or GitLab CI.
  • Familiarity with GitOps workflows and tools like ArgoCD or Rancher.

Preferred Qualifications

  • Prior experience in financial services, FinTech, or other regulated industries is highly advantageous.
  • Knowledge of service mesh technologies such as Istio or Linkerd.
  • Familiarity with distributed tracing tools like Jaeger or OpenTelemetry.
  • Relevant AWS certifications (Solutions Architect, DevOps Engineer, or equivalent).
  • Experience implementing and managing cost optimization strategies within cloud environments.

Company

Clearwater Analytics

Clearwater Analytics is a leading provider of comprehensive cloud-based investment accounting solutions.

Noida, India
Posted on Foundit