
Clearwater Analytics
Foundit
Site Reliability Engineer
Noida, India
Full Time
Mid Level
About the Role
We are seeking a skilled Site Reliability Engineer to join our dynamic team in Noida, India. In this permanent role, you will play a crucial part in building and maintaining highly available and scalable infrastructure, ensuring the reliability and performance of our critical services.
Key Responsibilities
- Construct and manage robust observability stacks utilizing Prometheus and Grafana, with a focus on defining and monitoring Service Level Objectives (SLOs), Service Level Indicators (SLIs), Service Level Agreements (SLAs), and error budgets.
- Take ownership of incident response, including participation in on-call rotations, efficient triage, effective mitigation strategies, and conducting blameless post-mortems to foster continuous improvement.
- Automate repetitive operational tasks and proactively eliminate toil through the development of scripts and tooling using Python, Bash, and Go.
- Design, deploy, and maintain highly available infrastructure on Amazon Web Services (AWS) leveraging Terraform and Ansible for comprehensive infrastructure-as-code workflows.
- Manage and optimize Kubernetes clusters (specifically Amazon EKS) and Docker-based containerized workloads that support our microservices architecture.
- Actively collaborate with engineering teams during design reviews to embed critical reliability and scalability requirements into new features and systems.
- Monitor capacity and performance trends, proactively identifying and resolving potential bottlenecks to ensure optimal system performance.
- Maintain and continuously improve our CI/CD pipelines and deployment automation processes.
Qualifications Required
- 2 to 8 years of professional experience in Site Reliability Engineering, DevOps, or a closely related discipline.
- Working knowledge of essential monitoring and logging tools such as Prometheus, Grafana, Dynatrace or Datadog, OpenSearch, and VictoriaMetrics.
- Proven ability to track and monitor SLAs for all critical services.
- Solid experience with Linux systems administration.
- Hands-on experience deploying and managing Kubernetes and Docker in production environments.
- Proficiency with core AWS services including EC2, EKS, RDS, S3, VPC, IAM, and CloudWatch.
- Experience utilizing Infrastructure-as-Code tools like Terraform or Ansible.
- Strong scripting capabilities in Python or Bash.
- Familiarity with CI/CD tools such as GitHub Actions, Jenkins, or GitLab CI.
- Familiarity with GitOps workflows and tools like ArgoCD or Rancher.
Preferred Qualifications
- Prior experience in financial services, FinTech, or other regulated industries is highly advantageous.
- Knowledge of service mesh technologies such as Istio or Linkerd.
- Familiarity with distributed tracing tools like Jaeger or OpenTelemetry.
- Relevant AWS certifications (Solutions Architect, DevOps Engineer, or equivalent).
- Experience implementing and managing cost optimization strategies within cloud environments.
Company
Clearwater Analytics
Clearwater Analytics is a leading provider of comprehensive cloud-based investment accounting solutions.