
Site Reliability Engineer
Experience Level: Senior Level
Full Job Description
Arista Networks is seeking a Site Reliability Engineer (SRE) to join the CloudVision team in Bengaluru, India. As an SRE, you will combine strong software and systems engineering expertise with a passion for operating production systems at scale, contributing to our global service fleet.
Your responsibilities will include building, safely deploying, and operating critical production systems with a focus on scalability, reliability, observability, performance, and security. You will monitor, support, and enhance product deployment experiences, and develop automation to eliminate toil and improve operational efficiency. Proactive monitoring, incident response, and the creation of postmortem documents to prevent recurrence are key aspects of this role.
The CloudVision platform is deployed on Kubernetes across global regions, using Spinnaker for CI/CD. The tech stack includes GKE, HBase/Hadoop for distributed data storage, Elasticsearch for search, ClickHouse for real-time queries, a Kafka-based stream processing layer for analytics, and TensorFlow for ML analysis. The monitoring infrastructure is built on Prometheus, Grafana, Loki, and other open-source tools.
Key activities involve building and deploying new systems with scalability and reliability as primary requirements, triaging and resolving platform and infrastructure issues, and engaging with third-party vendor support. You will collaborate with Arista's product development teams to identify and resolve infrastructure bottlenecks, and survey and adopt best practices for maintaining secure, scalable, and fault-tolerant systems.
Qualifications:
- Bachelor's or Master's degree in Computer Science or Engineering, or equivalent work experience (5+ years).
- Proficiency in Go, Python, or Bash shell scripting for automation.
- Strong knowledge of Linux/UNIX administration and debugging.
- Hands-on experience operating large-scale software systems and infrastructure.
- Experience in server provisioning, particularly from storage and networking perspectives.
- Excellent problem-solving and software troubleshooting skills.
- Experience with infrastructure-as-code.
- Desirable skills: Experience with PostgreSQL or an equivalent RDBMS, Docker and virtualization, monitoring stacks (Prometheus, Grafana), Artifactory/Docker registry management, CI/CD systems (GitLab, Spinnaker), infrastructure-as-code frameworks (Terraform), and Kubernetes orchestration.
Arista is an engineering-centric company where leadership values sound software engineering principles. Engineers have project ownership and opportunities to work across various domains. The company prioritizes test automation and offers a flat management structure.
Company
Arista
Arista Networks is a global leader in data-driven, client-to-cloud networking solutions, serving large data center, campus, and routing environments. As a well-established and profitable company with ...