
Site Reliability Engineer
About the Role
This posting advertises potential job opportunities at Cisco Home. While this exact role may not be open today, it could become available in the near future. By applying, you express your interest, and a Cisco representative may contact you directly if a relevant position opens.
Join the Intersight Team as a Site Reliability Engineer, where you will be instrumental in ensuring the reliability, scalability, and security of our cutting-edge cloud platforms. You'll be part of a dynamic team of experienced engineers who champion innovation and accountability. In this role, you will represent the Intersight SRE team, tackle complex challenges with creativity, and drive the team's technical roadmap. You will collaborate closely with software development, product management, customers, and security teams to design, influence, build, and maintain multi-region SaaS systems. Your contributions will be vital to the success of our initiatives, keeping the platform infrastructure robust, efficient, and aligned with operational excellence.
We are looking for a highly motivated SRE to join a high-performing team focused on the reliability and scalability of cloud services, with a special emphasis on a rapidly expanding next-generation project. This position requires close collaboration with product engineering, service engineering, and other SRE teams in a high-trust, well-coordinated environment. If you thrive in a fast-paced setting, enjoy creating innovative solutions, and aim to make a significant, long-lasting impact, this role is for you.
Key Responsibilities
- Build, deploy, and optimize cloud and data infrastructure to ensure that services are highly available, reliable, and scalable enough to meet customer demand.
- Apply strong programming skills, combining software and systems engineering principles, to develop core data platform capabilities and automation aligned with customer needs and roadmap objectives.
- Monitor production systems, participate in on-call rotations, troubleshoot incidents, and actively contribute to root cause analysis.
- Drive continuous improvement through postmortem reviews and proactive performance optimization initiatives.
- Collaborate effectively with cross-functional teams, including development, product management, and security, to architect secure, scalable solutions and enhance operational efficiency through automation.
Minimum Qualifications
- A Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- A minimum of 2 years of experience in Site Reliability Engineering/DevOps, Cloud Operations, or a comparable role.
- Practical experience with Kubernetes (EKS or self-managed), Docker, and cloud platforms (preferably AWS).
- Proficiency in at least one programming or scripting language (e.g., Python, Bash, Go) for automation and operational tasks.
- Hands-on experience with infrastructure-as-code tools such as Ansible or Terraform.
- A foundational understanding of Linux systems and CI/CD pipelines, plus familiarity with observability tools (e.g., ELK, Splunk, Prometheus).
Preferred Qualifications
- Experience building and managing cloud-based data platforms, automating infrastructure, and maintaining high availability and system reliability at scale.
- Experience applying AI-assisted or automation-first methodologies to SRE tooling and workflows.
- A strong personal drive for continuous learning, research, and adopting new technologies that have significant customer impact.
- Excellent collaboration and communication skills.
- Relevant certifications such as CKA, CKAD, AWS Certified DevOps Engineer, or equivalent are advantageous.
Company
Cisco
Cisco is a global leader in networking and cybersecurity, dedicated to revolutionizing how organizations connect, protect, and transform their operations in the AI era. With a legacy of innovation spa...