
Senior Site Reliability Engineer
Experience Level: Senior
Full Job Description
About the Role
As a Senior Site Reliability Engineer at Headout, you will own and operate our cloud-native infrastructure and Kubernetes platforms. These systems are the backbone of our customer-facing services and operate at significant scale. You will design and optimize CI/CD workflows, improve deployment reliability, and drive improvements in observability, incident management, and overall system performance across the organization. You will also build platform tooling that boosts developer velocity, enforces security guardrails, and standardizes best practices. This senior position requires a strong sense of ownership, sophisticated architectural thinking, and the ability to mentor junior engineers.
Why This Role is Special
- Full Platform Exposure: Gain comprehensive experience across DevOps, infrastructure management, observability solutions, performance optimization, and reliability engineering.
- Architecture Ownership: Play a key role in influencing platform and tooling decisions by leveraging benchmarks and performance metrics to guide strategy.
- High Impact: Develop and implement systems that significantly reduce deployment turnaround times, improve p99 latency metrics, and scale effectively across multiple teams.
- Flexibility: Enjoy the freedom to work across diverse technology stacks, tools, and evolving platform landscapes.
Required Skills and Experience
- 2-5 years of experience operating customer-facing services at scale.
- Strong hands-on experience with Kubernetes cluster operations and workload optimization techniques.
- Familiarity with service mesh technologies such as Istio and distributed tracing tools such as Jaeger.
- Proficiency with at least one major cloud provider; AWS is preferred, but GCP or Azure experience is also acceptable.
- Hands-on experience with monitoring and alerting stacks, including tools like Prometheus, Grafana, Thanos, Datadog, or New Relic.
- Proven track record in designing and implementing robust CI/CD pipelines using tools such as GitHub Actions, GitLab CI, or Jenkins.
- Proficiency in Infrastructure as Code (IaC) principles and tools, specifically Terraform or Pulumi.
- Strong programming skills in Python, Go, or Java/Kotlin, alongside solid shell scripting abilities.
- Experience working with databases like MySQL and MongoDB, including application and query profiling.
- A solid understanding of security best practices and relevant compliance standards.
- A high-ownership mindset, with the proactive ability to identify, diagnose, and resolve complex platform issues.
Company
Headout
Headout is a leading online platform revolutionizing how people discover and book experiences. Connecting travelers with unforgettable adventures worldwide, Headout leverages cutting-edge technology t...