Senior Site Reliability Engineer
Experience Level: Senior Level
Full Job Description
StarRez is seeking a Senior Platform Engineer with a Software Engineering background to join its expanding global team supporting SaaS products. The ideal candidate is an experienced PHP developer with strong DevOps/Infrastructure tooling skills and expertise in cloud platforms and operations. In this role, you will apply your Software Engineering experience to improve system performance and reliability, and build internal systems and automation that eliminate manual tasks. You will be part of a globally distributed Platforms team operating in a "follow the sun" model on a multi-region cloud platform.
Key Responsibilities:
- Provide technical leadership and mentorship through knowledge sharing, pair programming, code reviews, and solution design.
- Identify and implement solutions to enhance platform reliability, including creating mitigation strategies and operational playbooks.
- Implement and maintain monitoring, alerting, and logging systems for incident identification and response.
- Conduct and participate in Root Cause Analyses (RCAs) and blameless post-mortems.
- Participate in on-call rotations to ensure system reliability and rapid incident response.
- Ensure the scalability and efficiency of cloud infrastructure and systems to accommodate traffic and data growth.
- Conduct performance tests to identify and resolve bottlenecks.
- Develop and maintain platform solutions, automating infrastructure provisioning, configuration, and management using Infrastructure as Code.
- Monitor, review, and tune databases for high availability and performance.
- Collaborate with product engineering teams to design and build observable, fit-for-purpose software.
- Contribute to defining Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs).
Required Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 3+ years of experience in Platform Engineering, Site Reliability Engineering, or Software Engineering.
- 1+ year of experience working on a SaaS platform.
- Recent and proficient experience in PHP development.
- Production experience with containerization and container orchestration technologies such as Kubernetes.
- Proficiency with public cloud providers such as Azure, AWS, or GCP.
- Proficiency with Infrastructure as Code (IaC) tools like Terraform (preferred), Ansible, or CloudFormation.
- Proficiency in scripting and automation using languages like Bash, PowerShell, or Python.
- Experience with monitoring, observability, and logging tools such as Datadog, Prometheus, Grafana, or similar.
- Proven track record of maintaining highly available, performant production environments.
- Ability to identify and implement effective mitigation strategies and operational playbooks.
Preferred Qualifications:
- Experience with CI/CD tooling such as Azure DevOps, GitHub Actions, or Octopus Deploy.
- Relevant cloud platform certifications (e.g., Microsoft Certified: Azure Solutions Architect) and DevOps certifications (e.g., Certified Kubernetes Administrator) are a plus.
- Experience in database management and performance tuning, particularly with MSSQL.
StarRez fosters a culture of belonging, building, and growth, with a people-first philosophy and a long-term vision for employee development. Backed by Vista Equity Partners, StarRez offers a blend of agility and stability, driven by a "Z-Factor" culture of passion, care, and high performance. We are an equal opportunity employer committed to diversity and inclusion.
Company
StarRez
StarRez is a global leader in student housing software, offering innovative solutions for managing on and off-campus housing, enhancing resident well-being and experience, and driving revenue generati...