
Site Reliability Engineer ID53670
Full Job Description
AgileEngine seeks a mid-level SRE Operations Engineer in Hyderabad / Secunderabad, Telangana, India, to ensure the reliability of our cloud-based SaaS platform. This role involves managing live incidents, enhancing system observability, and automating processes to reduce toil. You will leverage your expertise with Kubernetes, Terraform, Grafana, and AWS. This is a hands-on, execution-focused position offering significant ownership across CI/CD pipelines, GitOps workflows, and on-call rotations.
About the Role:
- Monitor and support production and staging environments to ensure optimal availability, performance, and stability.
- Respond to live incidents, conduct thorough triage and root cause analysis, and actively contribute to remediation efforts.
- Participate in defined on-call rotations, adhering to strict Service Level Agreements (SLAs).
- Address operational requests from internal teams efficiently.
- Maintain and continuously improve monitoring systems, alerting mechanisms, dashboards, logs, and metrics.
- Provide support for CI/CD pipelines, production releases, and GitOps workflows.
- Drive automation initiatives to minimize operational overhead and enhance efficiency.
- Manage and optimize Kubernetes-based infrastructure and containerized workloads.
- Support and enhance Infrastructure as Code (IaC) practices and environment improvements.
Must-Have Qualifications:
- Minimum of 2 years of experience in Site Reliability Engineering, DevOps, or Production Operations.
- Proven experience supporting production environments on AWS.
- Demonstrated experience supporting production SaaS applications.
- Solid understanding of CI/CD systems such as GitHub Actions, Jenkins, or CircleCI.
- Experience with GitOps principles and Git fundamentals.
- Proficiency in using collaboration tools like GitHub, Jira, and Confluence.
- Hands-on experience with Kubernetes (e.g., EKS, kOps).
- Experience with Docker and containerization technologies.
- Familiarity with observability tools including Grafana, Prometheus, Loki, and PagerDuty.
- Proficiency in scripting languages like Bash, Python, or Go.
- Experience with Infrastructure as Code tools such as Terraform and Helm.
- Ability to operate effectively within structured operational processes and meet SLAs.
- Strong written and verbal communication skills in English.
- Self-driven attitude and a growth mindset.
Nice-to-Have Qualifications:
- Relevant AWS certifications (e.g., Solutions Architect, DevOps Engineer, SysOps Administrator).
- Experience in multi-tenant SaaS environments.
- Experience working collaboratively within globally distributed teams.
- Familiarity with ChatOps practices.
- Experience in enhancing monitoring quality and reducing alert fatigue.
Perks and Benefits:
- Remote Work & Local Connection: Enjoy the flexibility of working from your most productive location and connect with your team through periodic local meet-ups to build your network and engage with fellow experts.
- Legal Presence in India: We ensure complete local compliance with a structured and secure work environment tailored to Indian regulations.
- Competitive Compensation in INR: Receive fair compensation in INR, along with dedicated budgets for your personal growth, education, and wellness.
- Innovative Projects: Utilize the latest technologies to create cutting-edge solutions for world-renowned clients and emerging startups.
Company
AgileEngine
AgileEngine is a distinguished Inc. 5000 company renowned for crafting award-winning software solutions for Fortune 500 brands and innovative startups across more than 17 industries. Recognized as a l...