
Site Reliability Engineer ID53670
AgileEngine is seeking a skilled mid-level SRE Operations Engineer to join our team in Bengaluru (Bangalore), India. This role is crucial for maintaining the reliability and performance of our cloud-based SaaS platform. You will manage live incidents, improve system observability, and drive efficiency through automation. This hands-on position offers significant ownership and exposure to CI/CD pipelines, GitOps workflows, and on-call responsibilities, using technologies such as Kubernetes, Terraform, Grafana, and AWS.
Key Responsibilities:
- Monitor and support production and staging environments to ensure high availability, optimal performance, and system stability.
- Respond effectively to incidents, conducting thorough triage and root cause analysis, and contributing to remediation efforts.
- Participate actively in on-call rotations, adhering to defined service level agreements (SLAs).
- Manage and fulfill operational requests from internal development teams.
- Maintain and enhance monitoring systems, alerting mechanisms, dashboards, logging, and metrics.
- Provide support for CI/CD pipelines, production releases, and GitOps workflows.
- Drive automation initiatives to reduce operational overhead and improve team efficiency.
- Administer and optimize Kubernetes-based infrastructure and containerized applications.
- Support and advance Infrastructure as Code (IaC) practices and environment improvements.
What You Bring:
- Minimum 2 years of experience in Site Reliability Engineering, DevOps, or Production Operations.
- Proven experience supporting production environments on AWS.
- Experience supporting production SaaS applications.
- Solid understanding of CI/CD systems such as GitHub Actions, Jenkins, or CircleCI.
- Proficiency with GitOps and Git fundamentals.
- Experience using GitHub, Jira, and Confluence for workflow management.
- Hands-on experience with Kubernetes (e.g., EKS, kOps).
- Experience with Docker and containerization technologies.
- Familiarity with observability tools like Grafana, Prometheus, Loki, and PagerDuty.
- Proficiency in scripting languages such as Bash, Python, or Go.
- Experience with Infrastructure as Code tools including Terraform and Helm.
- Ability to operate within structured operational processes and meet SLAs.
- Strong written and verbal English communication skills.
- A self-driven attitude with a commitment to continuous growth.
Bonus Points:
- AWS certifications (Solutions Architect, DevOps Engineer, SysOps Administrator).
- Experience in multi-tenant SaaS environments.
- Experience working with globally distributed teams.
- Familiarity with ChatOps practices.
- Experience in enhancing monitoring quality and reducing alert fatigue.
Why Join Us:
- Remote Flexibility with Local Connections: Work remotely while staying connected through periodic in-person meet-ups to build your network and collaborate with fellow experts.
- Local Compliance: We ensure full compliance with Indian regulations, providing a secure and structured work environment.
- Competitive Compensation: Receive fair compensation in INR, with dedicated budgets for your professional development, education, and wellness.
- Impactful Projects: Engage with cutting-edge technologies and contribute to innovative solutions for globally recognized clients and emerging startups.
This is a permanent position offering a fantastic opportunity for professional growth in a supportive and forward-thinking company.
Company
AgileEngine
AgileEngine is a recognized leader in software development, consistently ranked among the Inc. 5000 fastest-growing private companies. We specialize in creating award-winning software solutions for a ...