
SRE & DevOps Engineer
Experience Level: Senior Level
Full Job Description
METRO is seeking an experienced SRE & DevOps Engineer with deep expertise in cloud infrastructure, automation, and observability. The ideal candidate is a hands-on engineer focused on system reliability, performance, and scalability, and a proactive problem solver committed to operational excellence and continuous improvement. The role bridges development and operations through modern DevOps and SRE practices and requires effective communication and collaboration within cross-functional teams to drive best practices.
This Senior SRE & DevOps Engineer position is crucial for maintaining the resilience, scalability, and reliability of our systems. By applying contemporary SRE principles, leveraging automation, and implementing robust incident management practices, you will contribute to faster, more dependable delivery of business value while safeguarding system stability and customer trust.
Key responsibilities include designing, implementing, and maintaining scalable, secure, cloud-native infrastructure. You will set up and maintain observability solutions covering monitoring, alerting, logging, and tracing (e.g., Prometheus, Grafana, ELK, Datadog). You will continuously improve CI/CD pipelines and automate deployment workflows to make delivery more efficient. You will lead structured incident response, conduct root cause analysis, and foster a culture of learning from post-mortems. You will collaborate with development, QA, and architecture teams to ensure seamless integration and performance optimization. You are expected to apply SRE principles such as SLIs, SLOs, SLAs, and error budgets to guide operational decisions and improve system reliability. You will champion Infrastructure-as-Code practices using tools like Terraform, Helm, or Ansible, and ensure that security, compliance, and reliability are integrated into operations. Mentoring team members and promoting a culture of operational excellence and continuous improvement are also part of the role.
Qualifications include a Bachelor’s or Master’s degree in Computer Science or Engineering, or equivalent practical experience. We require 6 to 8 years of proven experience in Site Reliability Engineering, DevOps, or Cloud Engineering roles. Hands-on expertise with Kubernetes (preferably GKE), Docker, and service mesh technologies like Istio is essential. A strong background in CI/CD practices and tools such as GitHub Actions, Jenkins X, ArgoCD, or similar is necessary. Experience with observability solutions including Prometheus, Grafana, ELK, Jaeger, Datadog, and GCP Dashboards is required. Proficiency with at least one major cloud platform (GCP, AWS, Azure) and scripting or programming experience in languages like Python, Go, or Bash are also required. Practical knowledge of Infrastructure-as-Code tools such as Terraform, Helm, or Ansible is expected, along with hands-on experience in incident management, troubleshooting, and root cause analysis. Familiarity with SRE practices like SLIs, SLOs, SLAs, and error budgets is also required.
Additional requirements include strong communication and collaboration skills across cross-functional teams, the ability to balance short-term operational needs with long-term scalability and system health, and an analytical, proactive mindset focused on continuous improvement. Fluency in English, both written and spoken, is mandatory.
Nice-to-have qualifications include experience with security best practices in distributed systems (OAuth2, mTLS, RBAC), knowledge of cost optimization and cloud governance practices, familiarity with Camunda/CIB7 environments, and contributions to open-source DevOps/SRE communities.
Company
METRO
Metro Global Solution Center (MGSC) is an internal solution partner for METRO, a prominent international wholesaler with a significant presence across more than 30 countries. METRO operates a vast sto...