Site Reliability Engineer - 3
Join Uni Cards as a Site Reliability Engineer and be instrumental in building a scalable platform from scratch. You will containerize services with Docker, orchestrate them with Kubernetes (K8s), and work extensively with technologies such as Elasticsearch, MongoDB, Snowflake, and Kafka clusters. Your role will involve implementing industry best practices, challenging the status quo, and keeping up with technical trends so that our work stays best-in-class. Key responsibilities include managing capacity, embedding security at every layer, and optimizing costs. You will implement robust monitoring at scale using tools like Prometheus, and keep live services healthy by measuring and monitoring availability, latency, and overall system reliability. Secure networking, key management, user and access management, and image management are also critical aspects of this role.
What you will need:
- Experience with the AWS platform and Kubernetes (K8s).
- Strong fundamentals in Linux and networking.
- Proficiency in Python or shell scripting.
- A mindset geared towards automating everything.
- Awareness of cloud security concepts and best practices.
- Experience with CI/CD practices, deployment patterns, and relevant toolsets.
- Familiarity with observability practices and toolchains (Monitoring, Metrics, Logging, Alerts & Tracing).
- Experience with Infrastructure as Code tools like Terraform and Ansible.
Good to have:
- AWS Certified Solutions Architect certification.
- Certified Kubernetes Administrator (CKA) certification.
- Certified Kubernetes Application Developer (CKAD) certification.
- Experience with configuration management tools.
- Strong code analysis skills in Python.
