ZF Group
Career Pages
Site Reliability Engineer
Chennai, TN, IN, 600116
Full Time
Mid Level
Full Job Description
About the Team
The Garuda team is a Site Reliability Engineering (SRE) group focused on ensuring the reliability and operational excellence of our Fleet Management Services platform. We are dedicated to maintaining the availability and performance of the platform through proactive incident management, strategic optimization, and a commitment to continuous improvement. Additionally, we contribute to the development of SCALAR's internal developer platform, Infinity.
What you can look forward to as a Site Reliability Engineer (SRE)
- Implement and maintain our monitoring systems. Take ownership during system outages and incidents, conduct thorough root cause analysis of system failures, and implement effective fixes. Actively participate in incident reviews.
- Plan, execute, and test software updates. Automate infrastructure management tasks and processes to enhance efficiency.
- Respond to on-call incidents and resolve them promptly and effectively.
- Maintain a customer-obsessed approach, applying an engineering mindset to troubleshooting challenges.
- Drive continuous improvement and operational excellence across all aspects of the platform.
Required Skills:
- Proficiency in Linux/Unix/Windows administration.
- Experience with cloud platforms, particularly AWS.
- Skilled in scripting languages such as Python, Shell, and PowerShell.
- Experience with configuration management and infrastructure-as-code tools such as Puppet, Ansible, and Terraform.
- Familiarity with CI/CD tools including Jenkins and GitLab CI/CD.
- Experience with monitoring tools such as Grafana, Dynatrace, and Icinga.
- Knowledge of database management (SQL Server, MongoDB).
Your profile as a Site Reliability Engineer (SRE):
- 4-7 years of experience as a Site Reliability Engineer in cloud-native production environments, preferably AWS, supporting large-scale, high-availability web-based applications.
- Solid understanding of server operating systems (Linux/Windows).
- Familiarity with server deployment, management, and patching tools.
- Experience with monitoring tools like Dynatrace, Nagios (Icinga), Grafana, CloudWatch, or Elasticsearch.
- Foundational knowledge of networking concepts, including routers, switches, firewalls, and load balancers.
- Hands-on experience managing and supporting live 24x7 applications.
- Experience managing databases such as SQL Server and MongoDB.
- Proven experience with system automation tools like Puppet, Ansible, and Terraform, as well as CI/CD tooling.
- Experience with at least one scripting language (Python, PowerShell, or Shell).
Why choose ZF Group in India?
- Innovative Environment: Be part of a company at the forefront of technological advancements, fostering creativity and growth.
- Diverse and Inclusive Culture: Experience a workplace where all employees are valued and respected, promoting collaboration and mutual support.
- Career Development: Benefit from extensive training programs, career advancement opportunities, and a clear growth path.
- Global Presence: Work on international projects and collaborate with global teams at a leader in driveline and chassis technology.
- Sustainability Focus: Contribute to eco-friendly solutions and the company's commitment to environmental responsibility.
- Employee Well-being: Enjoy comprehensive health and wellness programs, flexible work arrangements, and support for a healthy work-life balance.
Join our ZF team as a Site Reliability Engineer and apply today!