PhonePe
Posted 2h ago
InstaHyre

Site Reliability Engineer 3

Bangalore
Full Time
Senior Level

Full Job Description

Site Reliability Engineer 3 - Big Data

Responsibilities:

  • Oversee and maintain Linux/Unix environments, managing incremental changes.
  • Lead on-call rotations and incident response, including root cause analysis and postmortem processes.
  • Design and implement automation for big data infrastructure, covering provisioning, scaling, upgrades, and patching.
  • Resolve complex production issues, identify root causes, and implement mitigating strategies.
  • Architect and review scalable and reliable system designs.
  • Collaborate with teams to optimize overall system performance.
  • Enforce security standards across systems and infrastructure.
  • Set technical direction, drive standardization, and operate with autonomy.
  • Ensure system and service availability, performance, and scalability through proactive monitoring, maintenance, and capacity planning.
  • Analyze and respond to system outages, implementing measures to prevent recurrence.
  • Develop tools and scripts to automate operational tasks, enhancing efficiency and resilience.
  • Monitor and optimize system performance and resource utilization, addressing bottlenecks and implementing best practices.
  • Partner with development teams to embed reliability, scalability, and performance best practices in the SDLC.
  • Stay abreast of industry technology trends and contribute to internal technology communities.
  • Develop and enforce SRE best practices and principles.
  • Align cross-functional teams on priorities and deliverables.
  • Drive automation initiatives to boost operational efficiency.

Requirements:

  • 7+ years of experience managing distributed big data ecosystems.
  • Strong Linux expertise, including IP, iptables, and IPsec.
  • Proficiency in scripting/programming languages such as Perl, Golang, or Python.
  • Hands-on experience with the Hadoop ecosystem and related big data tools: HDFS, HBase, Airflow, YARN, Ranger, Kafka, Pinot.
  • Familiarity with open-source configuration management and deployment tools (Puppet, Salt, Chef, Ansible).
  • Solid understanding of networking, open-source technologies, and related tools.
  • Excellent communication and collaboration skills.
  • Experience with DevOps tools: SaltStack, Ansible, Docker, Git.
  • Experience with SRE logging and monitoring tools: ELK stack, Grafana, Prometheus, OpenTSDB, OpenTelemetry.
  • Experience managing infrastructure on public cloud platforms (AWS, Azure, GCP).
  • Experience designing and reviewing system architectures for scalability and reliability.
  • Experience with observability tools for visualizing and alerting on system performance.

Company

PhonePe

PhonePe: Revolutionizing Digital Payments in India

PhonePe is dedicated to making digital payments effortless, secure, and universally accessible, aiming to eliminate the need for physical cash and car...

Bangalore
Posted on InstaHyre