Yashasvini Recuriter Services Pune
Naukri

Site Reliability Engineer

Remote
Mid Level
2000000-3000000

Full Job Description

Role & Responsibilities

We are seeking a Site Reliability Engineer (SRE) with 2-4 years of experience to ensure the reliability, availability, performance, and scalability of our production systems. The role emphasizes operational excellence, proactive incident management, robust monitoring, and in-depth infrastructure debugging, built on a strong foundation in IT systems, networking, and Linux environments.

Required Technical Skills

SRE & Reliability

  • Demonstrated strong understanding of SRE principles, including reliability, scalability, and fault tolerance.
  • Proven experience with incident response, effective escalation procedures, and conducting thorough postmortems.
  • Solid knowledge of capacity planning and performance tuning methodologies.

Cloud & Infrastructure

  • Hands-on expertise with Amazon Web Services (AWS), specifically EC2, EKS, VPC, IAM, ALB/NLB, RDS, S3, and CloudWatch.
  • Experience operating Kubernetes in a production setting, managing pods, services, ingress, and autoscaling.
  • Proficiency in containerization using Docker.

Monitoring & Observability

Practical experience with a variety of monitoring and observability tools:

  • Prometheus, Grafana
  • CloudWatch
  • ELK Stack / OpenSearch / Loki
  • SigNoz / Datadog / New Relic
  • Ability to design meaningful alerts that balance low noise with high signal accuracy.

IT & Systems Fundamentals

  • Advanced Linux administration skills, covering processes, memory, disk, CPU, and system limits.
  • Comprehensive understanding of networking fundamentals, including TCP/IP, DNS, HTTP/HTTPS, load balancing, firewalls, and SSL/TLS.
  • Knowledge of storage concepts such as block vs. object storage, IOPS, and latency.
  • Experience troubleshooting complex OS-level and network-level issues.

Automation & Tooling

  • Proficiency in scripting languages like Bash and Python.
  • Experience with Infrastructure as Code using Terraform or CloudFormation.
  • Familiarity with CI/CD pipeline support, including Jenkins, GitHub Actions, and GitLab CI.

Professional Attributes

  • High-energy, positive attitude with a demonstrated ability to learn quickly.
  • Strong analytical and problem-solving skills.
  • Embraces AI-powered development as a significant productivity multiplier.
  • Excellent communication skills, essential for collaborating effectively in a distributed team.
  • A true team player with a proactive mindset and a commitment to long-term success.
  • Excellent time-management skills and a strong ownership mindset.
  • Full Software Development Life Cycle (SDLC) experience, from design and deployment to ongoing maintenance.

What You Will Be Doing

  • Ensure the high availability, reliability, and optimal performance of production environments.
  • Operate and provide comprehensive support for large-scale systems running on AWS and Kubernetes.
  • Define and meticulously monitor Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.
  • Build and maintain robust monitoring, alerting, and observability platforms.
  • Manage and respond to production incidents, participate in on-call rotations, and conduct post-incident Root Cause Analysis (RCA).
  • Debug complex infrastructure, OS, network, and application issues.
  • Actively reduce toil through automation and the implementation of standard operating procedures (SOPs).
  • Collaborate closely with engineering teams to enhance system reliability and resilience.
  • Plan and rigorously test disaster recovery (DR) and failover strategies.
  • Maintain up-to-date operational documentation and comprehensive runbooks.

Company

Yashasvini Recuriter Services Pune

Remote
Posted on Naukri
Site Reliability Engineer - Remote Night Shift - Immediate Joiner at Yashasvini Recuriter Services Pune