
Site Reliability Engineer
NICE is seeking a skilled Site Reliability Engineer to support large, complex enterprise software environments for our clients, spanning applications, servers, SQL databases, and networks. The ideal candidate has excellent problem-solving abilities and will help deliver real-time insights from massive-scale data. We encourage candidates with innovative ideas, unique perspectives, and a collaborative spirit to join our cross-functional team in building impactful solutions and positive user experiences.
Key Responsibilities:
- Oversee the production environment, focusing on availability monitoring and holistic system health.
- Develop software and systems for managing platform infrastructure and applications.
- Enhance the reliability, quality, and time-to-market of our software solutions.
- Measure and optimize system performance to advance capabilities, anticipate customer needs, and drive continuous improvement through innovation.
- Provide primary operational support and engineering for multiple large-scale distributed software applications.
- Analyze metrics from operating systems and applications to aid in performance tuning and troubleshooting.
- Collaborate with development teams to refine services via rigorous testing and release processes.
- Engage in system design consulting, platform management, and capacity planning.
- Build sustainable systems and services through automation and continuous improvement.
- Balance the speed of feature development with reliability, adhering to well-defined service level objectives.
Qualifications:
- 2+ years of programming/scripting experience in Go, Python, .NET (C#), or Node.js.
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 2-3 years of relevant experience in systems engineering, automation, and reliability.
- Proficiency in at least one programming language (e.g., Python, Go, Java, C#) and scripting languages (e.g., Bash, PowerShell).
- Deep understanding of cloud computing platforms (e.g., AWS) and the operational constraints of key services (e.g., EC2, ECS, Lambda, DynamoDB).
- Experience with infrastructure as code tools like CloudFormation or Terraform.
- Solid understanding of CI/CD concepts and familiarity with tools such as Jenkins, GitLab CI/CD, or CircleCI.
- Strong knowledge of containerization technologies (e.g., Docker, Kubernetes) and microservices architecture.
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack, CloudWatch).
- Exceptional problem-solving skills for troubleshooting complex distributed systems.
- Experience with incident management, conducting blameless postmortems, leading incident response, and cross-functional communication during critical events.
- Availability to work graveyard (overnight) shifts.
Advantageous Skills:
- Kubernetes certification; experience with Grafana, AWS, Azure, and DevOps practices.
About the Role and Work Environment:
Join NICE, a dynamic, market-disrupting global company. Our teams, composed of top talent, operate in a fast-paced, collaborative, and creative environment. As a market leader, NICE offers daily opportunities for learning and growth, with extensive internal career advancement paths across roles, disciplines, domains, and locations. If you are passionate, innovative, and driven to excel, you might be our next NICEr!
We operate under the NICE-FLEX hybrid work model, offering maximum flexibility with 3 days of remote work and 2 days in the office per week. Office days are dedicated to face-to-face interactions, fostering teamwork, collaborative thinking, innovation, and a vibrant atmosphere.
Requisition ID: 9566
Reporting into: Tech Manager
Role Type: Individual Contributor