Site Reliability Engineer
Full Job Description
Join Apple in Hyderabad, Telangana, and contribute to groundbreaking technology. As a Site Reliability Engineer within the Emerging Technology Services team, you will be instrumental in productizing and evangelizing upcoming software. This role is a fusion of SRE, DevOps, and Operational Intelligence, focused on building the technology pillars that support Apple's vast ecosystem, from e-commerce to reporting systems, at massive scale and speed. You will work on platforms such as load balancers, perimeter security, and API gateways, driving their adoption and ensuring seamless integration across disparate systems.
This position sits within a horizontal Platform Engineering Ops group that works on highly scalable distributed applications. It demands strategic engineering and data science expertise, combined with hands-on technical execution. You will work in the security domain, build machine learning pipelines for phishing, fraud, and anomaly detection, explore cryptographic strategies for privacy, and apply data science skills to petabytes of data. Your contributions will directly impact the software that delights billions of Apple customers daily.
Key responsibilities include:
- Reviewing hardware, software infrastructure, and application functionality to identify and remove performance bottlenecks.
- Developing and maintaining application services, and leading incident management.
- Designing and implementing comprehensive monitoring for applications, integrations, and anomalies.
- Implementing and rolling out high-performance, large-scale security platforms.
- Onboarding and maintaining expansive data pipelines for various security platforms.
- Analyzing and troubleshooting security detections to reduce false positives and enhance detection accuracy.
- Collaborating with cross-functional IT and business groups, production support, application engineers, systems engineers, database administrators, and QA teams to ensure platform and application reliability.
Preferred qualifications include:
- Strong analytical skills.
- Familiarity with Java and JVM technologies, runtime configurations, and troubleshooting is a plus.
- Experience with modern web services architectures, cloud platforms (AWS, GCP, Azure), and distributed storage systems (ScaleIO, Amazon S3).
- Experience with monitoring and logging tools such as Prometheus, Splunk, Grafana, and CloudWatch is a plus.
- Understanding of CI/CD, Release Engineering, and DevOps principles.
- A good understanding of various machine learning algorithms and patterns is desired.
- Knowledge of cryptographic algorithms.
- In-depth experience in writing, understanding, and reverse-engineering regular expressions for pattern detection.
- Strong understanding of TLS, mTLS, and industry standards for secure communication.
- Skilled in researching vulnerabilities and threats, and translating them into system designs for detection and prevention.
Minimum qualifications:
- 10+ years of experience in software engineering.
- Hands-on experience in at least one object-oriented language, preferably Java/JEE.
- Hands-on experience with automation tools such as Ansible and Terraform.
- Strong programming and scripting fundamentals (Python/Bash/Lua).
- Strong relational and non-relational database fundamentals with hands-on PL/SQL experience.