
Site Reliability Engineer_Associate...
Experience Level: Mid Level
Full Job Description
Join Morgan Stanley as a Site Reliability Engineer focusing on Software Production Management and Reliability Engineering in Mumbai, India. In this role, you will be responsible for proactively detecting, troubleshooting, and resolving all issues affecting production applications, collaborating with development and external teams as needed. You will own escalated issues until resolution or a viable workaround is provided, ensuring clear and timely communication with affected parties during outages. Your primary responsibility will be maintaining the stability of the production environment.
You will develop and revise policies and procedures to ensure that systems deployed to Production meet appropriate application development standards, acting as a gatekeeper to enforce Change Implementation Management guidelines. Additionally, you will service requests for data or other activities that require access to production systems, and work with development teams early in the application lifecycle to ensure new systems meet production standards.
A key aspect of this role is maintaining and growing a body of knowledge accessible to all team members, improving self-reliance and reducing dependency on external resources for initial troubleshooting. As an expert in deep analytical triage, you will provide subject matter expertise in debugging, issue analysis, and troubleshooting, offering reviews and recommendations to prevent future application issues. You will also serve as a seasoned technical resource, contributing expertise in outage management and proactive solutions to enhance user experience.
Key Responsibilities:
- Proactive issue detection, troubleshooting, and resolution in production environments.
- Coordination and escalation with development and external teams.
- Clear and concise communication during incident management.
- Ensuring stability and adherence to Change Implementation Management guidelines for the production environment.
- Developing and maintaining policies and procedures for production systems.
- Servicing requests for data and activities requiring production system access.
- Collaborating with development teams to ensure new systems meet production standards.
- Building and maintaining a knowledge base for the team.
- Providing subject matter expertise in debugging, issue analysis, and troubleshooting.
- Contributing to outage management and proactive solutions.
- Participating in an on-call incident rotation and responding to emergencies 24/7.
Qualifications:
- At least 2 years of relevant experience in developing and/or supporting Enterprise Applications.
- Willingness to embrace Agile and DevOps/SRE concepts.
- Solid analytical skills and familiarity with problem determination, resolution, and recovery processes.
- Experience with observability tools such as Prometheus, Grafana, Loki, Kibana, and Splunk.
- Ability to interface and cultivate excellent working relationships with technology teams, business analysts, and vendors.
- Administrative competence in at least one major programming language or platform (e.g., Perl, PowerShell, Python, or Java).
- Ability to learn new technologies quickly in a fast-paced environment.
- Strong organizational skills and ability to manage multiple tasks and high-pressure situations.
- Driven to learn new technologies and techniques.
- Hands-on experience administering large-scale, high-availability systems and monitoring tools.
- BS/MS or equivalent, preferably in a quantitative discipline (Computer Science, Computer Engineering, EE, Math, Physics).
- Experience in the financial services industry is a plus.
Company
Morgan Stanley
Morgan Stanley is a leading global financial services firm that provides investment banking, securities, investment management, and wealth management services. With a presence in 1,200 offices across ...