
Site Reliability Engineer_Director_...
Experience Level: Senior Level
Full Job Description
As a Site Reliability Engineer, you will be instrumental in maintaining the stability and performance of production applications at Morgan Stanley. You will proactively detect, troubleshoot, and resolve production issues in close collaboration with development and external teams, and you will own escalated issues until they are resolved or a suitable workaround is in place. Clear and timely communication during outages and system-wide incidents is crucial. You will also develop and refine policies and procedures for application development standards and ensure that all production deployments adhere to Change Implementation Management guidelines. The role involves servicing requests that require access to production systems and working with development teams to ensure new systems meet production standards. A key part of the role is building and maintaining a knowledge base that makes the team more self-reliant. Drawing on your expertise in deep analytical triage, debugging, and issue analysis, you will provide subject matter expertise to prevent future application issues, and you will serve as a seasoned technical resource for outage management and for proactive solutions that improve the user experience.
This position requires a minimum of 4 years of relevant experience and a minimum of 7 years developing and/or supporting Enterprise Applications. A willingness to embrace Agile and DevOps/SRE concepts is essential. Solid analytical skills and experience with problem determination and recovery processes are expected. Experience with observability tools such as Prometheus, Grafana, Loki, Kibana, and Splunk is required. You should be able to build excellent working relationships with technology teams, business analysts, and vendors. Administrative competence in at least one major programming language or platform (e.g., Perl, PowerShell, Python, Java) is necessary.
The ideal candidate is a fast learner who adapts to a fast-paced environment and has the organizational skills to manage multiple tasks and high-pressure situations. A drive to learn new technologies and contribute significantly to the team is highly valued. Hands-on experience administering large-scale, high-availability systems and their monitoring tools is also required. A BS/MS or equivalent in a quantitative discipline (Computer Science, Computer Engineering, EE, Math, Physics) is preferred. Experience with on-call incident rotations and the ability to respond to emergencies on a 24/7 basis are mandatory. Experience in the Financial Services sector is considered a plus.
Company
Morgan Stanley
Morgan Stanley is a global financial services firm renowned for its expertise in investment banking, securities, investment management, and wealth management. With a workforce of over 80,000 employees...