
Site Reliability Engineer
Experience Level: Mid Level
Full Job Description
As a Site Reliability Engineer at Turvo, you will play a crucial role in ensuring the health, continuous monitoring, automation, and scalability of our complex, web-scale systems. Your responsibilities will include proactively monitoring the production environment and swiftly responding to emerging trends and issues. You will be instrumental in debugging and troubleshooting across the entire stack of our services and in leading the analysis of any system outages. Active participation in bug and issue triage with feature teams will be essential for making informed decisions aligned with business and engineering objectives. You will also document operational processes that support proactive monitoring, debugging, and issue resolution.
Furthermore, you will develop tools that enhance our ability to rapidly deploy and effectively monitor custom applications. Collaboration with development teams will be key to ensuring that our platforms are designed with operability as a core consideration. You will design, write, and deliver high-quality software to improve the availability, reliability, scalability, latency, security, resiliency, and overall efficiency of our services. A significant part of your role will involve writing software and building automation that resolves problems permanently. You will also engage in service capacity planning, demand forecasting, software performance analysis, and system tuning.
Qualifications:
- A minimum of 3 years of experience in a UNIX-based large-scale web operations role.
- At least 2 years of programming experience, preferably in Python.
- Proficiency with relational databases such as MySQL and NoSQL databases like MongoDB or Cassandra.
- Exposure to monitoring tools such as Dynatrace, ELK, or similar is considered an advantage.
- Familiarity with application profiling, system scalability, monitoring, and performance optimization.
- Demonstrated ability to understand unfamiliar codebases and debug server-side, multi-threaded, and highly scalable applications.
- Strong debugging, troubleshooting, and problem-solving skills.
- Previous experience working effectively with geographically distributed colleagues.
Company
Turvo
Turvo is a leading provider of a collaborative Transportation Management System (TMS) application, purpose-built for the supply chain industry. Our innovative Turvo Collaboration Cloud™ connects freig...