MSI Americas, a multinational corporation operating in 14 countries across the Americas, is seeking a highly skilled Cloud Infrastructure Support Engineer (Tier 3/Tier 4) to join our Infrastructure Customer Engineering and Support team. This role is within the Red Hat Telco Cloud Organization and offers a unique opportunity to contribute to either the Tier 3 (L3S) or Tier 4 (L4S) Engineering teams, working collaboratively to ensure the high performance, availability, and reliability of our cloud-based services and infrastructure. As a critical technical subject matter expert, you will apply strong analytical skills to rapidly diagnose and resolve complex issues across the entire cloud stack.
The Teams
Tier 3 (L3S) - Cloud Infrastructure Engineer
This team manages, troubleshoots, and optimizes containerized applications and infrastructure on Kubernetes, Red Hat OpenShift, and OpenStack platforms. Tier 3 also supports Nokia Container Services (NCS) and CloudBand Infrastructure Software (CBIS) products. You will serve as a Subject Matter Expert (SME) for core cloud infrastructure technologies, lead the resolution of complex, high-severity customer issues, and provide end-to-end Escalation, Monitoring, and Emergency (EME) support, acting as the final escalation point to guarantee service availability and meet SLAs.
Tier 4 (L4S) - Cloud Infrastructure SRE/Engineer
This specialized engineering task force is dedicated to preventing and solving the most critical and strategic customer issues. Tier 4 Engineers conduct deep-dive troubleshooting, examining issues from high-level Kubernetes errors down to kernel bugs. You will be deeply involved with technologies such as Nokia Container Services (NCS), CloudBand Infrastructure Software (CBIS), and private clouds based on Kubernetes and OpenStack, collaborating closely with developers and product engineers to bridge the gap between infrastructure and software.
Main Responsibilities (for both teams)
- Manage, troubleshoot, and optimize containerized applications and infrastructure deployed on platforms such as Kubernetes, Red Hat OpenShift, and OpenStack.
- Lead the investigation and resolution of complex, high-severity customer incidents.
- Prepare and conduct rigorous Root Cause Analyses (RCAs).
- Develop, test, and maintain automation scripts using Python and Ansible.
- Provide immediate support for urgent cases as part of an on-call rotation.
Required Skills and Experience
Core Technical Expertise
- Linux Expertise: Strong knowledge of, and proven hands-on experience with, advanced Linux (CentOS) system administration. Familiarity with Red Hat and CentOS is highly valued.
- Networking Foundations: Strong knowledge of core networking principles (TCP/IP, routing, load balancing, firewalls) in a cloud environment. A solid grasp of fundamentals such as VLANs and IP routing is a must.
- Containerization & Virtualization: Strong knowledge of Kubernetes orchestration, OpenStack platforms, and Docker/containerization. Knowledge of Podman, Helm, and KVM/QEMU is a significant advantage.
- Scripting and Automation: Solid Python scripting skills for task automation and system management. Proficiency in Bash and Python scripting (or the willingness to learn and adapt) and familiarity with Ansible are required.
- Root Cause Analysis (RCA): Expertise in preparing and conducting RCAs.
- Escalation and Monitoring: Proven experience with EME (Escalation, Monitoring, and Emergency) management processes.
Beneficial Expertise (Added Advantage)
- Advanced Networking Tools: Familiarity with advanced tools and technologies such as Calico, Multus, and Open vSwitch.
- Storage Systems: Proficiency with storage solutions such as Ceph and Rook.
- Database Expertise: Understanding of relational databases such as MySQL and MariaDB, as well as experience with etcd.
- Certifications: One or more of the following certifications will be considered an added advantage: Red Hat Certified Specialist in Cloud Infrastructure (EX210), Red Hat Certified Engineer (RHCE) in Red Hat OpenStack (EX310), RHCSA, RHCE, CKA, Red Hat Certified Specialist in OpenShift Administration (EX280), Red Hat Certified Specialist in OpenShift Automation and API Management (EX380).