Infrastructure & Application Monitoring
- Monitor the performance of Windows/Linux servers, cloud instances (GCP/AWS), and network infrastructure using tools such as SCOM, SolarWinds, AppDynamics, and New Relic.
- Troubleshoot server issues including disk space, CPU/memory utilization, and application failures.
- Investigate service outages, incidents, and system issues, logging and tracking them in the incident reporting system.
- Understand contemporary security technologies, basic on-prem server hardware, and basic cloud architecture.
- Apply DNS knowledge and use common network troubleshooting tools (e.g., ping, traceroute, netstat, nslookup).
- Possess a basic understanding of distributed computing concepts (load balancers, service clusters, server pools).
- Perform basic Linux (including RHEL) and Windows administration tasks: memory, disk, and CPU evaluation, application restarts, log management, and shell scripting.
- Adhere to Standard Operating Procedures (SOPs) and maintain an up-to-date inventory of all nodes, including impact and callout sheets.
- Assist with automation solutions where applicable, leveraging API knowledge and scripting.
- Coordinate with site technicians and Production Engineering Groups for issue resolution and escalation.
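
As an illustration of the disk-space checks and scripting mentioned above, the following is a minimal sketch in Python using only the standard library. The threshold values and the function names (percent_used, classify, check_disk) are hypothetical and chosen for this example; actual limits would come from the team's SOPs and monitoring tools.

```python
import shutil

# Hypothetical thresholds for illustration; real values would come from the SOPs.
WARN_PERCENT = 80.0
CRIT_PERCENT = 90.0


def percent_used(total_bytes, used_bytes):
    """Return used space as a percentage of total (0.0 if total is not positive)."""
    if total_bytes <= 0:
        return 0.0
    return used_bytes / total_bytes * 100.0


def classify(percent, warn=WARN_PERCENT, crit=CRIT_PERCENT):
    """Map a usage percentage to a simple status label."""
    if percent >= crit:
        return "CRITICAL"
    if percent >= warn:
        return "WARNING"
    return "OK"


def check_disk(path="/"):
    """Check the filesystem that contains `path` and return (status, percent)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct = percent_used(usage.total, usage.used)
    return classify(pct), pct


if __name__ == "__main__":
    status, pct = check_disk("/")
    print(f"/ is {pct:.1f}% full: {status}")
```

A script along these lines could be run on a schedule or called from a monitoring tool, with any WARNING or CRITICAL result logged in the incident reporting system.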
