Azure CloudOps Engineer
About Us
We are reinventing vertical SaaS by embedding Artificial Intelligence directly into production workflows. Embrace Technology Group unifies engineering efforts across our portfolio to ship AI-native products that hundreds of large enterprises and financial institutions already rely on.
The Role
As a CloudOps Engineer, you will operate and continuously improve the reliability, security, scalability, observability, and cost efficiency of our Azure-hosted SaaS platform. You will partner with engineering teams to ensure consistent deployment, effective monitoring, robust security, and reliable production operations across dev, QA, staging, and production environments.
Environment & Technology
- Cloud Platform: Microsoft Azure (Static Web Apps, Container Apps, PostgreSQL, Storage Accounts, SignalR, Service Bus, Azure AI Foundry).
- IaC: Terraform for environment provisioning and state management.
- CI/CD: GitHub Actions for automated pipelines across multiple products.
- AI Focus: Production operations for LLM integrations, Speech-to-Text workloads, and Azure AI Foundry services.
Key Responsibilities
You will own the end-to-end operational lifecycle of our infrastructure. This includes managing cloud resources via Terraform, building secure CI/CD workflows in GitHub Actions with strict approval gates, implementing comprehensive observability using Azure Monitor and Log Analytics, and ensuring high availability for critical AI services.
Crucially, you will focus on AIOps: monitoring token consumption, managing quotas, handling throttling events from providers like OpenAI or Google Vertex AI, and optimizing costs associated with generative models. You will also drive FinOps initiatives by tagging resources, analyzing spend anomalies, and implementing lifecycle policies to right-size infrastructure.
Security & Compliance
You are expected to implement cloud security best practices including Azure RBAC, managed identities, Key Vault integration for secret rotation, and network segmentation via private endpoints. You will support our SOC 2 / ISO 27001 compliance efforts by maintaining audit trails and enforcing least-privilege access.
Requirements
- Experience: 5+ years operating production workloads in Microsoft Azure with a strong background in Infrastructure as Code (Terraform).
- CI/CD Expertise: Proven ability to build, secure, and troubleshoot GitHub Actions pipelines.
- Azure Services: Deep familiarity with networking, DNS, Identity/RBAC, Key Vault, Container Apps, PostgreSQL, and Service Bus.
- AI Operations: Experience operating AI-enabled applications is highly preferred. Knowledge of Azure AI Foundry and LLM operational metrics (latency, quota usage) is a significant plus.
- Scripting & Tools: Comfortable scripting in Bash/PowerShell/Python for automation tasks.
Company
Embrace Software Inc
Embrace is a leading venture builder that acquires and invests in niche software companies providing industry-specific solutions across six regulated verticals, including financial services, healthcar...