
Senior Platform Engineer
Experience Level: Senior Level
Full Job Description
Senior Platform Engineer - AI & Cloud at BP, Pune, India
BP is undergoing a significant transformation, aiming to become a leading renewable energy provider and achieve net-zero carbon emissions by 2050. To support this transition, we are seeking experienced Senior AI Platform Engineers, based in Pune, India, to architect and maintain the critical infrastructure for AI and ML initiatives across the organization.
This role requires a strong background in Azure and AWS cloud services, Databricks, OpenAI, MLOps best practices, and Azure DevOps. As a key member of our technology team, you will lead the design, development, and optimization of scalable and reliable AI platforms.
Key Responsibilities:
- Deploy and operate cloud infrastructure and automation pipelines for AI/ML and generative AI workloads across Azure, AWS, and Databricks.
- Build secure, scalable environments (VMs, containers, Kubernetes clusters, Databricks workspaces) for deploying cloud AI services such as Azure OpenAI, Azure ML, AWS SageMaker, and AWS Bedrock.
- Implement continuous integration and deployment (CI/CD) and infrastructure-as-code (ARM/Bicep, AWS CDK) to automate model release cycles.
- Embed security, compliance, monitoring, and observability (logging, Prometheus/Grafana) into the platform to ensure service reliability and rapid issue resolution.
- Collaborate with data scientists and application teams to integrate generative AI features (e.g., chatbots, co-pilot assistants) into enterprise applications.
- Lead the design and implementation of scalable AI/ML infrastructure on Azure and AWS.
- Build and manage cloud-native infrastructure (Azure, AWS, Databricks) for AI workloads using Infrastructure-as-Code (IaC) tools like Terraform and Bicep.
- Create reusable self-service tooling, templates, and CI/CD workflows for data scientists and ML engineers.
- Govern AI systems with access control, audit trails, policy enforcement, and compliance monitoring (e.g., GDPR).
- Implement GenAI workloads using Azure AI Foundry, Azure AI Hub, Azure OpenAI, Amazon Bedrock, Anthropic Claude, Hugging Face, LangChain, etc.
- Implement infrastructure and DevOps practices for Agentic AI solutions using native Azure and AWS AI services.
- Collaborate with security and architecture teams to embed cloud security best practices in the AI platform.
- Contribute to incident response, troubleshooting, and root cause analysis of ML and GenAI workload failures and latency issues.
- Implement MLOps practices to manage and optimize the lifecycle of machine learning models, including monitoring, versioning, and retraining.
- Collaborate with data scientists, software engineers, and other stakeholders to ensure effective integration of AI solutions within the business.
- Stay up to date with the latest advancements in AI, cloud computing, and DevOps practices, and integrate relevant technologies into the platform.
- Review weekly/bi-weekly cloud cost reports and lead efforts to identify and act on cloud cost-saving opportunities.
- Mentor junior engineers, providing technical leadership and fostering a culture of continuous learning.
- Ensure compliance with industry standards and best practices for data security and privacy.
Requirements:
- Bachelor’s or master’s degree (or equivalent experience) in computer science, engineering, information systems, or another numerate discipline.
- 7+ years of experience in platform engineering, with a proven track record of designing, deploying, and managing scalable and secure cloud-based infrastructures, leveraging both Azure and AWS services.
- Experience with Azure services such as Azure AI services, Azure Search, Azure ML, Databricks, Azure Kubernetes Service, and AWS services like AWS SageMaker, AWS Bedrock, and AWS Lambda.
- Exposure to Generative AI and Agentic AI ecosystems such as Azure OpenAI, Azure AI Foundry, Azure AI Hub, Bedrock, Anthropic Claude, OpenAI API, LlamaCloud, LangChain.
- Understanding of token usage, LLM prompt injection risks, jailbreak attempts, and mitigation techniques.
- Strong knowledge of governance, audit, observability, and compliance in cloud-based GenAI and ML ecosystems.
- Familiarity with Azure AI Evaluation SDK and AI Red Teaming Prompt Security Scans.
- Nice to have: experience with code assistant tools like GitHub Copilot, Cursor, and Claude Code.
- Expertise in Azure DevOps or AWS CodePipeline, including setting up and managing CI/CD pipelines.
- Advanced experience with Azure Blob Storage, Cosmos DB, SQL, Key Vault, AWS S3, DynamoDB, and AWS RDS, and their integrations with AI services.
- Advanced understanding of networking concepts, including DNS management, load balancing, VPNs, and virtual networks (VNets).
- Advanced understanding of security concepts, including IAM roles, identities, Azure policies, and AWS SCPs.
- Experience with advanced authentication and authorization concepts across various cloud providers and platforms.
- Must have experience with Azure Policy, AWS SCP, AWS IAM, audit logging, Azure RBAC, etc.
- Mastery of infrastructure-as-code tools such as Azure ARM / Bicep, Terraform, CloudFormation, or equivalent.
- Proficiency in networking, DNS, load balancers, and cloud engineering services.
- Knowledge of Python programming and AI/ML libraries (TensorFlow, PyTorch, scikit-learn, etc.).
- Experience with containerization and orchestration tools such as Docker and Kubernetes.
- Nice to have: knowledge of Azure Bot Framework, APIM, and Application Gateway; M365 offerings like M365 Copilot; and AWS CDK and the AWS SDK for Python (Boto3).
- Experience with monitoring tools like Grafana, Prometheus, Application Insights, Log Analytics Workspaces, and Azure Monitor.
- Strong problem-solving and analytical skills.
- Strong communication and collaboration skills to work effectively with diverse teams.
- Proven leadership abilities to guide and mentor junior engineers.
This role is based in Pune, India, and offers a hybrid work arrangement. Relocation assistance within India is available.