Position Overview
We are a global enterprise technology company building innovative cloud-based platforms and AI-driven solutions. We are expanding our engineering team in India and are looking for an experienced Azure DevOps Engineer to support cloud infrastructure, CI/CD automation, platform reliability, and AI/ML workload enablement.
This is a hands-on engineering role responsible for designing, implementing, securing, and operating cloud platforms that support enterprise applications, data platforms, and AI/GenAI workloads in production environments.
The ideal candidate will have strong experience in cloud infrastructure engineering, automation, DevOps practices, and production operations. The role requires expertise in building scalable cloud environments, implementing deployment automation, improving reliability, and partnering with global engineering teams.
Key Responsibilities
Cloud Platform & Infrastructure Engineering
- Design, provision, and manage secure, scalable cloud environments across Development, Testing, and Production.
- Build and maintain cloud infrastructure using Infrastructure-as-Code tools such as Terraform, Bicep, or equivalent technologies.
- Implement cloud security best practices including identity management, RBAC, secrets management, network security, and compliance controls.
- Optimize cloud infrastructure performance, reliability, and cost efficiency.
CI/CD & Automation
- Design, implement, and maintain CI/CD pipelines supporting application development, infrastructure deployments, data workflows, and AI/ML workloads.
- Automate build, testing, deployment, and release processes to improve engineering efficiency.
- Implement deployment strategies including blue/green deployments, canary releases, and automated rollback mechanisms.
- Drive standardization and automation across engineering environments.
Reliability, Monitoring & Operations
- Implement monitoring, logging, tracing, and alerting solutions for distributed applications and cloud platforms.
- Support production operations including incident management, troubleshooting, root cause analysis, and continuous improvement.
- Define and improve operational metrics including availability, performance, and service reliability.
- Support production readiness activities for new applications and platform enhancements.
AI Platform & MLOps Enablement
- Support deployment, scaling, and lifecycle management of AI models and AI-powered applications.
- Build and maintain MLOps workflows supporting model deployment, monitoring, and version management.
- Implement observability for AI workloads including performance monitoring, usage tracking, and reliability metrics.
- Support secure and compliant operation of AI/GenAI platforms.
Collaboration & Technical Leadership
- Partner with global engineering, application development, data, AI, and security teams.
- Translate business and technical requirements into scalable infrastructure solutions.
- Communicate technical decisions, risks, and operational updates effectively.
- Provide technical guidance and support to engineering teams.
Required Qualifications & Experience
- Bachelor's degree in Computer Science, Engineering, or a related technical discipline, or equivalent practical experience.
- 5+ years of experience designing, implementing, and supporting production-grade cloud infrastructure.
- Hands-on experience with Microsoft Azure preferred; candidates with strong AWS or GCP experience are also encouraged to apply.
- Experience with cloud services including compute, networking, storage, identity management, and security.
- Strong understanding of CI/CD pipeline design and DevOps automation practices.
- Experience with Infrastructure-as-Code tools such as Terraform, Bicep, or similar technologies.
- Experience with container technologies including Docker and Kubernetes.
- Strong knowledge of monitoring, logging, alerting, and production troubleshooting.
- Experience with cloud security practices including access controls, secrets management, and compliance requirements.
- Ability to independently troubleshoot complex infrastructure and application issues.
- Experience working with global engineering teams in a distributed environment.
Preferred Qualifications
- Experience supporting AI/ML workloads and MLOps platforms.
- Experience with Azure AI services, Azure Machine Learning, or equivalent cloud AI platforms.
- Experience implementing observability for AI systems including latency, throughput, and model performance monitoring.
- Knowledge of responsible AI, governance, and enterprise AI compliance practices.
- Experience optimizing cloud infrastructure costs and performance.
- Familiarity with AI coding assistants and modern engineering productivity tools.
Technical Skills
Cloud Platforms: Azure (preferred), AWS, GCP
Infrastructure as Code: Terraform, Bicep, ARM Templates
CI/CD: Azure DevOps, GitHub Actions, Jenkins, or equivalent
Containers: Docker, Kubernetes, AKS/EKS/GKE
Monitoring: Application Insights, Azure Monitor, Prometheus, Grafana, ELK, or equivalent
Security: IAM, RBAC, Secrets Management, Network Security
AI/MLOps: Model deployment pipelines, AI workload monitoring, ML lifecycle management