Job Specification: Lead DevOps Engineer - Azure
Job Title: Lead DevOps Engineer - Azure
Location:
Work Hours:
Experience Required: 9+ years in DevOps roles, including 4+ years on Azure
Role Summary:
Lead DevOps Engineer
Key Responsibilities:
· Design, implement, and maintain IaC using Terraform for scalable and efficient infrastructure management.
· Manage and optimize Azure services, including App Service (Web Apps), Front Door, API Management, Azure Cache for Redis, Cosmos DB for PostgreSQL, Cosmos DB for MongoDB, Azure AI Search, Event Hubs, Azure Functions, and Key Vault.
· Implement cost management solutions and drive Azure cloud spend optimization.
· Operate containerized workloads on Kubernetes (AKS) and manage the Istio service mesh.
· Implement and maintain CI/CD pipelines using GitHub Actions for streamlined container deployments.
· Lead the transition from App Insights to Splunk Observability for application performance monitoring and troubleshooting.
· Implement logging and alerting mechanisms for proactive incident management using tools such as Splunk and OpsGenie.
· Design and execute disaster recovery failover and failback processes to ensure business continuity.
· Drive chaos engineering practices to test and improve system resilience.
· Enhance container security by integrating CrowdStrike for robust threat detection and mitigation.
· Ensure adherence to cloud security best practices for infrastructure and applications.
· Implement Blue-Green deployments and manage API versioning for seamless application updates.
· Provide deployment and incident support for both production and non-production environments.
· Optimize database performance, including migrating Cosmos DB for MongoDB from the RU-based model to the vCore-based model.
· Maintain and enhance database reliability and scalability for multi-tenant environments.
· Work closely with cross-functional teams to decommission legacy infrastructure and support ephemeral environments for testing.
· Collaborate with teams using Jira and Confluence to streamline DevOps processes and ensure effective documentation.
· Manage peak event capacity planning to ensure high availability during critical business periods.
· Optimize cloud resource usage and costs through strategic planning and automation.
Required Qualifications:
· 9+ years of experience in DevOps roles, with 4+ years working on Azure cloud services.
· Extensive hands-on experience with Terraform for Infrastructure as Code.
· Strong knowledge of AKS, App Service (Web Apps), and related Azure technologies.
· Proficient with Front Door, API Management, Azure Cache for Redis, Cosmos DB (PostgreSQL/MongoDB), Azure AI Search, Event Hubs, Azure Functions, and Key Vault.
· Skilled in GitHub, GitHub Actions, and automation pipelines.
· Hands-on experience with Kubernetes, Istio, and container security tools like CrowdStrike.
· Experience transitioning observability tools (e.g., App Insights to Splunk) and configuring OpsGenie alerts.
· Familiarity with CloudBolt for cloud spend optimization.
Preferred Qualifications:
· Experience with incident management and support in production and non-production environments.
· Hands-on experience with tools like Jira, Confluence, and OpsGenie.
· Exposure to advanced DevOps practices like chaos engineering and ephemeral environments.
Key Initiatives Led:
· Disaster recovery failover and failback.
· Multi-tenant shared infrastructure management in production.
· Database optimization (e.g., migrating RU-based Cosmos DB for MongoDB to vCore).
· Splunk implementation and App Insights decommissioning.
· Container security in the cloud and decommissioning of legacy infrastructure.
· Blue-Green deployments and API version management.
· Peak event capacity management and Azure cloud spend optimization.
Soft Skills:
· Strong problem-solving and troubleshooting skills.
· Excellent communication and collaboration with cross-functional teams.
· Ownership mindset with a proactive approach to addressing challenges.