We are looking for an experienced Cloud Infrastructure Transformation Lead to manage and guide a team of Site Reliability Engineers (SREs) with strong engineering and architectural expertise to design, implement, and manage large-scale, mission-critical infrastructure across multiple data centers and cloud providers.
As a Cloud Lead, you will be responsible for architecting and optimizing our global infrastructure, enabling development teams to roll out new features efficiently while maintaining high availability and reliability. You will be hands-on with automation, performance tuning, infrastructure scalability, and cloud-native technologies to ensure a seamless user experience for millions of customers.
Key Responsibilities
- Architect and implement highly scalable, fault-tolerant, and distributed systems across multi-cloud (OCI, AWS, GCP) and on-premises environments using modern DevOps and SRE principles.
- Design and deploy next-generation cloud infrastructure with a strong focus on automation, self-healing systems, and performance optimization.
- Develop and maintain infrastructure-as-code (IaC) using Terraform and configuration management tools such as Ansible and Puppet for automated provisioning and orchestration.
- Build and optimize containerized environments using Kubernetes and Docker for seamless deployment and scaling.
- Drive performance, scalability, and security improvements across our cloud and on-prem infrastructure, ensuring high availability and disaster recovery capabilities.
- Monitor, troubleshoot, and resolve complex system issues by implementing advanced observability solutions, logging, and real-time monitoring frameworks.
- Develop and enforce SRE best practices, including SLI/SLO definition, capacity planning, and incident management strategies.
- Eliminate toil and automate repetitive tasks using scripting languages such as Python, Golang, or Shell scripting to improve operational efficiency.
- Collaborate closely with engineering, architecture, and security teams to improve system resiliency, optimize application performance, and streamline CI/CD workflows.
- Lead the transition of legacy systems to modern, cloud-native architectures, advocating for DevOps and infrastructure automation.
Requirements
- 10+ years of hands-on experience in a Cloud Infrastructure Transformation Lead role, with a strong focus on designing, implementing, and managing cloud-native infrastructure.
- Proficiency with at least one cloud platform (preferably OCI), covering not just operational experience but hands-on design, architecture, and implementation expertise.
- Proven experience in building, deploying, and optimizing infrastructure-as-code (IaC) using Terraform.
- Strong automation mindset with proficiency in Ansible, Puppet, or other configuration management tools.
- Hands-on experience with containers and container orchestration (Docker, Kubernetes) and with microservices architecture.
- Advanced scripting and automation skills in Python, Golang, or Shell scripting to eliminate manual operations.
- Working knowledge of load balancing technologies (HAProxy, Nginx, F5, Varnish, dnsdist) and web servers (Apache, Nginx).
- Strong understanding of networking, distributed systems, and observability tools (Prometheus, Grafana, ELK stack, Datadog).
- Experience in designing and implementing highly available, scalable, and secure architectures across cloud and hybrid environments.
- AWS and/or GCP certifications are a plus but not required.
- This is not a support-focused role. We are looking for engineers who have built, deployed, and optimized complex distributed systems from the ground up.