This role is for one of Weekday's clients
Minimum Experience: 8 years
Location: Mumbai
Job Type: Full-time
We are looking for an experienced Infrastructure Lead to drive the design, implementation, and optimization of scalable, secure, and highly available cloud infrastructure. This role will lead DevOps/SRE initiatives, establish best practices, and ensure the reliability and performance of mission-critical systems.
Requirements
Key Responsibilities
Cloud Infrastructure & Architecture
- Design, develop, and maintain scalable cloud infrastructure on AWS and Azure
- Lead architectural decisions to ensure high availability, fault tolerance, and optimal performance
- Promote infrastructure automation through Infrastructure as Code (Terraform)
DevOps & CI/CD Enablement
- Develop and enhance CI/CD pipelines using tools such as Jenkins, GitLab CI, CircleCI, and ArgoCD
- Adopt GitOps practices for consistent, dependable deployments
- Increase deployment frequency, shorten lead times, and reduce change failure rates
Kubernetes & Containerization
- Oversee and scale Kubernetes clusters across EKS, AKS, and on-premises environments
- Implement container orchestration, service mesh solutions, and cluster optimization techniques
- Ensure platform reliability and conduct performance tuning
Monitoring, Reliability & Incident Management
- Establish and uphold SLOs, SLAs, and reliability benchmarks
- Deploy observability tools such as Prometheus, Grafana, Datadog, and the ELK stack
- Lead incident management, including root cause analysis, and drive down mean time to recovery (MTTR)
Automation & Operational Excellence
- Promote automation across infrastructure provisioning, monitoring, and recovery workflows
- Create reusable infrastructure modules and accelerators
- Minimize manual work through scripting in Python and Bash, supported by appropriate tooling
Security & Compliance
- Apply cloud security best practices covering IAM, network security, and policy enforcement
- Maintain compliance through Kubernetes policies and governance frameworks
- Champion secure-by-design principles in infrastructure development
Cost Optimization
- Monitor cloud resource consumption and implement cost-saving strategies
- Apply right-sizing, auto-scaling, and other techniques for efficient resource utilization
Leadership & Stakeholder Management
- Lead and mentor DevOps and SRE teams
- Collaborate effectively with engineering, product, and architecture teams
- Promote infrastructure best practices across projects and teams
Innovation & AI-driven Operations (Preferred)
- Explore AI/ML-driven infrastructure enhancements and AIOps capabilities
- Implement intelligent monitoring and anomaly detection, and automate root cause analysis
Required Skills & Experience
- At least 8 years of experience in Infrastructure, DevOps, or SRE roles
- Strong expertise in AWS (preferred)
- Hands-on experience with Terraform (Infrastructure as Code)
- Comprehensive knowledge of Kubernetes and containerization (Docker)
- Experience working with CI/CD tools such as Jenkins, GitLab CI, CircleCI, and ArgoCD
- Strong understanding of monitoring and observability tools
- Proficiency in scripting languages such as Python and Bash
- Experience managing high-availability, large-scale systems
Skills
Infrastructure as Code
Lead Infrastructure
DevOps
SRE
Terraform
Kubernetes
Docker
CI/CD