Required Skills & Experience
- 7+ years of hands-on experience in DevOps and Cloud technologies
- Deep expertise in at least one major cloud platform: AWS, Azure, or GCP
- Proven experience managing high-scale production workloads
- Extensive, hands-on experience with Kubernetes in production environments, including deployment, scaling, monitoring, and troubleshooting
- Strong knowledge of infrastructure automation tools such as Terraform, Pulumi, CloudFormation, and Helm
- Solid debugging and troubleshooting skills across infrastructure and applications
- Deep working knowledge of Linux systems and networking concepts
- Proficiency in at least one scripting or programming language: Python, Shell, Go, or Java
- Experience with monitoring and observability tools such as Datadog, New Relic, ELK, and Prometheus/Grafana
- Familiarity with modern cloud-native architecture patterns such as microservices and RESTful APIs
- Strong problem-solving mindset and the ability to thrive in a fast-paced, dynamic environment
Key Responsibilities
- Design and implement secure, resilient, and highly scalable infrastructure
- Lead infrastructure automation, including provisioning, capacity planning, demand forecasting, and cost optimization
- Develop automation tools and frameworks to improve system observability, reliability, availability, and performance
- Ensure application scalability and performance by adhering to cloud-native architecture best practices
- Set high standards for engineering through code reviews, documentation, and building self-service automation tools
- Participate in and drive blameless postmortems and sustainable incident response practices