We are looking for a Senior DevOps Engineer (7+ years of experience) to own and scale our cloud and DevOps infrastructure as we expand our AI and SaaS platforms. This is a hands-on role for someone who can design, build, and operate highly available, secure, and scalable production systems.
Responsibilities:
- Design, build, and manage scalable cloud infrastructure on AWS (primary), with exposure to GCP and Azure.
- Architect auto-scaling, high-availability (HA), and disaster recovery (DR) systems with defined RPO/RTO.
- Design multi-region network architectures (VPCs, subnets, peering, NAT, trust zones).
- Implement Infrastructure as Code (IaC) for automated provisioning and management.
- Set up, manage, and monitor large-scale Kubernetes clusters.
- Manage persistent volumes, networking, and Kubernetes network policies at scale.
- Ensure best practices for reliability, security, and performance.
- Design and maintain CI/CD pipelines to improve developer productivity.
- Track and improve DevOps metrics (e.g., DORA).
- Implement observability and monitoring for system health and performance.
- Troubleshoot critical production issues and drive long-term fixes.
- Integrate security practices (SAST, DAST, etc.) into CI/CD workflows.
- Design systems aligned with compliance standards, and manage audits and renewals.
- Work closely with engineering teams to align DevOps strategy with product goals.
- Provide technical guidance and best practices for deployment and scaling.
Requirements:
- Certified Kubernetes Administrator (CKA) certification preferred.
- 7+ years of experience in DevOps / Cloud / Infrastructure roles.
- Strong hands-on expertise in AWS, with working experience in GCP and Azure.
- Proven experience designing HA, DR, and large-scale distributed systems.
- Deep understanding of Kubernetes, CI/CD, and cloud networking.
- Strong troubleshooting skills with a root-cause analysis mindset.
- Experience in AI, SaaS, or mid-stage startups is a strong plus.
- Ability to lead by example in hands-on environments.
Regards,
TA Team