```html
About the Company: UptimeAI is leading the way in predictive analytics and AI-driven solutions that optimize operational uptime and reduce downtime for industrial and enterprise clients. Our platform applies data science to deliver actionable insights that improve efficiency and reliability. UptimeAI combines artificial intelligence with subject-matter knowledge drawn from 200+ years of cumulative experience to explain interrelations across upstream and downstream equipment, adapt to changes, identify problems, and provide prescriptive diagnoses as a human expert would.
About the Role: We are a fast-growing, AI-first SaaS startup backed by top-tier investors and operating across India and the US. Our platform helps enterprises optimize critical business functions using cutting-edge AI and automation. As we scale, we’re looking for a hands-on DevOps Engineer who thrives in startup environments and can take ownership of cloud infrastructure, deployment, and CI/CD workflows.
Responsibilities:
- Design, implement, and manage cloud infrastructure across AWS and Azure for both internal platforms and customer-specific deployments
- Configure and maintain VPCs, VPNs, and peering to enable secure, scalable, and isolated environments
- Build and automate CI/CD pipelines for application and ML workloads
- Manage multi-tenant and single-tenant deployments according to customer requirements
- Implement monitoring, alerting, logging, and disaster recovery strategies
- Work closely with engineering to ensure seamless Dev→Prod flows and secure release management
- Set up and manage infrastructure as code (e.g., Terraform, Pulumi, Bicep, CloudFormation)
- Optimize costs, performance, and availability for both internal and customer-facing cloud workloads
- Enforce security best practices, access control, and compliance across infrastructure
Qualifications:
- 6+ years of experience as a DevOps/SRE/Cloud Engineer in high-growth SaaS or product startups
- AWS Certified (at least Solutions Architect - Associate) and Azure Certified (e.g., AZ-104 or higher)
- Strong experience with AWS and Azure networking, including VPCs, VPNs, subnets, route tables, security groups, NAT gateways, and site-to-site VPN setups for enterprise customers
- Proven experience deploying applications to customer-controlled cloud environments (BYOC) and company-controlled SaaS environments
- Expertise with CI/CD tools (GitHub Actions, GitLab CI, Azure Pipelines), IaC tools (Terraform, Bicep, or Pulumi), and containers (Docker, Kubernetes; EKS/AKS preferred)
- Familiarity with Secrets Management, IAM, Role-based Access Control, and SSO/SAML integration
- Strong scripting skills in Bash, Python, or PowerShell
- Comfortable working in a fast-paced, ambiguous startup environment
Required Skills:
- Experience with AI/ML pipeline deployment or GPU workloads
- Exposure to SOC2, ISO27001, or GDPR compliance in a cloud environment
- Familiarity with tools like Prometheus, Grafana, Datadog, ELK, or Azure Monitor
Pay range and compensation package: Not specified.
Equal Opportunity Statement: UptimeAI is committed to diversity and inclusivity in the workplace.
Why join UptimeAI:
- Impact Industry-Wide Change: Contribute to transformative solutions that significantly improve operational efficiency and reliability for global clients.
- Collaborative and Growth-Oriented Environment: Join a talented, passionate team that values innovation, continuous learning, and professional growth.
- Opportunities for Leadership and Innovation: Lead pioneering projects, influence product development, and shape the future of industrial AI solutions.
```