Lead DevOps Engineer / Senior DevOps Engineer
About Us
Restroworks is a leading cloud-based restaurant technology platform powering 20,000+ restaurants across 50+ countries. Our unified technology platform helps enterprise restaurant operators scale efficiently, improve profitability, and deliver consistent guest experiences across locations.
Our product suite includes Point of Sale (POS), Inventory Management, Integrations, Analytics, CRM, and other mission-critical restaurant operations solutions. Global restaurant brands such as Subway, Taco Bell, Nando’s, Caribou Coffee, Carl’s Jr., and Häagen-Dazs trust Restroworks to streamline and manage their operations.
Restroworks has been recognized as a Global Leader in Restaurant Management Software by G2 and is proudly Great Place to Work-Certified™.
Role: Lead DevOps Engineer / Senior DevOps Engineer
We are looking for a highly skilled and driven Lead DevOps Engineer / Senior DevOps Engineer to join our growing engineering team. The ideal candidate will have deep expertise in cloud infrastructure, automation, CI/CD, scalability, reliability, and security practices, along with the ability to lead infrastructure initiatives in a fast-paced SaaS environment.
This role requires someone who can architect, build, and optimize highly available and scalable systems while collaborating closely with engineering, product, and security teams.
Key Responsibilities
- Design, implement, and manage scalable, secure, and highly available cloud infrastructure.
- Build and optimize CI/CD pipelines to improve deployment efficiency and release reliability.
- Automate infrastructure provisioning and configuration management using Infrastructure as Code (IaC) tools.
- Monitor system performance, uptime, and reliability across production environments.
- Lead DevOps best practices around deployment, observability, incident management, and disaster recovery.
- Collaborate with engineering teams to improve application scalability, availability, and performance.
- Manage Kubernetes clusters, containerized applications, and orchestration environments.
- Implement security best practices across infrastructure and deployment pipelines.
- Drive cost optimization initiatives for cloud infrastructure and services.
- Mentor junior DevOps engineers and contribute to building a high-performance engineering culture.
- Participate in on-call rotations and production incident resolution.
- Design and manage Kafka/MSK-based event streaming systems for high-throughput microservices communication.
- Support scalable distributed architectures handling billions of transactions/messages monthly.
- Implement observability and monitoring solutions using Prometheus, Grafana, CloudWatch, Datadog, New Relic, Site24x7, and related tools.
- Lead incident response, root cause analysis (RCA), platform uptime initiatives, and SRE best practices.
- Implement DevSecOps and cloud security best practices using tools such as Wiz, Lacework, Snyk, SonarQube, JFrog Xray, AWS-native security services, and F5.
- Conduct regular infrastructure security audits, vulnerability management, governance reviews, and compliance checks. Experience with ISO, SOC 1, and SOC 2 audits is an added advantage.
- Manage infrastructure for data migration initiatives using AWS DMS and related technologies.
- Support data pipelines, integrations, and enterprise migration projects.
- Drive AI-enabled DevOps automation for environment provisioning, log analysis, incident triaging, and operational efficiency improvements.
Required Skills & Experience
- 6–10+ years of experience in DevOps, Site Reliability Engineering (SRE), or Infrastructure Engineering roles.
- Strong hands-on experience with multiple cloud platforms such as AWS, Azure, OCI, or GCP.
- Expertise in containerization and orchestration technologies like Docker and Kubernetes.
- Strong experience with CI/CD tools such as Jenkins, GitHub Actions, GitLab CI/CD, or ArgoCD.
- Experience with Infrastructure as Code tools like Terraform, CloudFormation, or Ansible.
- Strong understanding of Linux systems, networking, and security fundamentals.
- Experience with monitoring and logging tools such as Prometheus, Grafana, ELK, Datadog, or New Relic.
- Hands-on scripting experience in Bash, Python, or similar scripting languages.
- Strong understanding of microservices architecture and distributed systems.
- Experience working in high-scale SaaS or product-based technology environments.
- Excellent troubleshooting, analytical, and problem-solving skills.
- Strong communication and stakeholder management abilities.
Good to Have
- Experience with multi-cloud environments.
- Exposure to DevSecOps practices and security automation.
- Experience managing large-scale distributed systems and database infrastructure.
- Kubernetes certifications (CKA/CKAD) or AWS certifications.
- Experience in high-transaction B2B SaaS/product organizations.
Why Join Restroworks?
- Opportunity to work with a global SaaS product company serving leading restaurant brands worldwide.
- Fast-paced and innovation-driven engineering culture.
- Ownership-driven environment with high growth opportunities.
- Collaborative leadership and open work culture.
- Exposure to large-scale, high-availability systems and global deployments.
- Competitive compensation and benefits.