Who We Are
Osmos, a B2B SaaS company founded by ex-Amazon ad-tech experts, is revolutionizing retail media with an AI-powered operating system that increases retailer profitability (by up to 7% of sales) and delivers superior ROAS for brands. By enabling Tier 1 retailers and marketplaces worldwide to activate more brands and leverage advanced targeting, we help them secure a lasting competitive edge.
Your Impact
We are seeking a Staff DevOps Engineer to architect and maintain a highly available global infrastructure that serves high-QPS systems with 99.99% uptime. The role requires expertise in managing multi-region deployments, building fault-tolerant systems, and driving scalability for mission-critical applications.
What You'll Do
- Deploy, manage, and scale Kubernetes clusters for high throughput and low latency across multiple regions.
- Implement and maintain Infrastructure as Code (IaC) to build fault-tolerant, globally distributed systems.
- Build and optimize CI/CD pipelines to enable seamless, zero-downtime deployments.
- Ensure high availability (99.99%) for mission-critical, high-QPS applications through monitoring, alerting, and incident management practices.
- Support and maintain multi-region deployments to achieve low-latency, geo-redundant infrastructure.
- Collaborate with engineering, product, and security teams to ensure scalability, security, and operational efficiency.
- Contribute to continuous improvement by automating workflows, reducing toil, and sharing best practices.
You'll Thrive If You Have
- 3–7 years of experience managing large-scale, high-availability systems in production.
- Strong expertise in Kubernetes administration, including scaling clusters and handling multi-region workloads.
- Hands-on experience with IaC tools like Terraform or CloudFormation.
- Proficiency in designing and maintaining CI/CD pipelines for global deployments.
- Solid knowledge of cloud platforms (AWS, GCP, or Azure) and geo-redundant architectures.
- Proficiency in Linux administration, scripting (Bash, Python), and debugging distributed systems.
- A problem-solving mindset with the ability to troubleshoot complex production issues effectively.
Why Choose Osmos?
- Startup Energy, Enterprise Scale: Fast-paced innovation with global ambition
- Revolutionize Retail Marketing: Be at the forefront of AI-powered solutions
- Meaningful Contribution: Directly impact major brands' success
- No Red Tape: Autonomy and empowerment to drive results
- Growth & Fun: Continuous learning in a vibrant, collaborative culture
- Competitive Rewards: We value your expertise and offer strong compensation
Ready to champion Infra & Cloud? Let's chat.
Quirky & fun. Pick up new skills and hobbies, like being a quiz master, playing board games, trying your hand at percussion, playing the djembe, and spreading love within the org!