Our client is a fast-growing AI technology company redefining how large-scale dynamic pricing is handled in real time. Partnering with some of the world’s leading airlines, they've developed a cutting-edge platform that enables enterprise clients to move beyond manual pricing models and embrace a fully autonomous, AI-driven system that dynamically adjusts to real-time market conditions.
We are actively looking for a DevOps Engineer to join the global team as a key member monitoring and maintaining the cloud-native infrastructure that supports highly scalable, AI-powered applications. You’ll work in a DevOps team responsible for deploying, monitoring, and automating systems across a Google Cloud Platform environment, collaborating closely with engineering teams on advanced AI-powered projects.
This role is ideal for engineers who approach infrastructure with a reliability-first mindset and thrive in environments where experimentation, automation, and data-driven decision-making are embedded into the culture.
Responsibilities
- Own availability and reliability of production services across GCP and Kubernetes.
- Build and maintain infrastructure-as-code using Terraform and Helm (custom charts).
- Design and improve monitoring and alerting systems using Prometheus, Grafana, and logging tools.
- Collaborate with developers to enforce best practices in performance, observability, and deployment.
- Improve deployment and rollback workflows via GitLab CI/CD and ArgoCD.
- Write and maintain Python scripts and tools to automate operational tasks.
- Contribute to incident management processes and on-call rotations.
Requirements
- 4+ years of experience in SRE, production engineering, or DevOps in high-availability systems.
- 2+ years working with Google Cloud Platform (GCP) in production environments.
- 2+ years of hands-on experience with Kubernetes (preferably managing multi-tenant clusters).
- Strong experience with Terraform and Helm (preferably writing custom charts).
- Proficiency in Python scripting for tooling, monitoring, or operational automation.
- Experience with Prometheus, Grafana, and alerting best practices.
Advantages
- Experience managing GitOps with ArgoCD or similar tools.
- Understanding of Kubernetes Operators and controller logic.
- Python proficiency for coding beyond scripts (small services, integrations).
- Familiarity with GCP services such as GKE, BigQuery, Dataflow, or Vertex AI.
The company is committed to creating a diverse environment and is proud to be an equal-opportunity employer. They provide a collaborative working environment, resources, and state-of-the-art tools and equipment to promote success, along with a welcoming, inclusive corporate culture where individuals are recognized for their contributions.
This is a hybrid role, with 1–2 days a week in the Miami office. The company is open and flexible for the right person.