About HyperFi
We're building the kind of platform we always wanted to use: fast, flexible, and built for making sense of real-world complexity. Behind the scenes is a robust, event-driven architecture that connects systems, abstracts messy workflows, and leaves room for smart automation. The surface is clean and simple. The interactions are seamless and intuitive. The machinery underneath is anything but. That’s where you come in.
We’re a well-networked founding team with strong execution roots and a clear roadmap. We’re backed, focused, and delivering fast.
We're looking for a DevOps Engineer / Site Reliability Engineer to join early. Someone who knows what “production ready” actually means, and who can help us get there. You’ll work closely with the Tech Lead and CTO to shape our infrastructure, observability, and deployment strategy. This is a zero-legacy environment: clean slate, fast moves, and real input on how we build and scale.
Have strong opinions? We want to hear them.
💥 What You’ll Do
- Own the Terraform stack for GCP — provisioning everything from services to secrets
- Set up and evolve CI/CD pipelines (currently GitHub Actions)
- Define and deploy our observability stack (metrics, logs, traces, alerts)
- Drive reliability practices: health checks, graceful degradation, rollbacks
- Help build out lower environments and smooth the local dev experience
- Work closely with the engineering team to make infrastructure decisions that unlock velocity, not block it
🧰 Tech Stack (So Far)
- Terraform (core infrastructure provisioning)
- GCP (GKE, Cloud Run, Pub/Sub, Cloud SQL, Secret Manager, etc.)
- GitHub Actions (CI/CD)
- Python + React services (you’ll help deploy them)
- Postgres, Databricks, message queue
You’ll have a strong hand in choosing how we monitor, alert, and observe.
✅ What We’re Looking For
- 6–8 years of DevOps, SRE, or platform engineering experience
- Expert-level Terraform and deep comfort operating in GCP
- Strong instincts around infrastructure-as-code, secrets management, and security best practices
- Experience owning CI/CD pipelines and deploy orchestration in production systems
- Familiarity with microservice observability patterns — and an opinion on what stack to use
- Startup-ready mindset: lean, pragmatic, and comfortable with ambiguity
🔥 Bonus If You
- Have built out metrics/alerts with Prometheus, Grafana, OpenTelemetry, or equivalent
- Have experience creating ephemeral environments for preview or QA
- Are comfortable pairing with engineers to improve DX and CI loops
- Have run load and failover tests — and used the results to make the system better
- Can show us a Terraform module, a CLI tool, or an outage retro you’re proud of
📍 Location & Compensation
- Must be based in San Francisco, Las Vegas, or Tel Aviv
- Full-time role with competitive comp
- Flexible hours, async-friendly culture, engineering-led environment