DevOps Engineer / Site Reliability Engineer

HyperFi • Full-time • San Francisco, CA, US • 3w ago

About HyperFi

We're building the kind of platform we always wanted to use: fast, flexible, and built for making sense of real-world complexity. Behind the scenes is a robust, event-driven architecture that connects systems, abstracts messy workflows, and leaves room for smart automation. The surface is clean and simple. The interactions are seamless and intuitive. The machinery underneath is anything but. That’s where you come in.

We’re a well-networked founding team with strong execution roots and a clear roadmap. We’re backed, focused, and delivering fast.

We're looking for a DevOps Engineer / Site Reliability Engineer to join early. Someone who knows what “production ready” actually means — and who can help us get there. You’ll work closely with the Tech Lead and CTO to shape our infrastructure, observability, and deployment strategy. This is a zero-legacy environment: clean slate, fast moves, and real input on how we build and scale.

Have strong opinions? We want to hear them.

💥 What You’ll Do

Own the Terraform stack for GCP — provisioning everything from services to secrets
Set up and evolve CI/CD pipelines (currently GitHub Actions)
Define and deploy our observability stack (metrics, logs, traces, alerts)
Drive reliability practices: health checks, graceful degradation, rollbacks
Help build out lower environments and smooth the local dev experience
Work closely with the engineering team to make infrastructure decisions that unlock velocity, not block it

🧰 Tech Stack (So Far)

Terraform (core infrastructure provisioning)
GCP (GKE, Cloud Run, Pub/Sub, Cloud SQL, Secret Manager, etc.)
GitHub Actions (CI/CD)
Python + React services (you’ll help deploy them)
Postgres, Databricks, message queue

You’ll have a strong hand in choosing how we monitor, alert, and observe.

✅ What We’re Looking For

6–8 years of DevOps, SRE, or platform engineering experience
Expert-level Terraform and deep comfort operating in GCP
Strong instincts around infrastructure-as-code, secrets management, and security best practices
Experience owning CI/CD pipelines and deploy orchestration in production systems
Familiarity with microservice observability patterns — and an opinion on what stack to use
Startup-ready mindset: lean, pragmatic, and comfortable with ambiguity

🔥 Bonus If You

Have built out metrics/alerts with Prometheus, Grafana, OpenTelemetry, or equivalent
Have experience creating ephemeral environments for preview or QA
Are comfortable pairing with engineers to improve DX and CI loops
Have run load and failover tests — and used the results to make the system better
Can show us a Terraform module, a CLI tool, or an outage retro you’re proud of

📍 Location & Compensation

Must be based in San Francisco, Las Vegas, or Tel Aviv
Full-time role with competitive comp
Flexible hours, async-friendly culture, engineering-led environment
