Site Reliability Engineer

Berkley Hunt • Full-time • New York, United States • $110,000 - $150,000 / year • 6d ago

Site Reliability Engineer | Hybrid | Fintech

A fast-growing, VC-backed Series C company is building globally distributed cloud infrastructure that powers reliable, large-scale systems. The team focuses on turning complex, distributed technology into resilient, highly available platforms used around the world. Engineering is fully collaborative across regions, with a culture of ownership and automation.

The Role

We’re looking for a Senior Site Reliability Engineer to join the team and take ownership of the systems that keep the platform running smoothly. You’ll work closely with engineering teams to ensure reliability, scalability, and operational excellence, guiding decisions that affect performance and uptime across the entire stack.

Key Responsibilities:

Design, build, and maintain highly available infrastructure using IaC (Terraform preferred)
Manage Kubernetes clusters and streamline deployments with Helm
Operate and troubleshoot Linux-based systems, ensuring security and stability
Build automation tools in Python or Go to improve efficiency and reduce manual work
Implement robust observability practices: monitoring, logging, and alerting
Participate in production support and the 24/7 on-call rotation
Collaborate across globally distributed engineering teams to shape architectural decisions

What You Bring:

5+ years of experience with cloud-native infrastructure, IaC, and Linux administration
Deep understanding of Kubernetes, Helm, and containerized environments
Hands-on experience designing scalable cloud solutions on GCP and AWS
Experience implementing observability solutions and automating operational workflows
Strong troubleshooting skills in networking, systems, and distributed environments
A proactive, ownership-driven mindset: you build it, you run it
Excellent communication skills and ability to collaborate across time zones