Site Reliability Engineer | Hybrid | Fintech
A fast-growing, VC-backed Series C company is building globally distributed cloud infrastructure that powers reliable, large-scale systems. The team focuses on turning complex, distributed technology into resilient, highly available platforms used around the world. Engineering is fully collaborative across regions, with a culture of ownership and automation.
The Role
We’re looking for a Senior Site Reliability Engineer to join the team and take ownership of the systems that keep the platform running smoothly. You’ll work closely with engineering teams to ensure reliability, scalability, and operational excellence, guiding decisions that affect performance and uptime across the entire stack.
Key Responsibilities:
- Design, build, and maintain highly available infrastructure using IaC (Terraform preferred)
- Manage Kubernetes clusters and streamline deployments with Helm
- Operate and troubleshoot Linux-based systems, ensuring security and stability
- Build automation tools in Python or Go to improve efficiency and reduce manual work
- Implement robust observability practices: monitoring, logging, and alerting
- Participate in production support and a 24/7 on-call rotation
- Collaborate across globally distributed engineering teams to shape architectural decisions
What You Bring:
- 5+ years of experience with cloud-native infrastructure, IaC, and Linux administration
- Deep understanding of Kubernetes, Helm, and containerized environments
- Hands-on experience designing scalable cloud solutions on GCP and AWS
- Experience implementing observability solutions and automating operational workflows
- Strong troubleshooting skills in networking, systems, and distributed environments
- A proactive, ownership-driven mindset: you build it, you run it
- Excellent communication skills and ability to collaborate across time zones