Job Details:
Lead Site Reliability Engineer
The Lead Site Reliability Engineer is a senior technical leader responsible for the reliability, availability, and operational excellence of our cloud-based infrastructure and distributed platform. This role owns uptime, SLAs, and incident response while driving long-term improvements in resilience, observability, and automation. The Lead SRE is hands-on and partners closely with platform, QA, and development teams.
This role suits an engineer who thrives in high-ownership environments, balancing real-time operations with strategic reliability initiatives. You’ll define operational standards, disaster recovery practices, and automation frameworks, while leading incidents and postmortems with clarity and accountability.
Key Responsibilities
- Own uptime, SLAs, and overall platform reliability
- Lead incident response, root-cause analysis, and postmortems
- Automate infrastructure, deployments, and operational workflows
- Improve monitoring, alerting, and observability
- Execute and evolve disaster recovery and business continuity plans
- Optimize cloud and Kubernetes environments for scale and performance
- Establish runbooks, operational standards, and reliability best practices
- Provide technical leadership and mentorship
Qualifications
- 6+ years in SRE, DevOps, or Platform Engineering; 2+ years in a lead role
- Strong experience supporting production systems with strict SLAs
- Deep expertise in Kubernetes, containers, and cloud infrastructure
- Proficiency with Terraform and modern IaC practices
- Strong automation and scripting skills (Bash, Python, or Go)
- Experience with CI/CD, GitOps, and observability tooling
- Proven incident leadership and cross-functional communication skills
If you are interested in learning more, please send a copy of your resume to jz@libertyjobs.com.
Josh Zeloyle
www.libertyjobs.com
610-684-8676
jz@libertyjobs.com
https://www.linkedin.com/in/joshuazeloyle/
#sre
#devops