Role Description
Job Title: Site Reliability Engineer (SRE)
Experience
6+ Years
Role Summary
Seeking an experienced Site Reliability Engineer to design, build, and operate highly available, scalable, and reliable cloud‑based systems. The role focuses on automation, CI/CD, monitoring, incident management, and improving overall system resilience in distributed environments.
Key Responsibilities
- Manage system uptime, availability, and performance across cloud‑native and hybrid architectures
- Design and implement Infrastructure as Code (IaC) using Terraform
- Build and maintain CI/CD pipelines using Git and Jenkins
- Automate deployments, including blue/green strategies
- Develop automation scripts using Shell or Python
- Implement monitoring, alerting, and dashboards for microservices
- Participate in on‑call rotations and handle production incidents
- Lead blameless postmortems and drive preventive actions
- Create and maintain detailed runbooks to reduce MTTR
- Troubleshoot complex distributed systems and service dependencies
Required Skills & Experience
- Strong experience with cloud platforms (AWS / GCP / Azure)
- Hands‑on experience with Terraform and infrastructure automation
- Experience provisioning compute, storage, and networking resources
- Strong knowledge of CI/CD concepts and tools
- Experience with monitoring and observability tools
- Exposure to IAM, security, and access management
- Strong understanding of systems, networking, storage, and databases
- Experience working in Agile / DevOps environments
Education
- Bachelor’s degree in Computer Science or related field, or equivalent practical experience
Skills (Keywords)
SRE, DevOps, Cloud Infrastructure, Terraform, CI/CD, Jenkins, Git, Automation, Monitoring, Incident Management, Kubernetes, AWS, GCP, Azure, Agile
Skills
site reliability engineering, terraform, cloud cli, cicd