Role - Offshore Delivery Lead (SRE with ITSM)
Role & JD
Detailed JD
We are seeking an experienced Offshore SRE Lead to join our global team of Site Reliability Engineers. As an SRE Lead, you will manage and mentor a team of offshore SREs and DevOps engineers who ensure the reliability, performance, and availability of our cloud-based services. You will collaborate with other SRE Leads, Engineering Managers, and Product Owners to define and implement SRE best practices, standards, and processes across the organization.
Key Responsibilities:
- Lead and mentor a team of SREs and DevOps engineers in an offshore setting.
- Ensure the reliability, performance, and availability of cloud-based services.
- Collaborate with engineering and product teams to define SRE best practices and standards.
- Implement and manage infrastructure as code (IaC) using tools like Terraform and Ansible.
- Set up and maintain monitoring and alerting using tools such as AppDynamics, Grafana, or Splunk.
- Manage CI/CD pipelines and source control with Azure DevOps, GitHub, and Git.
- Use container technologies such as Docker and Kubernetes.
- Apply chaos engineering principles using tools such as Gremlin or Chaos Monkey.
- Conduct load testing with tools such as JMeter, BlazeMeter, or K6.
- Work within agile methodologies such as Scrum, Kanban, or XP.
- Participate in on-call rotations, ensuring effective incident management and resolution.
- Provide feedback and guidance to improve team skills and knowledge in handling on-call incidents.
ITIL and ITSM Responsibilities:
- Implement ITIL best practices for incident, problem, and change management.
- Define, monitor, and ensure adherence to Service Level Agreements (SLAs) and Operational Level Agreements (OLAs).
- Ensure robust IT Service Management (ITSM) practices are in place for incident response, request fulfillment, and service continuity.
- Develop and maintain runbooks for effective incident management and service restoration.
- Collaborate with ITSM teams to improve service quality and reduce the number of incidents.
- Track and report on key ITIL metrics, such as mean time to resolution (MTTR) and incident resolution rates.
- Coordinate with the ITSM team to ensure proper documentation and communication of changes.
Preferred Skillsets:
- At least 5 years of experience as an SRE or DevOps Engineer, preferably in a cloud environment (GCP, Azure, etc.).
- At least 2 years of experience as a team lead or manager, preferably in an offshore setting.
- Proficient in scripting languages such as Python, Bash, or PowerShell.
- Experience with infrastructure as code (IaC) and configuration management tools such as Terraform and Ansible.
- Experience with monitoring and alerting tools such as AppDynamics, Grafana, or Splunk.
- Experience with CI/CD and version control tools such as Azure DevOps, GitHub, and Git.
- Experience with container technologies such as Docker, Kubernetes, or ECS.
- Familiarity with chaos engineering tools such as Gremlin, Chaos Monkey, or Litmus.
- Experience with load testing tools such as JMeter, BlazeMeter, or K6.
- Experience with agile methodologies such as Scrum, Kanban, or XP.
- Strong understanding of ITIL and ITSM concepts, particularly around incident, problem, and change management.
- Ability to define and monitor SLAs and OLAs effectively.
On-call Expectations:
- As an SRE Lead, you will be expected to participate in the on-call rotation for your team and region, as well as escalate and coordinate issues with other teams and regions as needed.
- You will also be expected to provide feedback and guidance to your team members on how to handle on-call incidents and improve their skills and knowledge.