Technical Lead - SRE/DevOps

Harbinger Group • Full-time • Pune, IN • 2d ago

Position: Tech Lead - SRE + DevOps

Experience: 7-10 years

Location: Pune (Hybrid)

Shift: Regular afternoon shift, 3:00 PM to 12:00 AM

We are seeking a highly skilled SRE/DevOps engineer to join our team. This role involves ensuring the reliability, scalability, and efficiency of cloud infrastructure and applications while implementing SRE best practices for deployment, monitoring, and automation. As a senior member, you will lead efforts in system reliability, mentor junior engineers, and drive improvements in infrastructure automation.

Required Skills & Qualifications:

8+ years of experience in Site Reliability Engineering (SRE), Azure DevOps, or Azure cloud infrastructure roles.
Hands-on experience with cloud platforms (Azure).
Strong experience with CI/CD tools (GitHub Actions, Jenkins, or Azure Pipelines).
Proficiency in Python, Bash, or PowerShell for automation.
Extensive experience with Infrastructure as Code (Terraform).
Expertise in monitoring tools such as Datadog.
Strong understanding of networking, security, and containerization (Docker, Kubernetes).
Proven track record in leading and mentoring teams.
Ability to drive DevOps practices across build and deployment pipelines.
Ability to partner with Development, QA, and Architecture teams to enable fast and safe releases.

Key Responsibilities:

Design, build, and maintain scalable and reliable cloud infrastructure.
Ensure System Reliability: Maintain uptime, scalability, and performance across production environments.
Monitoring & Alerting Setup: Configure real-time monitoring, alerting, and observability dashboards.
Automate Everything: Reduce toil by scripting repetitive tasks and automating CI/CD and self-healing mechanisms.
Incident Response & RCA: Own on-call rotations, resolve P1/P2 incidents, and create blameless postmortems.
Optimize Costs & Performance: Work on cloud cost optimization (FinOps), database tuning, and caching strategies.
Security & Compliance: Implement least privilege access, encryption, and vulnerability assessments.
Infrastructure as Code (IaC): Deploy and manage infrastructure with Terraform, Ansible, and Helm.
Capacity Planning & Scaling: Ensure load balancing, horizontal scaling, and traffic routing.
Process Documentation: Maintain detailed SOPs, incident response guides, and architecture diagrams.
Lead the implementation of CI/CD pipelines for application deployments.
Manage and optimize Kubernetes clusters and containerized workloads.
Collaborate with development and operations teams to ensure smooth deployment of applications.
Troubleshoot and resolve incidents, ensuring minimal downtime for production services.
Mentor and provide guidance to junior engineers, fostering a culture of reliability and automation.