Position: Tech Lead - SRE + DevOps
Experience: 7-10 Years
Location: Pune (Hybrid)
Shift: Regular afternoon shift, 3:00 PM to 12:00 AM
We are seeking a highly skilled SRE + DevOps Tech Lead to join our team. This role involves ensuring the reliability, scalability, and efficiency of cloud infrastructure and applications while implementing SRE best practices for deployment, monitoring, and automation. As a senior member, you will lead efforts in system reliability, mentor junior engineers, and drive improvements in infrastructure automation.
Required Skills & Qualifications:
- Have 8+ years of experience in Site Reliability Engineering (SRE), Azure DevOps, or Azure cloud infrastructure roles.
- Hands-on experience with cloud platforms (Azure).
- Strong experience with CI/CD tools (GitHub Actions, Jenkins, or Azure Pipelines).
- Proficiency in Python, Bash, or PowerShell for automation.
- Extensive experience with Infrastructure as Code (Terraform).
- Expertise in monitoring tools such as Datadog.
- Strong understanding of networking, security, and containerization (Docker, Kubernetes).
- Proven track record in leading and mentoring teams.
- Drive DevOps practices across build and deployment pipelines.
- Partner with Development, QA, and Architecture teams to enable fast and safe releases.
Key Responsibilities:
- Design, build, and maintain scalable and reliable cloud infrastructure.
- Ensure System Reliability: Maintain uptime, scalability, and performance across production environments.
- Monitoring & Alerting Setup: Configure real-time monitoring and observability dashboards.
- Automate Everything: Reduce toil by scripting repetitive tasks, automating CI/CD, and building self-healing mechanisms.
- Incident Response & RCA: Own on-call rotations, resolve P1/P2 incidents, and create blameless postmortems.
- Optimize Costs & Performance: Work on cloud cost optimization (FinOps), database tuning, and caching strategies.
- Security & Compliance: Implement least privilege access, encryption, and vulnerability assessments.
- Infrastructure as Code (IaC): Deploy and manage infrastructure with Terraform, Ansible, and Helm.
- Capacity Planning & Scaling: Ensure load balancing, horizontal scaling, and traffic routing.
- Process Documentation: Maintain detailed SOPs, incident response guides, and architecture diagrams.
- Lead the implementation of CI/CD pipelines for application deployments.
- Manage and optimize Kubernetes clusters and containerized workloads.
- Collaborate with development and operations teams to ensure smooth deployment of applications.
- Troubleshoot and resolve incidents, ensuring minimal downtime for production services.
- Mentor and provide guidance to junior engineers, fostering a culture of reliability and automation.
Additional Skills:
Core Experience:
- Have 7–10 years of experience in DevOps, SRE, or Platform Engineering.
- Strong experience with Linux systems and production troubleshooting.
- Hands-on expertise with the Azure cloud platform.
Deep experience with:
- CI/CD: Jenkins, GitHub Actions, GitLab CI, Azure DevOps
- Containers & Orchestration: Docker, Kubernetes
- IaC: Terraform (preferred), ARM, CloudFormation
- Observability: Prometheus, Grafana, ELK, Datadog, New Relic
- Strong scripting skills in Python / Bash / Shell / Go
- Solid understanding of networking, security, and system design