Job Description
Project Role : DevOps Engineer
Project Role Description : Responsible for building and setting up new development tools and infrastructure utilizing knowledge in continuous integration, delivery, and deployment (CI/CD), Cloud technologies, Container Orchestration and Security. Build and test end-to-end CI/CD pipelines, ensuring that systems are safe against security threats.
Must have skills : Site Reliability Engineering
Good to have skills : NA
Minimum 3 year(s) of experience is required
Educational Qualification : 15 years of full-time education
Roles & Responsibilities:
- Expected to perform independently and become an SME.
- Actively participate in and contribute to team discussions.
- Contribute solutions to work-related problems.
- Monitor and optimize system uptime, latency, and throughput to meet SLOs, as measured by the corresponding SLIs.
- Lead incident response, manage escalations, conduct root cause analysis (RCA), and drive postmortem reviews.
- Develop and maintain CI/CD pipelines while eliminating manual toil through automation and scripting.
- Implement monitoring, logging, and tracing frameworks to ensure real-time observability of distributed systems.
- Conduct capacity planning, resource forecasting, and infrastructure scaling to handle surge conditions.
- Partner with development teams to enable safe feature releases with automated testing and rollback mechanisms.
- Implement disaster recovery strategies, multi-region resilience, and chaos testing for business continuity.
- Drive continuous process improvement using post-incident analytics and data-driven insights.
- Collaborate with product, design, ML, and DevOps teams to build intelligent workflows and enhanced user experiences.
- Implement Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, Azure DevOps, or Pulumi.
- Ensure cloud infrastructure security, compliance, and performance optimization for highly available systems.
Professional & Technical Skills:
- Must Have Skills: Proficiency in Site Reliability Engineering principles and practices.
- Good to Have Skills: Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP).
- Strong hands-on expertise in Kubernetes and Infrastructure as Code tools like Terraform.
- Expertise in scripting and programming languages such as Python, Go, Bash, or JavaScript for automation and tooling.
- Deep understanding of Linux systems, networking, and distributed architectures.
- Experience with observability solutions such as Prometheus, Grafana, Datadog, CloudWatch, or New Relic.
- Familiarity with incident management and alerting platforms like PagerDuty or xMatters.
- Proficiency in CI/CD frameworks such as Jenkins, GitHub Actions, or GitLab CI.
- Working knowledge of security, compliance, and performance optimization in cloud-native environments.
Additional Information:
- The candidate should have a minimum of 3 years of experience in Site Reliability Engineering or related roles.
- This position is based at our Bengaluru office.
- 15 years of full-time education is required.
- Relevant certifications are preferred, such as AWS Certified Solutions Architect - Professional, Microsoft Certified: Azure Solutions Architect Expert, Google Professional Cloud Architect, Certified Kubernetes Administrator (CKA), HashiCorp Certified: Terraform Associate, or DevOps Engineer certifications.
- The candidate needs to be AI Ready.