DevOps SRE

Viraaj HR Solutions Private Limited • Full-time • Bengaluru, IN • ₹ 1,000,000 - ₹ 3,000,000 / year • 3d ago

Industry & Sector: Cloud Infrastructure and Enterprise SaaS. Our high-availability engineering team builds and runs resilient, containerised production platforms that support mission-critical customer applications, delivering scalable, observable, and secure cloud-native services for global users.

Role: Site Reliability Engineer (SRE) — On-site (India)

Role & Responsibilities

Design, deploy, and maintain production-grade Kubernetes-based platforms to ensure high availability, scalability, and security.
Author and maintain Infrastructure-as-Code to provision and manage cloud resources, enabling repeatable, auditable deployments.
Build and operate CI/CD pipelines and automated release processes to accelerate safe delivery of features and fixes.
Implement observability across metrics, logging, tracing, and alerting; define SLIs and SLOs, and automate incident detection and response.
Lead incident management and post-incident reviews to drive reliability improvements and reduce MTTR.
Collaborate with development and product teams to optimize performance, reduce costs, and harden the platform for production traffic.

Skills & Qualifications

Must-Have

Kubernetes
Docker
Terraform
AWS
Linux
Prometheus
Grafana
Jenkins

Preferred

Helm
Ansible
Python

Qualifications

Proven experience operating production cloud infrastructure and container platforms (demonstrable projects or on-call history preferred).
Strong troubleshooting skills across distributed systems, networking, and storage.
Willingness to work on-site in India and participate in an on-call rotation.

Benefits & Culture Highlights

Hands-on exposure to large-scale cloud-native systems and the opportunity to drive reliability best practices.
Collaborative engineering culture with a focus on learning, ownership, and measurable impact.
Competitive compensation and benefits aligned with on-site roles in India.

We are looking for proactive SREs who enjoy end-to-end ownership of platform reliability, automation-first engineering, and close collaboration with developers to deliver reliable services at scale. Apply if you thrive on solving complex operational challenges and driving continuous improvement.

Skills: AWS, Prometheus, Kubernetes, SRE, Jenkins, Grafana, Terraform, Linux, Docker