Job Title: SRE - DevOps Engineer
Location: Chennai, TN
Interview Mode: Face-to-face (F2F)
Experience Levels
- B2: 5-8 Years
- B3: 9-11 Years
- C1: 11+ Years
Job Description
We are seeking an SRE DevOps Engineer to design, implement, and manage monitoring solutions and observability tools for our infrastructure. The ideal candidate will be responsible for setting up monitoring tools, creating observability dashboards, and troubleshooting complex infrastructure and application issues.
Technical Skills
- Monitoring Tools:
- Ability to set up and configure monitoring tools.
- Capable of designing and implementing complex alerting rules, optimizing tool performance, and implementing federation for scalability.
- Ability to integrate monitoring tools with visualization tools.
- Experience in troubleshooting monitoring tools.
- Required Tools: Proficiency in any two of Prometheus, Azure Monitor, CloudWatch, Datadog, Dynatrace, ELK, Harness SRM.
- Observability Tools:
- Strong understanding of observability metrics and KPIs.
- Must be capable of creating complex observability dashboards by integrating with various data sources.
- Ability to troubleshoot observability tools.
- Required Tools: Proficiency in any one of Grafana, AppDynamics, Splunk, OpenTelemetry.
- Application Troubleshooting:
- Hands-on experience troubleshooting issues with applications.
- Required Tools: Proficiency in any one of Java, .NET, Python.
- Automation & Scripting:
- Ability to write and debug automation scripts.
- Ability to integrate third-party libraries into scripts.
- Required Tools: Proficiency in any two of Ansible, Python, Shell, Linux, Unix, Groovy, Java, PowerShell, Golang.
- Kubernetes:
- Ability to set up and manage a Kubernetes cluster.
- Ability to deploy and troubleshoot applications on a Kubernetes cluster.
- Required Tools: Kubernetes/Helm Charts.
- Infrastructure as Code (IaC):
- Ability to write and debug complex infrastructure configurations.
- Ability to integrate infrastructure configurations with CI/CD pipelines.
- Required Tools: Proficiency in any one of Terraform, Puppet, GitHub, OpenShift, OpenTofu.
Non-Technical Skills
- Agile Methodologies:
- Working experience in Agile methodologies (e.g., Scrum, Kanban, SAFe).
- ITSM Knowledge:
- Strong knowledge of ITSM (IT Service Management) principles.
- Troubleshooting & RCA:
- Ability to troubleshoot issues rapidly and perform Root Cause Analysis (RCA).
- Shift Flexibility:
- Willingness to work in shifts, as required.
- Teamwork & Communication:
- Team player with a positive attitude.
- Good communication skills.
- Tools:
- Proficiency in any one of Jira, ServiceNow.