SRE Engineer / Senior SRE Engineer
Around 3-6 years of hands-on SRE experience with Alerting Techniques, Datadog Monitoring, Incident Management, Application Support, and Automation.
Roles And Responsibilities
- In-depth understanding of SLO-based Burn Rate Alerting, Anomaly Detection, Datadog Watchdog, or other Alerting Techniques.
- Hands-on experience with Datadog, including the ability to create custom alerts and dashboards.
- Proficient in utilizing Datadog for comprehensive monitoring and analysis.
- Extensive experience with Payment Monitoring across multiple services.
- Perform triage steps effectively to identify and address issues promptly.
- Experience with Alert Noise Reduction techniques for efficient, streamlined monitoring.
- Proven track record in Incident Management, with the ability to handle incidents from identification to resolution.
- Collaborate with Dev Teams to report incidents and follow up on their resolution.
- Conduct thorough incident analysis to identify root causes and implement preventive measures.
- Demonstrate the ability to automate tasks using Java or Python, including raising pull requests (PRs) to create or fine-tune alerts.
- Automate manual processes to eliminate toil and enhance operational efficiency.
- Must be flexible and comfortable working rotational shifts from the office (no work from home).
Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Proven experience in a lead role with a focus on Datadog, Monitoring, and Incident Analysis.
- Strong programming skills, particularly in Java or Python, for automation purposes.
- Excellent communication skills and the ability to collaborate with cross-functional teams.
- Previous experience working in a high-pressure environment with a commitment to meeting deadlines.
- Relevant certifications in Datadog, Grafana, Prometheus, or Incident Management are a plus.
(ref:hirist.tech)