Diversified Services Network, Inc. (DSN) is seeking a full-time Site Reliability Engineer (SRE) - (AWS / Platform Support) to join our team in Chicago, IL OR Peoria, IL OR Irving, TX! We offer a HYBRID schedule, full benefits, PTO, 401k, and more! If you're looking to grow your technical career within an extremely reputable, stable Fortune 500 company - let's talk!
JOB RESPONSIBILITIES:
- Own incident tickets through the full lifecycle, from initial triage to resolution and closure
- Collaborate with engineering, platform, product, and operations teams to diagnose issues and coordinate fixes
- Communicate incident status, impact, and resolution progress to stakeholders
- Lead or contribute to root cause analysis and ensure follow-up actions are identified and tracked
- Ensure platform reliability through monitoring, alerting, security, and operational best practices
- Respond to and manage production incidents impacting AWS services and APIs
- Drive reliability, stability, and operational readiness improvements across cloud platforms
- Understand end‑to‑end technical and business flows to support production services effectively
- Develop, maintain, and improve clear, actionable runbooks for operational support
- Lead knowledge transfer sessions to ensure support teams are ready for production support
Requirements
EDUCATION & EXPERIENCE REQUIRED:
- 2-4 years of experience required; a degree is nice to have but not required
REQUIRED SKILLS:
- Experience supporting production-grade, customer-facing platforms in complex, multi‑team environments
- A demonstrated ownership mindset, taking accountability for service stability, incident outcomes, and follow-through beyond initial investigation
- Strong understanding of AWS Kinesis streaming and messaging services, containerized and serverless compute using Fargate and Lambda, and CI/CD pipeline implementation using Azure DevOps
- Experience using ServiceNow for incident management and Azure DevOps for features, user stories, etc.
- Proven ability to partner effectively with engineering, product, and platform teams to resolve issues and improve operational efficiency
- Experience driving root cause analysis and continuous improvement, turning incidents into long-term reliability gains
- Strong understanding of operational readiness standards, including monitoring, alerting, and runbooks
- Comfort operating in on-call or escalation roles, maintaining composure and clear communication during high-impact incidents
- Ability to identify gaps in processes or tooling and proactively improve support models, documentation, or workflows
- Experience working within enterprise ITSM frameworks
SOFT SKILLS REQUIRED:
- Strong communication skills, with the ability to translate technical issues into clear status and impact updates for stakeholders
Benefits
- 401(k)
- Dental insurance
- Vision insurance
- Disability insurance
- Employee assistance program
- Health insurance
- Health savings account
- Life insurance
- Paid time off
- Paid holidays
Please follow the link to our website for a list of job openings in Engineering, IT, Project Management, and more! https://www.dsnworldwide.com