We are seeking an experienced Ops Lead (Support Team Lead) to head production support operations for a SaaS product platform. The role involves managing L1/L2 support teams, owning incident management and SLAs, and working closely with engineering, DevOps, and product teams to ensure high availability, stability, and customer satisfaction.
Key Responsibilities
Team & Operations Management
- Lead, mentor, and manage L1/L2 production support engineers
- Plan and manage shift rosters, on-call rotations, and workload distribution
- Ensure 24×7 production support coverage (as applicable)
- Conduct performance reviews and provide coaching and skill development
- Drive knowledge sharing, runbooks, and SOP standardization
Incident & Problem Management
- Own major incident management (P1/P2) and act as the escalation point
- Lead incident bridges, war rooms, and stakeholder communications
- Ensure timely resolution and adherence to SLAs/SLOs
- Drive root cause analysis (RCA) and preventive action plans
- Track recurring issues and work with engineering on permanent fixes
Technical & Platform Ownership
- Oversee monitoring, alerting, and health checks for production systems
- Ensure effective use of logs, dashboards, and observability tools
- Support release readiness, change management, and rollback planning
- Collaborate with DevOps on availability, performance, and capacity planning
- Review and improve operational metrics and KPIs
Stakeholder & Process Management
- Act as the primary interface between Support, Engineering, Product, and DevOps
- Communicate incident status and post-incident reports to leadership
- Drive ITIL-aligned processes (Incident, Problem, Change, Knowledge)
- Continuously improve support processes and operational maturity
Mandatory Skills & Qualifications
- 8+ years of experience in production support/operations for product or SaaS platforms
- 2+ years of experience leading support or operations teams
- Strong understanding of SaaS architecture and production environments
- Hands-on experience with incident management and SLA ownership
- Strong knowledge of application monitoring, logging, and alerting
- Experience working with SQL and NoSQL databases (operational perspective)
- Understanding of distributed systems and microservices
- Experience supporting event-driven systems (Kafka or similar)
- Familiarity with Linux/Unix environments
- Strong communication and stakeholder management skills
Good to Have
- ITIL certification or strong ITIL process knowledge
- Exposure to cloud platforms (AWS / Azure / GCP)
- Basic scripting knowledge (Python, Shell)
- Experience working with graph databases
- Background in Java- or Python-based applications
- Experience in DevOps or SRE-aligned environments
Key Competencies
- Strong leadership and decision-making skills
- Ability to perform under pressure during critical incidents
- Data-driven mindset for operational improvements
- Customer-first and reliability-focused approach
- Excellent documentation skills and process orientation