Purpose:
The Operations Lead will lead the Operations team, working across DevOps best practices, operations management, and the optimization and development of production systems, including deployment to production and non-production environments in the cloud.
• The Operations Lead will focus on running better production systems by taking part in defining engineering approaches that deliver best practice.
• Discuss and suggest best practices and prescribe solutions for reliability issues.
• Identify the resource needs specific to each engagement, and define the cadence and plan for all BAU activities.
This job group contains roles responsible for providing dedicated first-line support for our software applications. The roles are responsible for diagnosing and managing issues not related to code (escalating to software engineers to fix the code where required, so that the software engineers' principle of "you built it, you run it; it breaks, you fix it" is maintained) and for providing first-line customer support (e.g. a banker has an issue with a screen in a software application and needs support).
Job Specific Skills
- Work with incident management to streamline and improve the incident triage process and bring best practices to GCC.
- Drive process and cost optimizations through automation, working closely with Leads and Engineering Managers
- Monitor SLAs and ensure the team adheres to them
- Prevent unplanned work creep for the team by being the first point of contact for any environment issues and escalations
- Take part in defining SRPs for known scenarios
- Support project go-lives by overseeing and implementing process requirements and optimizations
- Own the overall KT process for new joiners and existing resources for BAU items
- Take part in DR testing and lead the team to complete such activities successfully
- Stay on top of vulnerabilities and manage them effectively within the team
- Be an integral part of interviews for new recruits
- Ensure the team adheres to SLAs for work types such as Incidents, Changes, Service Requests and Problem Records
- Raise the level of team knowledge by contributing to documentation and leading the team in doing the same
- Work with service delivery to resolve problem records swiftly.
- Challenge and uplift cultural thinking and propagate change in the SDH Domain and develop champions within Services.
- Set up a cadence for the Production roster team and manage the roster, work assignments and any escalations
- Set up a release cadence for production code and go-lives, and take part in socializing and reviewing it periodically
- Own the alerting and monitoring setup, including defining new alerts and refining existing alerts and tools to support issue triage and faster resolution
- Guide infrastructure capacity planning and management, using system metrics and developing dashboards to ensure scalability of services.
- Implement strategies to progressively reduce functional and non-functional defects in the production environments through the use of production monitoring tools
- Implement strategies to proactively prevent incidents in production through the use of machine learning and AI technologies
- Own and improve processes around infrastructure creation, environment maintenance, and alerting and monitoring
- Take part in cost optimization and in meeting compliance and performance requirements
- Provide ongoing support for production releases and be part of the escalation path for resources for any technical issues
- Share responsibility with the Engineering Managers, Scrum Masters and the like for prioritising the story backlog within delivery sprints and for sprint planning
- Organize and take part in game days, KT sessions and Chaos Engineering practices to make the platform more resilient and robust
- Knowledge of algorithms and data structures.
Continuous improvement (Passion for customers)
- Improve the quality of our system by developing standard operating procedures. Enhance applications by identifying opportunities for improvement, making recommendations and designing and implementing systems.
- Continually seek out relevant industry and technical knowledge and improve on professional skills by completing necessary development activities and actively share knowledge acquired with appropriate audiences.
- Investigate and use new technologies where relevant.
Win together
- Liaise with colleagues to review technical designs and provide input where necessary
- Provide written knowledge transfer material.
- Coach team members to ensure they have the skills and knowledge to perform their roles in a professional and ethical manner.
Respect for people
- Responsibilities for health and safety have been assigned to all employees. For your specific responsibilities, refer to Responsibilities for Workplace Health and Safety, located on the intranet under People Toolbar.
- Demonstrate professional and ethical behaviour in your actions by ensuring compliance with external legislation, bank standards and internal operating policies and procedures relevant to the position.
- Ensure that all work is performed in accordance with the requirements of the Health & Safety Policy, procedures and legislation. Take reasonable care of your own health and safety, as well as that of others.
People Leadership Responsibilities
- Foster an inclusive, high-performing team culture aligned with NAB’s Enterprise Behaviors.
- Provide coaching, feedback, and career development support to team members.
- Drive engagement by recognizing achievements and promoting collaboration.
- Ensure clarity of roles, responsibilities, and performance expectations.
- Support workforce planning, onboarding, and capability uplift initiatives.
- Champion diversity, inclusion, and psychological safety within the team.
Responsible For:
- Shaping the direction of production support and incident response.
- Providing high-level consultancy on process improvements.
Main Activities:
- Lead strategic improvements to support processes.
- Oversee the development of automation and monitoring solutions.
- Guide the organization on best practices in production support.
Key Skills:
- Expert troubleshooting and automation.
- Leadership in support strategies and process development.