Key Result Areas
◆ Production Support Ownership
- Design and deliver end‑to‑end AI/ML/GenAI solutions, covering model development, deployment, and production‑grade implementation.
- Build scalable backend services and APIs using Flask/FastAPI, with cloud‑native deployment on Azure (DevOps, AKS, Cognitive Services).
- Lead PoCs and innovation initiatives, experimenting with GenAI, Agentic AI, NLP, CV, OCR, and emerging open‑source frameworks.
- Manage the AI project lifecycle and teams: oversee development, resolve model and system issues, and mentor junior engineers.
- Take end‑to‑end ownership of production support for the AI‑based application, ensuring system stability, uptime, and reliability.
- Lead proactive monitoring of system health, identify recurring issues, and drive permanent fixes in collaboration with technical teams.
- Ensure timely response and resolution of incidents, service requests, and operational issues within agreed SLAs.
◆ Issue Analysis, Troubleshooting & Root Cause Management
- Perform detailed analysis of incidents and problems, identify root causes, and implement preventive actions to avoid recurrence.
- Coordinate with development, infrastructure, and vendor teams to ensure effective resolution of complex technical issues.
- Maintain accurate documentation of incidents, RCA reports, and knowledge articles to support continuous improvement.
◆ Stakeholder Coordination & Communication
- Lead and collaborate closely with business users, product owners, and technology teams to understand issues, communicate updates, and manage expectations.
- Ensure clear, concise, and timely communication regarding incident status, risks, and impact.
- Provide leadership and build strong working relationships with internal teams and external vendors to support smooth issue resolution.
◆ Compliance, Risk & Audit Adherence
- Ensure all production support activities comply with regulatory controls, audit requirements, and internal risk guidelines.
- Lead and maintain operational discipline by following defined PMO, QMS, and architecture standards.
- Identify risks related to data security, application stability, and system availability, and ensure appropriate mitigation steps.
◆ Operational Excellence & Continuous Improvement
- Identify opportunities to streamline support processes, reduce manual effort, and improve turnaround times.
- Lead the team in contributing to automation initiatives, monitoring enhancements, and process improvements.
- Track platform performance, incident trends, and recurring issues, and propose corrective actions to enhance system performance.
◆ Requirements Validation & Support for Enhancements
- Support requirement gathering for fixes, enhancements, and regulatory changes by providing production insights and validating feasibility.
- Provide consultation and leadership, and assist project teams by verifying functional correctness, solution completeness, and alignment with business needs before deployment.
- Participate in UAT, release validation, and change readiness activities to ensure smooth production deployment.
◆ Technology Best Practices & Quality Assurance
- Adhere to Agile and modern engineering practices while supporting continuous delivery and deployment cycles.
- Maintain high-quality documentation, including SOPs, support guides, workflow diagrams, and configuration details.
- Ensure adherence to best practices in monitoring, alerting, logging, and operational reliability.
◆ Vendor & External Partner Collaboration
- Work with vendor teams to ensure timely delivery of fixes, updates, and support escalations.
- Review vendor performance and ensure value delivery aligned with SLA and platform requirements.
Operating Environment, Framework and Boundaries, Working Relationships
◆ Operating Environment
- Provide leadership for L1/L2 production support for the EDMS application, ensuring high availability, system stability, and timely issue resolution.
- Work in a complex technical environment involving Python, AI agent technologies such as OpenAI, Ollama, and Gemini, along with Java and Unix/Linux.
- Perform root cause analysis, incident recovery, and ongoing monitoring, including support during off‑business hours when required.
- Manage and oversee AI-based solutions in production, including process start/stop, execution, monitoring, and exception handling.
◆ Technical Framework & Skill Requirements
4 to 7 years of hands-on experience in production support roles, preferably as a support lead or senior engineer. Strong working knowledge of:
- Python scripting
- Flask/FastAPI with cloud‑native deployment on Azure (DevOps, AKS, Cognitive Services)
- GenAI, Agentic AI, NLP, CV, OCR, and emerging open‑source frameworks
- Microsoft Excel, PowerPoint, and related tools for reporting, presentations, and status tracking
◆ Reporting, Governance & Documentation
- Prepare and deliver production support reports, project updates, and performance dashboards in a clear and timely manner.
- Provide periodic and ad‑hoc reports as requested by management or stakeholders.
- Maintain high accuracy and clarity in documentation, status reporting, and communication.
◆ Working Relationships & Cross-Functional Collaboration
- Liaise with business users for issue clarification, data requirements, and production process coordination.
- Collaborate with technology teams such as Corporate Tech, Retail Tech, and Operations Tech for troubleshooting, information exchange, and workflow alignment.
- Coordinate with vendor teams for incident resolution, escalations, and enhancement support.
- Manage communication channels (phone calls, emails, and incident tickets), ensuring timely alerts to the relevant process teams.
◆ Behavioral Expectations
- Display a positive attitude, responsiveness, and professionalism when dealing with users and internal teams.
- Demonstrate strong English communication skills, both written and verbal, to ensure clarity and transparency in all interactions.
- Work independently with a high sense of responsibility, ownership, and accountability.
Problem Solving
- Demonstrates strong analytical and diagnostic skills to troubleshoot complex production issues across AI technologies.
- Performs detailed root cause analysis (RCA) to identify failure points, document findings, and implement long‑term preventive measures to avoid recurrence.
- Manages crisis situations effectively by coordinating with relevant teams, restoring services quickly, and driving timely recovery during high‑severity incidents.
- Communicates clearly and proactively with users, business stakeholders, and technology teams during major incidents, ensuring transparency on status, impact, and recovery steps.
- Applies structured problem‑solving techniques such as impact analysis, log investigation, trend analysis, and incident pattern recognition to reduce downtime and improve stability.
- Prioritizes production issues based on severity, business impact, and downstream dependencies to ensure accurate and timely resolution.
- Collaborates with cross-functional teams to investigate recurring issues, propose corrective actions, and drive continuous improvement initiatives.
- Proactively monitors system alerts, logs, and performance indicators to detect potential risks early and prevent system outages.
- Ensures thorough documentation of incidents, RCA outcomes, and corrective actions to build knowledge repositories and reduce future turnaround times.
Decision Making Authority & Responsibility
- Recommend functional and technical solutions that are aligned with business requirements, system constraints, and production support best practices.
- Exercise sound judgment to balance delivery timelines, operational risks, and solution quality while supporting production stability.
- Make decisions related to production changes, ensuring zero compromise on quality and strict adherence to release and change management policies.
- Escalate risks, bottlenecks, or potential service impacts in a timely manner to ensure proper visibility and mitigation.
- Assess multiple solution options and use effective judgment to determine the most feasible approach within required timeframes and operational boundaries.
- Identify and highlight any concerns that may affect deliverables, system stability, or compliance, ensuring early intervention.
- Ensure 100% compliance with all change, release, and deployment governance guidelines when recommending or approving fixes and enhancements.
- Take responsibility for validating functional solutions and ensuring their readiness for production deployment.
- Make informed decisions during incident handling, prioritizing recovery actions based on business impact and urgency.
Knowledge, Skills And Experience
- 4 to 7 years of hands-on experience in application production support for applications built on AI-related technologies.
- Exposure to BAU (Business-As-Usual) support activities with strong focus on system stability, issue resolution, and stakeholder coordination.
- Experience working in shift-based or on-call support models to ensure SLA adherence.
- Good understanding of Python and AI-based solutions and workflows.
- Ability to identify and raise alerts on application risks, failures, and performance anomalies.
- Ability to coordinate and support patching cycles, vulnerability remediation, and environment maintenance.
- Capable of performing volumetric analysis to assess application performance, capacity utilization, and system scalability.
- Familiarity with incident, problem, and change management processes within production environments.
- Reasonable proficiency in English communication, both verbal and written, to coordinate effectively with business users and technical teams.
- Ability to engage with business users during BAU activities, clarifying issues, gathering inputs, and supporting daily operations.
- Basic team coordination skills to guide junior members or colleagues during support cycles.
- Strong sense of responsibility, ownership, and responsiveness in handling production support duties.
- Ability to raise timely alerts and provide clear communication on risks, performance gaps, or potential disruptions.
- Responsible for implementing new solutions across the IRIS, EDMS, and BPM project areas.
- Ability to estimate, propose, design, build & document solutions.
- Experience leading technology workshops and training programmes.
- Conduct discovery and design sessions for business users & customers.
- Analyze existing systems to discover the core processes and rules involved, so that they can be converted to a workflow-based framework applying the core principles of BPM.
- Customize utilities and interfaces in BPM Lombardi, basing interface design on built-in Lombardi capabilities and integration definitions.
- Model and develop business process definitions with various activities, gateways, events, and other processes.
- Standardize snapshot policies in the WebSphere BPM Process Center Console.
- Identify business needs and build the required services, including Rule, Human, Ajax, and Integration services.
- Carry out change management activities such as process definition, implementation, and user familiarization. Provide solution guidance and advisory services across environments and systems, and drive process improvements by reducing dependencies on different support groups.
- Experienced in developing application blueprints, roadmaps and data migration strategies.
- Manage vendors and execute projects in specialized areas.
- Engage MSG for day-to-day support activities and support them in delivering their components.
- Delivered multiple projects on time within short timeframes.
- Conducted engagements coordinating business and technical/vendor teams, helping both parties translate business requirements into final solutions.
- Experience with 3 end-to-end implementation projects, over 2 years of production support, and 5 years as a functional lead consultant.
- Extensive experience applying development best practices (design patterns) with the technologies above to create highly scalable IBM BPM applications.
- Implement technical deliverables as needed: working processes, services, user interfaces, integrations, database queries, web services, etc.
- Must have led a team of more than 4 developers on a BPM project.
- Must have architected end-to-end BPM applications using the IBM BPM product.
- Must have developed and tested complex business process models using IBM BPM software.
- Experience creating sophisticated coaches with Web 2.0 appearance and behavior.
- Must have experience creating reusable service toolkits.
- Track meaningful metrics to support dashboards, reporting, and visualization.
- Must have demonstrable experience working with developers and mentoring them in the use of BPM suites.
- Develop best practices as the BPM field continues to develop and innovate.
- Analyzed the existing Lotus Notes system, then modeled and built multiple workflows in IBM BPM with added functionality.
- Customized BPM process portals, personalized task management, and process management screens.
- Worked on coaches and coach views.
- Integrated various banking applications into BPM as part of the functional requirements.
- Model timer and message events; create standard libraries for common timer events that could be accessed by multiple applications.
- Develop custom Java classes, import the corresponding JAR files, and implement them in BPM.
- Install snapshots on the production server.
- Prepared migration documents, production transfer documents, user manuals, and DB scripts for various applications.
- Customize coach views in the UI using CSS and Dojo.
- About 8 to 10 years of experience, including 4 end-to-end implementations and various support projects for enterprise software applications.
- Deep knowledge of technology solutions and operational process activities.
- Analytical and negotiation skills, and leadership qualities.
- Thorough knowledge of the banking business and of the various applications in the banking domain.
- Thorough knowledge of, and experience in, the respective areas and systems.