Job Title: DevOps Manager
Company: Gnani.ai
Location: Bangalore (Office-based) + Occasional Travel to client locations
Experience: 10+ Years (with 3+ years in a direct people-management role)
Reports To: Head of Delivery / Head of Engineering
Team Size: 8–15 engineers across DevOps, SRE, On-Premise Deployment, and Integration Engineering
About Gnani.ai
Gnani.ai is a deep-tech company building large-scale voice automation platforms, powered by Speech AI, LLMs, and Agentic AI, for global enterprises. Our flagship platform, Inya, powers 30M+ voice interactions daily across 40+ languages for 200+ enterprise customers, including Fortune 500 brands across BFSI, telecom, healthcare, and retail.
Our products operate in real-time, at high concurrency, with a sub-800ms end-to-end latency SLA, and are deployed on Azure and on-premise environments (including air-gapped BFSI datacenters) to meet stringent regulatory requirements.
Role Overview
We are seeking a hands-on DevOps Manager to lead and scale a multi-disciplinary platform organization comprising DevOps, SRE, On-Premise Deployment, and Integration Engineering teams. This is a player-coach role: you will own production reliability, platform scalability, and operational excellence across our SaaS and on-premise deployments while building and growing a high-performing team.
You will be accountable for the availability, performance, and security of the platforms powering Gnani's enterprise AI products. A significant part of this role is cross-functional collaboration — you will partner daily with the Delivery and Solutions teams to translate customer project requirements into deployable infrastructure, and with the Information Security (InfoSec) team to operationalize Gnani's security posture across cloud and DevOps. You will also work closely with Product, AI/ML, and Customer Success teams to accelerate delivery, reduce incidents, and enable seamless enterprise deployments — including air-gapped installations for BFSI customers.
The Team You Will Lead
• DevOps Engineers — own CI/CD, infrastructure automation, Kubernetes platform operations, and developer enablement across Azure (primary) and AWS.
• Site Reliability Engineers (SRE) — own production reliability, SLO/error-budget management, incident response, performance engineering, and observability for real-time voice AI workloads.
• On-Premise Deployment Engineers — own turnkey deployments of Inya and Gnani's product stack in enterprise (especially BFSI) datacenters, including air-gapped environments, hardware sizing, GPU provisioning, and customer handover.
• Integration Engineers — own telephony integrations (SIP, WebRTC, contact center platforms, carrier/SBC connectivity) and secure enterprise network integrations (IPsec VPN tunnels) required to connect customer environments to the Inya platform.
Key Responsibilities
People & Team Leadership
• Build, mentor, and retain a high-performing team of 8–15 engineers across four sub-disciplines; run hiring, onboarding, performance reviews, and career development.
• Define team structure, squad ownership, on-call rotations, and RACI across DevOps/SRE/Deployment/Integration.
• Establish a blameless post-incident review culture, engineering rituals (sprint planning, architecture reviews, operational reviews), and clear career ladders.
• Represent the platform organization in leadership forums; drive cross-functional alignment with Delivery, Solutions, InfoSec, Product, AI/ML, and Customer Success.
Platform Engineering & Reliability
• Own the architecture and operation of Kubernetes-based platforms on Azure (primary) supporting real-time voice AI workloads.
• Lead incident command for SEV1/SEV2 events; drive RCA, systemic fixes, and reduction of MTTR and incident recurrence.
• Own capacity planning and cost governance across cloud and on-prem estates.
CI/CD, GitOps & Developer Experience
• Evolve Jenkins + ArgoCD-based CI/CD and GitOps platforms into a self-service Internal Developer Platform (IDP) to reduce service onboarding time and deployment friction.
• Establish deployment standards — canary/blue-green, automated rollback, feature flags, progressive delivery for ML model versions and agent workflows.
• Embed DevSecOps controls — SAST/DAST, image scanning, SBOM, secrets management, policy-as-code — into the delivery pipeline.
On-Premise & Enterprise Deployments
• Own the end-to-end playbook for deploying Inya and Gnani's product stack into enterprise on-premise environments (RKE2, OpenShift, or equivalent), including air-gapped BFSI deployments.
• Standardize hardware sizing, GPU provisioning, private registry mirroring, and customer-specific security hardening (PCI-DSS, GDPR, RBI, SOC 2 alignment).
• Partner with Customer Success and pre-sales to scope deployments, unblock enterprise rollouts, and reduce time-to-go-live.
Observability & Operations
• Own the observability stack (Prometheus, Grafana, Loki/OpenSearch, OpenTelemetry) — metrics, logs, distributed tracing across the streaming voice pipeline.
• Drive signal-over-noise alerting, runbook automation, and PagerDuty discipline; reduce on-call toil and protect team sustainability.
• Establish production readiness reviews (PRR) as a gating mechanism for new services and models entering production.
Integration & Enterprise Connectivity
• Oversee the integration engineering function responsible for telephony integrations (SIP trunks, WebRTC, SBCs, major contact center platforms) and IPsec VPN tunnels with enterprise customers. Standardize certification, load testing, and handover processes to reduce per-customer integration lead time.
Cross-Functional Collaboration with Delivery, Solutions & InfoSec
• Partner with Delivery & Solutions teams as the platform owner for every customer engagement — participate in solution design, translate customer technical and infrastructure requirements into deployable architectures, commit to realistic infra timelines, and own the DevOps workstream from kickoff through go-live. Drive pre-deployment readiness reviews, environment provisioning, customer-specific configuration, and UAT/production cutovers. Remove platform-side blockers that delay customer projects and serve as the escalation point for infra risks surfaced by Delivery.
• Partner with the Information Security (InfoSec) team to design and operationalize Gnani's security posture across cloud and DevOps. Translate InfoSec policies into enforceable platform controls: network segmentation (NSG, VNet peering, private endpoints), identity and access (Entra ID, RBAC, JIT access), secrets management (Key Vault), vulnerability management (image scanning, patching cadence), hardening baselines (CIS benchmarks for AKS and Linux), encryption in transit and at rest, and audit logging. Own timely remediation of findings from VAPT, red-team exercises, and customer security audits. Support InfoSec in customer security reviews, RFPs, and compliance certifications (SOC 2, ISO 27001, PCI-DSS, GDPR).
Security, Compliance & Cost
• Partner with InfoSec to enforce zero-trust networking, mTLS, secrets management (Vault/Key Vault/Secrets Manager), RBAC, and policy enforcement (OPA/Gatekeeper).
• Own compliance posture for the platform across PCI-DSS, GDPR, SOC 2, and ISO 27001, as applicable.
• Drive cloud and GPU cost optimization; deliver measurable reductions through rightsizing, autoscaling, spot utilization, and efficient model serving (continuous batching, MIG, KV-cache tuning).
Must-Have Qualifications
• 10+ years of total experience, including 3+ years directly managing DevOps, SRE, or Platform Engineering teams (with formal people-management responsibility — hiring, performance, career growth).
• Deep, hands-on expertise in Kubernetes (EKS, AKS, RKE2 or equivalent) and Docker at production scale; able to debug cluster-level issues, not just operate clusters.
• Strong hands-on expertise in Azure, our primary cloud (AKS, VNet/NSG, Application Gateway, Azure Monitor, Key Vault, Entra ID). Working knowledge of AWS is a plus.
• Proven experience architecting and operating CI/CD and GitOps platforms using Jenkins, ArgoCD, Helm, and Kustomize.
• Demonstrated ownership of production SLOs, incident response, and on-call programs for customer-facing SaaS platforms.
• Hands-on experience with Terraform (or equivalent IaC), Bash/Python scripting, and infrastructure automation.
• Experience leading observability programs using Prometheus, Grafana, and a log/trace stack (Loki/ELK/OpenSearch + OpenTelemetry).
• Experience delivering to enterprise customers with formal security, compliance, and deployment requirements (BFSI, telecom, or healthcare preferred).
• Proven track record of working shoulder-to-shoulder with Delivery, Solutions, and InfoSec counterparts — translating customer and security requirements into executable infra plans, owning platform commitments in customer projects, and driving timely remediation of security findings.
• Excellent written and verbal communication; able to work equally effectively with executives, customers, and engineers.
Highly Preferred Skills
• Hands-on experience with voice/telephony infrastructure — SIP, WebRTC, SBCs, carrier-grade signaling, and DTMF/media handling — and with IPsec VPN topologies for enterprise connectivity.
• Experience leading on-premise and air-gapped deployments for enterprise customers, especially BFSI.
• Experience with real-time, low-latency systems (voice, video, WebRTC, SIP) and an understanding of streaming-system observability.
• Experience operating PostgreSQL, MongoDB, Redis, and RabbitMQ at scale.
• DevSecOps tooling: Trivy, SonarQube, tfsec, OPA/Gatekeeper, Vault, and secrets management.
Nice to Have
• Exposure to LLMOps / MLOps — model versioning, canary rollouts for models, model-serving cost optimization.
• Relevant certifications: CKA / CKAD / CKS, AWS / Azure Solutions Architect Professional.