Position: DevOps Manager
Location: India (Remote)
Employment Type: Full-Time
Schedule: Monday to Friday, Day Shift
Company Description
Scry AI is a research-led enterprise AI company that builds intelligent platforms to drive efficiency, insight, and compliance. Our platforms, Collatio®, Auriga®, and Concentio®, streamline complex workflows by automating data extraction, validation, and reconciliation and by delivering real-time intelligence.
We are seeking a DevOps Manager to lead our infrastructure, CI/CD, and reliability practices across cloud and on-prem deployments. You will own uptime, performance, security, and cost efficiency for AI/ML workloads powering Collatio®, Auriga®, and Concentio®.
Role Overview
As DevOps Manager, you will lead a small team of DevOps/SRE engineers to design, automate, and operate secure, compliant, and highly available platforms across AWS/Azure/GCP and customer on-prem environments. You will standardize IaC, improve CI/CD velocity, build robust observability, and enable GPU-accelerated AI inference at scale for enterprise clients.
Key Responsibilities
Platform Reliability & Operations
• Own SLOs/SLIs, availability, latency, and capacity planning across services.
• Lead incident response, root-cause analysis, postmortems, and on-call processes.
• Implement backup, disaster recovery, and business continuity for multi-region and on-prem deployments.
Cloud, On-Prem & Edge Deployments
• Architect Kubernetes platforms (managed and self-hosted), including RBAC, network policies, and secrets management.
• Standardize infrastructure with Terraform, Helm, and GitOps (Argo CD) for repeatable customer deployments.
• Support Concentio® edge/IoT rollouts with secure remote updates and telemetry pipelines.
AI/ML & Data Infrastructure
• Enable GPU scheduling, NVIDIA drivers and CUDA, inference runtimes (Triton), and model packaging.
• Build MLOps foundations (MLflow, feature stores) and artifact/version governance.
• Operate data services (Kafka, PostgreSQL, Redis, MinIO/S3, Elasticsearch/OpenSearch) for high-throughput pipelines.
CI/CD & Developer Experience
• Own CI/CD with GitHub Actions/GitLab CI/Jenkins; establish trunk-based development, automated testing, and canary/blue-green releases.
• Maintain internal developer platforms, templates, and golden paths to improve delivery speed and quality.
Security, Compliance & Observability
• Implement least-privilege access, SSO (Okta/AAD), Vault-based secrets, image scanning (Trivy), and policy as code.
• Ensure alignment with SOC 2, ISO 27001, HIPAA, and GDPR, supported by audit trails and immutable logs.
• Build end-to-end observability using Prometheus, Grafana, Loki/EFK, and OpenTelemetry.
FinOps & Stakeholder Management
• Track cloud spend, rightsize resources, and negotiate quotas for GPU/compute.
• Partner with Product, Data Science, and Customer Success to plan capacity for new features and enterprise go-lives.
Required Qualifications & Skills
• Strong Kubernetes expertise (production operations, networking, security, Helm, GitOps).
• Proven IaC experience with Terraform and configuration management (Ansible).
• CI/CD at scale with GitHub Actions/GitLab CI/Jenkins; artifact registries and SBOMs.
• Observability: Prometheus, Grafana, ELK/EFK or Loki, alerting and runbooks.
• Cloud proficiency in at least one major provider (AWS/Azure/GCP) and Linux fundamentals.
• Security fundamentals: network segmentation, TLS, secrets management, container hardening.
• Experience running data/streaming systems (Kafka, Redis, PostgreSQL) in production.
• Excellent communication, incident leadership, and stakeholder management.
Nice-to-Have
• GPU orchestration, Triton Inference Server, Hugging Face model serving.
• Service mesh (Istio/Linkerd), API gateways, and zero-trust patterns.
• MLOps tooling (MLflow, Feast), Airflow, dbt.
• Compliance implementations for regulated industries (BFSI, healthcare).
• Certifications: CKA/CKAD, AWS/Azure/GCP Architect, Security+.
Our Ideal Candidate
• Drives reliability with automation, not toil.
• Balances speed and safety with measurable delivery improvements.
• Thrives in customer-facing, hybrid cloud, and on-prem environments.
• Coaches teams with clear standards, runbooks, and continuous improvement.
Tip for Candidates
If you want to build secure, high-performance platforms for real-world AI at enterprise scale, follow our page for more relevant job openings.