Job Description
Company Description
neurogent.ai builds agentic AI solutions that help organizations automate workflows and improve customer experience across Banking, Healthcare, Insurance, and Financial Services. We develop intelligent AI agents, chatbots, and voice assistants that provide 24/7 support, streamline operations, and drive measurable efficiency gains. Using modern architectures such as RAG and LangGraph, we help teams reduce costs, improve productivity, and accelerate growth.
About the Role
We're hiring a Sr. Software Engineer – DevOps, SRE & Observability to join a client-facing DevOps team actively building a next-generation CI/CD ecosystem. This isn't a maintenance role - you will be a key implementer of a modern, feature-flag-first DevOps platform using Harness for CI/CD orchestration, Dynatrace for observability, and AI tools including Claude to automate the creation of dashboards, pipelines, and operational runbooks.
The client has a clear, ambitious roadmap: decoupling deployments from releases using feature flags, spinning up ephemeral Kubernetes environments on every commit, shifting security left with Snyk and ChainGuard, and replacing slow manual operations with AI-generated automation. You will be a hands-on engineer and a creative problem-solver who can translate this roadmap into working systems.
This role demands strong engineering aptitude, deep CI/CD experience, solid SRE instincts, and genuine curiosity about putting AI to its best use. If you thrive at the intersection of engineering, reliability, and intelligent automation, this is your role.
Key Responsibilities
Next-Gen CI/CD Implementation
- Implement and mature the client's Harness-based CI/CD platform, including CI pipelines, CD orchestration, Artifactory integration, Feature Flags module, and Security module integrations with Snyk and Wiz.
- Build and maintain trunk-based branching workflows and enforce quality gates for code quality, security (SAST), and OSS licensing using GitHub/CodeQL and Snyk.
- Configure and manage ChainGuard hardened container base images to ensure CVE-free, secure containerization from the ground up.
Feature Flag-First Development
- Implement and operationalize a Feature Flag-first SDLC using Harness Feature Flags and/or Split.io, enabling decoupled deployment and release strategies.
- Build workflows that support progressive rollouts (1% → 10% → 50% → 100%), A/B testing, instant kill-switch capabilities, and RBAC-controlled lifecycle promotion across Dev, QA, Stress, and Production environments.
- Automate feature flag toggling via Jira workflow integrations tied to Harness pipeline lifecycle events.
Ephemeral Environments
- Design and manage ephemeral Kubernetes environment workflows for development and QA, spinning up isolated namespaces per feature branch using Harness and GitHub triggers.
- Integrate ephemeral environments with CI pipelines so stakeholders can review changes on live cloud deployments before merging to trunk.
AI-Powered DevOps Automation
- Use AI tools - especially Claude Code - to automate the generation of Dynatrace dashboards, alert policies, log queries, and SLO configurations from structured templates or natural language specifications.
- Build AI-assisted automation for operational toil: auto-generating runbooks, summarizing incident timelines, creating pipeline configurations, and writing deployment scripts.
- Continuously identify manual DevOps workflows that can be accelerated or eliminated through intelligent automation.
Observability & SRE
- Build and maintain full observability across applications and infrastructure using Dynatrace, including APM monitoring, CPU/memory dashboards, log aggregation, distributed tracing, and alert configuration.
- Define and operate against SLOs, SLIs, and error budgets; drive reliability improvements based on data.
- Lead incident response, own post-incident reviews, and ensure root causes result in permanent fixes - not recurring incidents.
- Support production releases, including troubleshooting failed deployments, managing rollbacks, and coordinating feature flag kill switches when needed.
Engineering & Collaboration
- Write, debug, and deploy code to resolve production defects across Python, Java, or C# services.
- Analyze application and system issues by reviewing logs, metrics, traces, and database queries.
- Collaborate with application engineering teams to implement long-term architectural improvements that improve reliability and deployment safety.
- Track and manage issues via ticketing systems (Jira), ensuring timely resolution and clear documentation.
Required Skills & Experience
- 5+ years of experience in Software Engineering, DevOps, or SRE roles.
- Strong programming skills in Python, Java, or C# - you must be able to write, read, and debug real application code.
- Hands-on Harness experience: CI pipelines, CD deployments to Kubernetes, or Feature Flags module. Candidates with deep Jenkins/GitHub Actions experience and strong learning aptitude will also be considered.
- Feature flag experience: hands-on implementation with Harness Feature Flags, LaunchDarkly, Split.io, or equivalent - including progressive rollouts and environment-scoped flag management.
- CI/CD fluency: strong understanding of pipeline design, quality gates, artifact management, and deployment strategies (blue/green, canary, progressive).
- Kubernetes & Docker: hands-on experience managing containerized workloads, namespaces, and ephemeral environments.
- Dynatrace proficiency: building dashboards, writing metric/log queries, configuring APM monitors, and setting up alerting policies. Experience automating Dynatrace configuration via API or AI tooling is a strong plus.
- Shift-left security tooling: familiarity with Snyk (SAST, SCA/licensing) or equivalent SAST/SCA tools integrated into CI pipelines.
- Observability fundamentals: SLOs, SLIs, error budgets, distributed tracing, log aggregation, and on-call participation.
- Strong engineering aptitude: ability to quickly learn new platforms, evaluate tradeoffs, and build reliable systems in unfamiliar territory.
- AI tool fluency: active use of Claude Code, Cursor, or similar AI coding assistants to write, review, and automate code - not occasional use, but part of your daily workflow.
- Excellent communication skills with the ability to collaborate across engineering, product, and operations teams in a remote, cross-functional environment.
Nice to Have
- Direct experience implementing Harness (CI, CD, Feature Flags, or Ephemeral Environments module).
- Experience with Release.io or similar ephemeral environment orchestration platforms.
- Familiarity with Snyk and Wiz for security policy enforcement in CI/CD.
- Experience with ChainGuard or other hardened base image strategies.
- Experience with Infrastructure as Code tools such as Terraform.
- Cloud platform experience: AWS, Azure, or GCP.
- Experience with GitHub CodeQL or similar static analysis tools.
- Prior work automating observability tooling (Dynatrace, Datadog, Splunk) via API or AI.
Work Details
Location: Hybrid (Gurgaon)
Work Hours: 5:30 PM – 2:30 AM IST (aligned with US client hours)
Employment Type: Full-Time