Job Title: AI DevOps Engineer — Mid/Senior Level
Experience: 4–7 Years
Location: Raidurg Main Road, Hyderabad
Work Mode: On-site
Work Hours: 2–11 PM
Notice Period: Immediate joiners preferred; notice periods of 15–30 days accepted
About us: At Stackular, we are more than a team – we are a product development community driven by a shared vision. Our values shape who we are, what we do, and how we interact with our peers and our customers. We're not seeking just any engineer; we want individuals who identify with our core values and are passionate about software development.
About the Role
We are looking for a Mid/Senior-Level AI DevOps Engineer with 4–7 years of experience in DevOps, cloud infrastructure, automation, and production deployment environments. This role focuses on building, maintaining, and improving scalable infrastructure and deployment pipelines for AI and machine learning applications.
The ideal candidate should have strong hands-on experience with cloud platforms, CI/CD, Docker, Kubernetes, infrastructure as code, monitoring, and automation, along with a working understanding of AI/ML deployment workflows.
Key Responsibilities
Cloud Infrastructure & DevOps
- Design, deploy, and manage cloud-based infrastructure for AI and software applications.
- Work with cloud platforms such as AWS, Azure, or GCP.
- Build and maintain infrastructure using tools such as Terraform, CloudFormation, and Ansible.
- Support scalable, secure, and reliable environments for production workloads.
- Optimize infrastructure for performance, cost, availability, and operational efficiency.
CI/CD & Automation
- Build and maintain CI/CD pipelines for application and AI service deployments.
- Automate build, testing, deployment, and rollback processes.
- Improve deployment reliability and reduce manual operational tasks.
- Work with tools such as Azure DevOps, GitHub Actions, and Jenkins.
- Create reusable scripts, templates, and automation workflows for engineering teams.
Containerization & Orchestration
- Deploy and manage containerized applications using Docker.
- Work with Kubernetes for application deployment, scaling, networking, and troubleshooting.
- Manage Helm charts and Kubernetes manifests.
- Support production deployments and ensure application availability.
- Troubleshoot container, cluster, and infrastructure-related issues.
AI / MLOps Support
- Support deployment and monitoring of AI/ML models in production environments.
- Collaborate with data scientists, ML engineers, and backend engineers to streamline model deployment workflows.
- Assist with model versioning, model serving, and release automation.
- Work with MLOps tools such as MLflow, Kubeflow, SageMaker, Vertex AI, Azure ML, Airflow, or similar platforms.
- Support infrastructure for AI services, APIs, and model inference workloads.
Monitoring, Logging & Reliability
- Implement and maintain monitoring, logging, tracing, and alerting systems.
- Use tools such as Prometheus, Grafana, ELK Stack, Datadog, New Relic, CloudWatch, or Azure Monitor.
- Monitor application and infrastructure performance.
- Participate in incident response, root cause analysis, and production support.
- Help improve system reliability, uptime, and operational visibility.
Security & Compliance
- Apply DevSecOps practices across infrastructure and deployment pipelines.
- Manage access controls, IAM roles, secrets, and secure configuration.
- Support vulnerability scanning, patching, and security hardening.
- Ensure cloud and deployment environments follow security best practices.
- Work with tools such as HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager.
Required Qualifications
- 4–7 years of experience in DevOps, Cloud Engineering, Site Reliability Engineering, Platform Engineering, or Infrastructure Engineering.
- Strong hands-on experience with at least one cloud platform: AWS, Azure, or GCP.
- Experience building and managing CI/CD pipelines.
- Strong experience with Docker and containerized deployments.
- Working experience with Kubernetes in production or near-production environments.
- Experience with infrastructure-as-code tools such as Terraform, Ansible, or CloudFormation.
- Strong scripting skills using Python, Bash, or PowerShell.
- Experience with monitoring and logging tools such as Prometheus, Grafana, ELK, Datadog, New Relic, or CloudWatch.
- Good understanding of networking, Linux systems, security, and cloud architecture.
- Familiarity with AI/ML workflows, model deployment, or MLOps concepts.
- Experience supporting production applications and troubleshooting infrastructure issues.
Preferred Qualifications
- Experience supporting AI/ML applications or model deployment pipelines.
- Exposure to LLM applications, vector databases, RAG pipelines, or generative AI infrastructure.
- Experience with GPU-based workloads or AI inference infrastructure.
- Familiarity with tools such as MLflow, Kubeflow, SageMaker, Vertex AI, Azure ML, Airflow, or Argo Workflows.
- Experience with Helm, service mesh, or Kubernetes operators.
- Knowledge of DevSecOps practices and cloud security controls.
- Cloud, Kubernetes, or DevOps certifications are a plus.
Required Technical Skills
Cloud Platforms: AWS, Azure, GCP
Containers & Orchestration: Docker, Kubernetes, Helm
Infrastructure as Code: Terraform, Ansible, CloudFormation
CI/CD: GitHub Actions, Jenkins, Azure DevOps
Scripting: Python, Bash, PowerShell
Monitoring & Logging: Prometheus, Grafana, ELK Stack, Datadog, New Relic, CloudWatch
MLOps / AI Tools: MLflow, Kubeflow, SageMaker, Vertex AI, Azure ML, Airflow
Security: IAM, secrets management, vulnerability scanning, DevSecOps
Operating Systems: Linux, Unix-based systems