DevOps Engineer with MLOps

TrueFan • Full-time • Gurugram, IN

Job Description: DevOps with MLOps Engineer

Company Overview

We are a cutting-edge AI company focused on developing advanced lip-syncing technology using deep neural networks. Our solutions enable seamless synchronisation of speech with facial movements in videos, creating hyper-realistic content for various industries such as entertainment, marketing, and more.

Position: MLOps Engineer

We are looking for a talented and motivated MLOps Engineer to join our team. The ideal candidate will play a crucial role in managing and scaling our machine learning models and infrastructure, enabling seamless deployment and automation of our lip-sync video generation systems.

Key Responsibilities:

Model Training/Deployment Pipelines and Monitoring:
Design, implement, and maintain scalable and automated pipelines for deploying deep neural network models.
Monitor and manage production models, ensuring high availability, low latency, and smooth performance.
Automate workflows for data preprocessing (face alignment, feature extraction, audio analysis), model retraining, and video generation.
Implement logging, tracking, and monitoring systems to ensure data integrity and visibility into the model lifecycle.
Infrastructure Management:
Build and manage cloud-based infrastructure (AWS, GCP, or Azure) for efficient model training, deployment, and data storage.
Collaborate with DevOps to manage containerization (Docker, Kubernetes) and ensure robust CI/CD pipelines using GitHub and Jenkins for model delivery.
Monitor resource usage for GPU- and CPU-intensive tasks such as video processing, model inference, and training, using Prometheus, Grafana, Alertmanager, and the ELK stack.
Collaboration:
Work closely with ML engineers to integrate models into production pipelines.
Provide tools and frameworks for rapid experimentation and model versioning.

Required Skills:

Basic proficiency in Python.
Strong experience with cloud platforms (AWS, GCP, Azure) and cloud-based machine learning services.
Expert knowledge of containerization technologies (Docker, Kubernetes) and infrastructure-as-code (Terraform, CloudFormation).
Understanding of deploying both synchronous and asynchronous APIs using Flask, Django, Celery, Redis, RabbitMQ, and Kafka.
Experience deploying and scaling AI/ML systems in production.
Familiarity with deep learning frameworks (TensorFlow, PyTorch).
Familiarity with video processing tools like FFmpeg and Dlib for handling dynamic frame data.
Basic understanding of ML models.

Preferred Qualifications:

Experience in image and video-based deep learning tasks.
Familiarity with media streaming and video processing pipelines for real-time generation.
Experience with real-time inference and deploying models in latency-sensitive environments.