Job Description: DevOps with MLOps Engineer
Company Overview
We are a cutting-edge AI company focused on developing advanced lip-syncing technology using deep neural networks. Our solutions enable seamless synchronisation of speech with facial movements in videos, creating hyper-realistic content for industries such as entertainment and marketing.
Position: MLOps Engineer
We are looking for a talented and motivated MLOps Engineer to join our team. The ideal candidate will play a crucial role in managing and scaling our machine learning models and infrastructure, enabling seamless deployment and automation of our lip-sync video generation systems.
Key Responsibilities:
- Model Training/Deployment Pipelines and Monitoring:
  - Design, implement, and maintain scalable and automated pipelines for deploying deep neural network models.
  - Monitor and manage production models, ensuring high availability, low latency, and smooth performance.
  - Automate workflows for data preprocessing (face alignment, feature extraction, audio analysis), model retraining, and video generation.
  - Implement logging, tracking, and monitoring systems to ensure data integrity and visibility into the model lifecycle.
- Infrastructure Management:
  - Build and manage cloud-based infrastructure (AWS, GCP, or Azure) for efficient model training, deployment, and data storage.
  - Collaborate with DevOps to manage containerization (Docker, Kubernetes) and ensure robust CI/CD pipelines using GitHub and Jenkins for model delivery.
  - Monitor resource usage for GPU- and CPU-intensive tasks such as video processing, model inference, and training, using Prometheus, Grafana, Alertmanager, and the ELK stack.
- Collaboration:
  - Work closely with ML engineers to integrate models into production pipelines.
  - Provide tools and frameworks for rapid experimentation and model versioning.
Required Skills:
- Basic proficiency in Python.
- Strong experience with cloud platforms (AWS, GCP, Azure) and cloud-based machine learning services.
- Expert knowledge of containerization technologies (Docker, Kubernetes) and infrastructure-as-code (Terraform, CloudFormation).
- Understanding of deploying both synchronous and asynchronous APIs using Flask, Django, Celery, Redis, RabbitMQ, and Kafka.
- Experience deploying and scaling AI/ML systems in production.
- Familiarity with deep learning frameworks (TensorFlow, PyTorch).
- Familiarity with video processing tools such as FFmpeg and Dlib for handling dynamic frame data.
- Basic understanding of ML models.
Preferred Qualifications:
- Experience in image and video-based deep learning tasks.
- Familiarity with media streaming and video processing pipelines for real-time generation.
- Experience with real-time inference and deploying models in latency-sensitive environments.
- Strong problem-solving skills with a focus on optimising machine learning model infrastructure for scalability and performance.