The MLOps / Platform Engineer will be responsible for deployment automation, infrastructure provisioning, observability, runtime operations, and cost optimization across the Agentic Platform ecosystem.
Key Responsibilities
- Develop and manage Infrastructure as Code (IaC) using Terraform or AWS CDK.
- Build and maintain CI/CD pipelines using tools such as AWS CodePipeline, GitHub Actions, or GitLab CI.
- Configure and enhance observability solutions using CloudWatch and OpenTelemetry for LLM-based platforms and services.
- Manage environment segregation, the model lifecycle, and version control practices for prompts and models.
- Implement and support containerized workloads using Docker, Amazon ECS on AWS Fargate, and/or Amazon EKS.
- Monitor and optimize platform usage and cloud spend using AWS Budgets, CloudWatch billing alarms, and cost governance practices.
Education & Certifications
Minimum Required:
Bachelor’s degree in Computer Science, Information Systems, or a related technical field.
Preferred:
AWS Certified Solutions Architect (Associate or Professional) and/or Kubernetes/CNCF certifications.
Required Skills & Experience
- 5+ years of hands-on experience in AWS operations, Infrastructure as Code (Terraform/CDK), CI/CD, and observability.
- Strong expertise in AWS cloud operations and platform management.
- Solid understanding of monitoring, logging, and observability frameworks.
- Demonstrated experience with container orchestration and cloud-native deployments.
- Strong focus on platform reliability, scalability, and cost optimization.