We are supporting our client's applied R&D team building next‑generation platforms that bridge AI, robotics, and large‑scale infrastructure. Their mission is to make intelligent systems economically viable at scale — turning software into physical outcomes.
As a DevOps Engineer, you will own and evolve the platform that everything runs on: inference serving, training rigs, and agentic coding infrastructure. You'll work deep in the stack across Kubernetes, networking, and hybrid cloud, helping set the technical direction for how the platform scales.
What You’ll Do
- Operate and evolve Kubernetes clusters across hybrid environments (on‑prem + public cloud)
- Architect and manage cloud infrastructure (AWS/GCP), workload placement, and cross‑cloud networking
- Own CI/CD and GitOps pipelines end‑to‑end (Docker, Argo CD/Flux)
- Build and manage observability stacks (Grafana, Prometheus, Loki, Tempo, Pyroscope)
- Support GPU inference platforms and orchestration across NVIDIA fleets
- Harden network security, RBAC, and identity management (Keycloak/SSO)
- Drive infrastructure reliability, cost efficiency, and capacity planning
What We’re Looking For
- Deep hands‑on Kubernetes experience (workloads, controllers, autoscaling)
- Strong networking design skills (VPCs, firewalls, hub‑and‑spoke, multi‑AZ)
- CI/CD pipeline ownership and Dockerfile optimization
- Proven experience with observability stacks in production
- Linux proficiency + infrastructure‑as‑code (Terraform/Pulumi)
- Ownership mindset: comfortable making architecture decisions and driving them to production
Bonus Points For
- Experience with GPU/AI infra, OpenStack, or kernel‑level Linux tuning
- Prior open‑source contributions in Kubernetes/infra projects
Lin Wee
License No. 22C1076 | EA Reg: R1878551