Key Responsibilities
Cloud Infrastructure & CI/CD
- Maintain and improve the Jenkins CI/CD infrastructure.
- Scale Jenkins with on‑demand workers using AWS ECS and Terraform.
Docker & Container Management
- Maintain and evolve custom Docker images based on NVIDIA CUDA for amd64 (x86-64) and Jetson (ARM-based) targets.
- Improve CI/CD caching strategies to significantly reduce Docker build times.
AI/ML Infrastructure
- Maintain IaC for training AI/ML models using Terraform and SageMaker AI.
- Optionally integrate a dashboard for training orchestration and monitoring, such as TensorBoard or Weights & Biases.
Hardware & Lab Support
- Support lab operations by preparing, installing, and maintaining workstations and Jetson devices.
Team Support & Collaboration
- Assist engineers when blocked by DevOps, CI/CD, IT, or cloud‑related issues.
- Optional: Build small internal web dashboards or automation tools.
Required Technical Skills
- Proficiency with AWS services (EC2, S3, IAM, ECR, VPC, autoscaling).
- Good knowledge of Terraform and Infrastructure as Code methodologies.
- Hands-on experience maintaining Jenkins CI/CD pipelines.
- Experience with C++ compilation toolchains (understanding build systems, not necessarily writing C++).
- Strong Docker knowledge.
- General IT infrastructure knowledge (networking basics, system administration, Linux environments).
- Optional but valuable:
  - Experience with NVIDIA Jetson boards (flashing, OS preparation, infrastructure validation).
  - Full-stack experience for building internal tools.
Ideal Candidate Profile
- Comfortable with in-office collaboration (5 days/week in Campbell, CA).
- Comfortable supporting multiple engineers daily, including rapid troubleshooting.
- Strong problem-solving skills and ability to autonomously improve existing systems.
- Experience working in fast-paced R&D or robotics/AI environments (preferred).
- Ability to document processes, propose improvements, and work cross-functionally with software teams.