Job Description
As a Cloud Engineer, you will play a pivotal role in supporting tech leadership by participating in the optimization, deployment, and performance of cloud-based solutions. You will assist in integrating machine learning models into cloud environments, optimizing them for performance and scalability and aligning systems with business goals. You’ll collaborate with cross-functional teams to drive innovation and support the seamless operation of data-intensive applications on a cloud platform.
Top Skills / Requirements
- 7-10 years of experience with AWS Cloud Engineering in complex, enterprise environments (ideally coming from working directly at Amazon)
- Must possess experience and proficiency with ECS
- Experience with AWS Fargate when using ECS is highly preferred
- Direct experience with Robotics Engineering or Prime Air
- Must possess experience and proficiency with designing and operating scalable cloud-based compute solutions on AWS EC2
- Must possess experience and proficiency with Lambda
- Must possess experience designing data-intensive cloud architectures
- Must possess hands-on experience with cloud infrastructure management, containerization (Docker, Kubernetes), and automated deployment workflows
- Must possess experience with cloud-based networking, storage solutions, and security protocols
- Hands-on experience with cloud-based machine learning (ML) services
- Must possess experience and proficiency utilizing CI/CD pipelines for machine learning deployment
- Must possess experience and proficiency developing scalable software solutions and seamlessly integrating ML models into production
- Must possess familiarity with common programming languages (Python, Java or JavaScript)
- Master’s degree in Mechanical Engineering or a related STEM discipline highly preferred
This role has the following responsibilities:
Cloud Systems
Assist in cloud system upgrades, security patches, and infrastructure management tasks under the guidance of senior engineers.
Performance Testing
Participate in cloud platform and machine learning model performance monitoring to proactively identify and resolve potential issues.
Troubleshoot Technology Systems
Identify and resolve technical issues related to machine learning workflows, cloud infrastructure and system integrations.
Production-Ready ML Systems
Support the deployment of robust ML models in production, ensuring high availability and scalability on cloud platforms such as AWS, GCP, or Azure. Build and refine ML pipelines that handle complex data workflows and large-scale datasets and integrate smoothly into existing systems. Partner on the end-to-end design of ML solutions, from data ingestion to deployment.
Collaboration and Alignment
Partner with engineering teams to deploy, scale, and monitor ML Operations and cloud-based solutions.
Documentation and Knowledge Sharing
Develop and maintain detailed documentation for testing procedures and workflows, contributing to the team’s technical growth by sharing insights, providing mentorship to other team members, and fostering a collaborative environment.
Continuous Learning and Innovation
Stay up to date with the latest cloud technologies, machine learning operations trends, and best practices to continuously improve support offerings. Prototype and evaluate emerging technologies to maintain a competitive edge.
Risk and Compliance Management
Ensure cloud systems comply with security and regulatory standards, particularly in handling sensitive data and critical applications.
You might be a good fit if you have the following knowledge, skills, and abilities (KSAs):
Knowledge
- Experience with AWS and other cloud platforms (Microsoft Azure, Google Cloud) in a professional setting.
- Familiarity with designing data-intensive cloud architectures.
- Hands-on experience with cloud infrastructure management, containerization (Docker, Kubernetes), and automated deployment workflows.
- Solid foundation in cloud-based networking, storage solutions, and security protocols.
Skills
- Hands-on experience with cloud-based machine learning services (e.g., Amazon SageMaker).
- Skilled in using CI/CD pipelines for machine learning deployment.
- Proficient in developing scalable software solutions and seamlessly integrating ML models into production.
- Familiarity with common programming languages (Python, Java or JavaScript).
Abilities
- Ability to work independently, with minimal supervision, within a large-scale, cross-functional project team.
- Excellent communication skills to convey complex technical concepts to both technical and non-technical stakeholders.
- Ability to develop new features and infrastructure in support of rapidly emerging business and project requirements.
- Ability to ensure application performance, uptime, and scale, and to maintain high standards for code quality and application design.
- Strong analytical, problem-solving, and communication skills.