Critical Skills & Experience Requirements
- Experience: 3+ years in VMware infrastructure, Windows, and Linux environments, with an understanding of hyperconverged infrastructure.
- Experience: 3+ years in cloud infrastructure, DevOps, or platform engineering roles, including experience with production VMware, AWS, or Azure environments.
- Experience: 1+ years in Meraki SD-WAN infrastructure, with an understanding of networking infrastructure.
- Cloud & IaC Expertise: Hands-on experience with Terraform, Ansible, AWS CDK, or similar tools.
- CI/CD & Automation: Familiarity with modern CI/CD workflows and scripting languages such as PowerShell, Python, or Bash.
- Kubernetes Experience: Experience deploying and managing containerized applications using Kubernetes, Docker, Helm, or ECS.
- Monitoring & Troubleshooting: Practical experience with observability tools (e.g., LogicMonitor, CloudWatch, Datadog, Grafana, Prometheus) and using metrics to improve system reliability.
- Security Focus: Solid grasp of cloud security principles, with experience enforcing secure access and infrastructure policies.
- Team Mindset: Clear communicator with a collaborative approach to problem-solving and cross-functional work.
- Personal Skills: Problem solving and analytical thinking, time management, well-developed written and verbal communication, and mentoring.
- Relevant Certifications: VMware, Microsoft, AWS, and Azure certifications are preferred. Terraform, Linux, and Lego Master Builder certifications are accepted.
- This is a hybrid role based in Dallas, Texas, and candidates must be able to work in the US without sponsorship.
- The customer is migrating from VxRail to R series servers, and we need someone who can support the customer through that migration.
Job Responsibilities
- Build & Automate Infrastructure: Implement scalable, reliable infrastructure using Terraform, Ansible, and other IaC tools.
- CI/CD Pipeline Development: Create and support CI/CD pipelines that enable fast, safe deployments across environments.
- Private Cloud Operations: Monitor system health, troubleshoot issues, and drive improvements in availability, performance, and cost efficiency.
- Observability & Monitoring: Implement monitoring solutions using tools such as LogicMonitor, CloudWatch, Datadog, Grafana, and Prometheus to support proactive incident response.
- Security & Compliance: Apply security best practices and ensure compliance with internal standards around high availability and disaster recovery.
- Collaboration: Work closely with delivery and application teams to align infrastructure solutions with product needs and development workflows.
- Knowledge Sharing: Contribute to team growth by sharing best practices, participating in code reviews, and writing documentation.
- Continuous Improvement: Work independently through day-to-day functional processes, with an eye toward evolution and innovation.
- Maintain high awareness of quality and performance, addressing failures as necessary, with a priority focus on live production environments.
- Maintenance Expectations: Participate in overnight deployments and assist with maintenance window activities.