Senior Platform Engineer - DevOps
We’re partnering with a fast-paced Gen AI tech start-up, backed by one of the world’s largest financial institutions, that is rapidly scaling in Austin.
We are seeking a Senior Platform Engineer to join our dynamic team. Your primary focus is to build the platform that enables our AI team to develop and deploy our models efficiently and at scale. The Senior Platform Engineer will bridge the gap between development and operations, applying software engineering principles to solve operational challenges, enhance system resilience, and drive automation.
What You’ll Do
This role is crucial to maintaining high availability, delivering an exceptional user experience for our services, and collaborating closely with multiple clients to align system performance with their needs and expectations.
Key to your success will be strong people skills, the ability to collaborate, skills across security and documentation, and a real passion for continuous improvement.
You will partner with the development teams to integrate reliability best practices into the software development lifecycle, including CI/CD pipelines. You will provide feedback on system architecture and design with a focus on security compliance standards and operational best practices. You will also analyze incidents and system performance and diligently look for opportunities to automate or streamline workflows across wider parts of the business.
**Infrastructure Architecture & Operations**
- Design and manage scalable, multi-account infrastructure on AWS/Azure, including VPC/VNet topology, routing, private networking, and secure ingress/egress.
- Define landing zone patterns, organization policies, and guardrails for environment consistency.
**Kubernetes Platform Ownership**
- Operate and evolve EKS/AKS clusters: lifecycle management, autoscaling (Karpenter/Cluster Autoscaler), upgrades, and safe rollouts (blue/green, canary).
- Build common base layers for logging, secrets, ingress, and service discovery.
**Terraform at Scale**
- Develop reusable, versioned Terraform modules with strong testing, linting, and dependency management.
- Implement policy-as-code to enforce compliance and security guardrails.
- Establish CI/CD automation for infrastructure changes, including drift detection and plan reviews.
- Champion best practices for code structure, state management, and change safety.
**CI/CD & GitOps Enablement**
- Build GitHub Actions / Argo CD pipelines that promote declarative, automated deployments for infrastructure.
- Integrate security, policy, and testing stages into every pipeline.
**Security & Compliance (DevSecOps)**
- Implement image signing, SBOMs, SAST/SCA, and container scanning.
- Manage identity and access through SSO/SCIM, least-privilege RBAC, and secrets lifecycle management.
**Observability & Reliability**
- Implement full-stack telemetry (metrics, logs, traces) using OpenTelemetry, Prometheus, and Grafana.
- Define SLOs/SLIs, automate incident detection and response, and establish post-incident review processes.
**Automation & Tooling**
- Build or extend automation in Go, Python, or TypeScript for infrastructure workflows, compliance checks, and operational tooling.
- Improve developer self-service through templated, paved-road environments.
**Cost & Capacity Management**
- Implement FinOps practices — capacity planning, utilization metrics, and right-sizing recommendations for cost efficiency.
What We’re Looking For
- 6+ years in Platform Engineering, DevOps, or SRE roles with production ownership.
- Deep experience operating Kubernetes (EKS/AKS) and infrastructure via Terraform at scale.
- Strong cloud fundamentals: IAM, networking (VPCs, subnets, routing), DNS, TLS, and private connectivity.
- Proficiency with GitOps workflows (Argo CD, Flux).
- Proven background in observability, incident response, and operational resilience.
- Solid Linux systems and automation foundation.
- Bachelor’s degree or equivalent real-world experience.
What’s In It for You
- Be rewarded with a competitive compensation package
- Benefit from the chance to join a high-growth company at an early stage with the opportunity to grow with the company
- Become part of a global community of passionate people
- Join a strong corporate culture that thrives on being relentless, resilient, and passionate about AI