DevOps / Platform Engineer

Scale.jobs • Full-time • Remote (New York, NY, US) • 1d ago

About The Role

In this role, you will build and maintain the cloud infrastructure and deployment pipelines that power a SaaS platform.

You will work on real infrastructure problems at meaningful scale, with the autonomy to improve things and the support to do it right.

Key Responsibilities

Manage and improve Kubernetes clusters across AWS EKS, GCP GKE, and Azure AKS; handle scaling, node management, and upgrade lifecycle
Write and maintain Terraform modules for provisioning cloud infrastructure across all three major cloud providers
Build and maintain CI/CD pipelines using GitHub Actions, ArgoCD, and Helm for continuous deployment to production
Implement and manage the observability stack: Prometheus, Grafana, Loki, and Datadog for metrics, logs, and tracing
Own cloud cost governance tooling: tagging policies, FinOps dashboards, and automated resource right-sizing workflows
Implement security best practices: secrets management (Vault/AWS Secrets Manager), network policies, RBAC, and vulnerability scanning
Write runbooks and incident response playbooks, and contribute to SRE practices, including blameless post-mortems

What We Are Looking For

3–8 years of DevOps, platform engineering, or SRE experience in a cloud-native environment
Deep Kubernetes expertise: cluster operations, workload management, networking (CNI, ingress), and storage
Terraform or Pulumi experience for IaC at scale; familiarity with Helm chart authoring
CI/CD pipeline experience: GitHub Actions, Jenkins, ArgoCD, or equivalent
Real production infrastructure experience on at least two of the three major clouds: AWS, GCP, and Azure
Monitoring and observability: Prometheus/Grafana, Datadog, or CloudWatch/Azure Monitor
Bonus: experience with Istio service mesh, Crossplane, cost management tooling, or FinOps practices

Location

New York City (Hybrid)