DevOps Engineer

Scale.jobs • Full-time • Remote (New York, NY, US) • 2d ago

About The Role

The role owns the reliability, scalability, and performance of critical cloud infrastructure. This position sits at the intersection of software engineering and systems engineering, ensuring that platform services remain highly available and performant as traffic scales.

The team focuses on building self-service platforms, automated CI/CD pipelines, and robust monitoring frameworks. The person in this role will collaborate closely with product engineering teams to architect resilient systems and automate operational workflows.

Key Responsibilities

Design, provision, and maintain multi-region cloud infrastructure using Terraform for Infrastructure as Code (IaC)
Manage and optimize Kubernetes clusters at scale, focusing on container orchestration, service mesh configuration, and resource allocation
Build and maintain automated CI/CD pipelines using GitHub Actions, GitLab CI, or Jenkins to support continuous deployment with zero downtime
Implement comprehensive observability frameworks using Prometheus, Grafana, Datadog, or the ELK stack to proactively detect and debug system anomalies
Participate in a shared on-call rotation, conduct blameless post-mortems, and build automation that eliminates recurring operational issues
Collaborate with security teams to enforce IAM roles, network security policies, and vulnerability scanning across the entire infrastructure stack

What We Are Looking For

3–6 years of experience in DevOps, Site Reliability Engineering (SRE), or Infrastructure Engineering supporting high-traffic production environments
Strong hands-on experience with at least one major cloud provider, preferably AWS or GCP, and solid proficiency with Terraform
Deep understanding of containerization and orchestration, specifically Docker and production-grade Kubernetes
Proficiency in at least one scripting or programming language, such as Python, Go, or Bash, for writing automation tools and custom operators
Solid understanding of networking fundamentals, including TCP/IP, DNS, load balancing, VPCs, and CDN configurations
Bonus: Experience with service meshes like Istio, GitOps workflows using ArgoCD, or managing distributed databases at scale