The Role
We are looking for a high-calibre DevOps / SRE / Platform Engineer who brings strong engineering fundamentals and a deep sense of ownership. You will design, build, and operate the platforms and infrastructure that enable product teams to ship quickly and safely at scale. This role is hands-on, impact-driven, and held to top-tier engineering standards.
What You Will Do
Platform & Cloud Engineering
- Design and operate cloud-native platforms that abstract complexity and improve developer productivity.
- Build reusable, opinionated infrastructure using Infrastructure-as-Code.
- Own Kubernetes clusters, networking, service orchestration, and workload reliability.
Reliability Engineering
- Define and drive SLIs, SLOs, and error budgets for business-critical services.
- Participate in on-call rotations, lead incident response, and write clear, blameless RCAs.
- Continuously reduce operational toil through automation and engineering solutions.
CI/CD & DevOps
- Build and evolve secure, automated CI/CD pipelines using GitOps principles.
- Enable safe, frequent production deployments with strong rollback and observability.
- Partner with application teams to embed reliability and operational excellence early in the lifecycle.
Observability & Operations
- Implement best-in-class logging, metrics, tracing, and alerting.
- Ensure alerts are actionable and tied to service health rather than noise.
- Build dashboards, runbooks, and self-healing mechanisms to reduce MTTR.
Architecture & Collaboration
- Work closely with software engineers, architects, and security teams to influence system design.
- Review infrastructure and architecture through the lens of scale, resilience, and cost efficiency.
- Champion DevOps, SRE, and cloud-native best practices across the organization.
What We're Looking For
Core Engineering
- Strong foundations in Linux, networking (DNS, TCP/IP), and distributed systems.
- Proficiency in Python or Go for automation and tooling.
- Clean Git practices and strong software-engineering discipline.
Cloud & Containers
- Hands-on experience with AWS (primary); exposure to GCP or Azure is a plus.
- Required: strong experience operating Kubernetes in production environments.
- Required: experience with Helm and containerized workloads at scale.
Infrastructure & Tooling
- Required: Infrastructure-as-Code using Terraform.
- Preferred: configuration and automation using Chef / Ansible.
- CI/CD & GitOps: ArgoCD, GitHub Actions, Jenkins, or GitLab CI.
Observability & Reliability
- Metrics and alerting: Prometheus, Grafana, Alertmanager.
- Tracing / APM: Datadog, New Relic, OpenTelemetry.
- Incident management experience (PagerDuty or equivalent).
Data & Messaging (Working Knowledge)
- Datastores: PostgreSQL, MySQL, MongoDB.
- Streaming and search: Kafka, Elasticsearch.
- Caching: Redis.
Nice To Have
- Experience building internal developer platforms.
- Exposure to DevSecOps and cloud security.
- Experience operating high-traffic, consumer-facing systems.
- Strong written communication: design docs, RFCs, and runbooks.