Overview
We are seeking a highly skilled DevOps engineer. The ideal candidate will have strong expertise in observability, automation, CI/CD, and cloud platforms, with a focus on improving system reliability, performance, and deployment efficiency.
Key Responsibilities
Implement and maintain end-to-end observability solutions using tools such as OpenTelemetry, Datadog, Splunk, Prometheus, Grafana, or similar.
Deploy, manage, and optimize applications on cloud platforms (AWS/Azure) ensuring high availability and scalability.
Develop and maintain infrastructure using Infrastructure as Code (IaC) tools, particularly Terraform Enterprise.
Manage and orchestrate containerized applications using Kubernetes, Helm, and kubectl.
Collaborate closely with development and platform teams to streamline deployments and improve release cycles.
Build and maintain CI/CD pipelines using tools such as Harness, Jenkins, or GitHub Actions.
Implement and manage containerization strategies using Docker or equivalent technologies.
Automate operational tasks and workflows using shell scripting or similar automation tools.
Manage source code repositories and workflows using Git-based version control systems.
Establish and maintain SRE practices, including monitoring, reporting, and dashboard creation.
Manage and maintain artifact repositories and container registries such as AWS ECR, Nexus, GitHub Container Registry (GHCR), or Docker Hub.
Required Skills & Qualifications
Strong expertise in observability platforms (OpenTelemetry, Datadog, Splunk, Prometheus, Grafana, CloudWatch).
Hands-on experience with cloud platforms (AWS, Azure).
Proficiency in Terraform Enterprise and infrastructure automation.
Experience with the Kubernetes ecosystem (Helm, kubectl).
Solid understanding of containerization technologies (Docker or equivalent).
Strong scripting skills in Shell/Bash for automation.
Experience with Git and modern version control workflows.
Hands-on experience with CI/CD tools (Harness, Jenkins, GitHub Actions).
Practical knowledge of SRE principles, including reliability engineering and monitoring.
Familiarity with container registries and artifact management systems (ECR, Nexus, GHCR, Docker Hub).