DevOps Engineer
We are looking for a hands-on DevOps Engineer to manage and scale our cloud infrastructure, Kubernetes-based microservice deployments, monitoring systems, and data engineering infrastructure.
The engineer will build reliable, secure, scalable, and cost-efficient infrastructure using automation-first practices. The role supports a high-growth B2C platform where availability, deployment velocity, observability, security, and cost efficiency are critical.
Key Responsibilities
- Manage and automate cloud infrastructure using Terraform.
- Deploy, manage, and troubleshoot microservices on Kubernetes.
- Build and maintain CI/CD pipelines to ensure reliable, controlled deployments.
- Implement safe release practices, including rolling deployments, rollbacks, and zero-downtime releases.
- Manage monitoring, logging, alerting, dashboards, and production runbooks.
- Support incident response, production debugging, root cause analysis (RCA), and closure of preventive actions.
- Ensure infrastructure is scalable, secure, highly available, and cost-optimised.
- Support data engineering infrastructure, including ClickHouse, PeerDB, Airflow, Kafka, and related platform components.
- Maintain infrastructure-level security controls, backups, disaster recovery, and access governance.
Required Skills
- Strong experience with Terraform, Infrastructure as Code, and AWS.
- Strong experience with Kubernetes, Docker, Helm, ingress, and autoscaling.
- Experience with CI/CD tools such as GitHub Actions, GitLab CI, Jenkins, ArgoCD, or similar.
- Experience with monitoring and observability tools such as Prometheus, Grafana, ELK/OpenSearch, New Relic, or similar.
- Good understanding of cloud networking, DNS, load balancers, VPC/VPN, SSL/TLS, firewalls, and WAF.
- Experience with Linux administration, shell scripting, and automation.
- Understanding of cloud security, IAM, secrets management, and access governance.
- Exposure to databases, queues, caches, and data infrastructure tools such as ClickHouse, PeerDB, Airflow, Kafka, or similar.
- Strong debugging and problem-solving skills during production incidents.
- Ability to work closely with engineering teams to improve deployment, monitoring, cost, and reliability.
Skills: Amazon Web Services (AWS), Kubernetes, and Terraform