Job Summary
We are a dynamic and rapidly growing company operating in the real estate space. Our infrastructure is entirely cloud-based, running on AWS, with a heavy reliance on Kubernetes (EKS) for container orchestration. We are looking for an experienced DevOps Engineer to help automate and manage our infrastructure, ensuring high availability and uptime of our applications. This role will focus on managing our full AWS environment, including EC2, VPC, Route 53, and other AWS services, as well as supporting Jenkins and CI/CD pipelines.
You will be responsible for deploying infrastructure with Terraform, monitoring application performance, and troubleshooting issues to maintain system health and scalability. A key part of this role will be providing on-call support to ensure continuous operation and resolve any critical incidents quickly. Additionally, you will implement logging and monitoring solutions to proactively manage application uptime.
Work Hours
This role requires the candidate to work during US business hours. The shift will be from 4 PM IST to 1 AM IST. Candidates must be comfortable working during these hours to support the US-based teams and business operations.
Duties and Responsibilities
- Automate the deployment of applications and services.
- Monitor the health and performance of services, and configure alerts for potential issues.
- Roll out fixes and software upgrades as needed.
- Deploy and maintain Kubernetes clusters on Amazon EKS to support high-transaction-rate applications.
- Set up centralized logging and monitoring solutions for Kubernetes clusters, Elasticsearch, and MongoDB, and respond promptly to incidents and alerts.
- Monitor cluster health, performance metrics, and resource utilization using tools such as Prometheus, Grafana, and AWS CloudWatch.
- Develop and enhance Terraform scripts for provisioning and managing infrastructure components on AWS.
- Collaborate with development teams to improve application performance, reliability, and scalability through infrastructure optimizations.
- Provide on-call support and manage alerts during working hours.
Skills and Qualifications
- 4+ years of relevant experience in Systems Engineering, Site Reliability Engineering, or DevOps Engineering roles.
- Hands-on experience with AWS services (EC2, VPC and private cloud configurations, ElastiCache, Route 53, and others).
- Strong understanding of containerization technologies (Docker, Kubernetes) and microservices architecture.
- In-depth knowledge and hands-on experience with Terraform.
- Experience with Configuration Management tools such as Ansible.
- Experience working with Jenkins and CI/CD pipelines.
- Hands-on experience with the ELK stack (Elasticsearch, Logstash, Kibana).
- Familiarity with monitoring frameworks like Datadog or New Relic.
- Strong knowledge of Git.
- Familiarity with monitoring and logging tools (Prometheus, Grafana, ELK stack, AWS CloudWatch).
- Experience with web servers and application load balancing using Apache, Nginx, HAProxy, or ELB, as well as handling and analyzing large volumes of logs and detecting anomalies.
- AWS and Kubernetes certifications (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator) are a plus.