Role Overview
We are looking for a Senior DevOps Engineer with 7+ years of hands-on experience managing and scaling infrastructure on AWS. The ideal candidate has deep expertise in operating systems, networking, Kubernetes, Ansible, and Infrastructure as Code (IaC). You will be responsible for designing, implementing, and maintaining robust DevOps practices that ensure seamless deployment, scalability, and reliability of our systems.
Required Qualifications
- 7+ years of experience in Platform Engineering.
- Expert-level knowledge of AWS, including EKS, API Gateway, VPC, Route 53, CloudFront, IAM, Lambda, SQS, SNS.
- Strong expertise in Linux administration (Red Hat, Ubuntu, Debian).
- Deep understanding of Kubernetes architecture and production operations.
- Extensive experience managing Amazon EKS clusters in production environments.
- Hands-on experience managing Kubernetes StatefulSets, Persistent Volumes (PV), Persistent Volume Claims (PVC), Storage Classes, and stateful application workloads.
- Strong understanding of Kubernetes networking, CNI plugins, Ingress controllers, service mesh concepts, and cluster security.
- Experience implementing GitOps practices using Argo CD.
- Experience implementing event-driven autoscaling using KEDA.
- Hands-on experience with Infrastructure as Code using Terraform and/or CloudFormation.
- Strong experience with configuration management and automation using Ansible.
- Strong understanding of networking concepts, including DNS, TCP/IP, load balancers, VPNs, NAT gateways, firewalls, and CDN architectures.
- Experience designing and maintaining CI/CD platforms using Jenkins, GitLab CI/CD, GitHub Actions, or similar tools.
- Strong scripting and automation skills using Python, Bash, or Go.
- Hands-on experience with observability and monitoring platforms, including:
  - Grafana
  - Prometheus
  - Loki
  - OpenTelemetry
  - ELK/OpenSearch
- Strong understanding of SSL/TLS, PKI, certificate lifecycle management, encryption standards, and security best practices.
- Experience operating high-throughput, low-latency, business-critical production environments.
- Experience with incident management, root cause analysis, disaster recovery planning, and platform resiliency engineering.
Preferred Qualifications
- AWS certifications, such as:
  - AWS Certified Solutions Architect – Professional
  - AWS Certified DevOps Engineer – Professional
  - AWS Certified SysOps Administrator – Associate
- Experience operating platforms in PCI DSS, SOC 2, ISO 27001, HIPAA, or similarly regulated environments.
- Experience with service mesh technologies such as Istio or Linkerd.
- Experience implementing multi-region, disaster recovery, and business continuity architectures.
- Experience operating platforms serving millions of transactions per day.
- Experience mentoring engineers and driving platform engineering best practices across multiple development teams.