We are seeking a highly skilled CloudOps Engineer to join the cloud operations team within our Cloud Centre of Excellence (CCoE). This deeply hands-on role requires strong technical expertise in AWS operations, troubleshooting, operating system management, and SRE practices. The ideal candidate will bring experience in AWS infrastructure, Linux/Windows operations, container-based deployments, GitLab CI/CD automation, and end-to-end troubleshooting, ensuring our cloud platforms remain secure, resilient, and compliant.
Key Responsibilities
Operational Excellence & SRE
- Drive Site Reliability Engineering (SRE) practices, including SLIs, SLOs, SLAs, error budgets, and automation of operational tasks
- Manage incident response, root cause analysis, and post-incident reviews to strengthen platform resilience
- Build and optimize observability and monitoring frameworks (CloudWatch, Grafana, Loki, Tempo, Prometheus)
- Implement self-healing systems and automated recovery where possible
- Oversee OS patching so that vulnerabilities are remediated promptly, and maintain compliance with security standards
Hands-on Cloud & Systems Engineering
- Provision, manage, and troubleshoot AWS services such as EC2, ECS, EKS, Lambda, ELB, S3, EFS, RDS, VPC, and IAM
- Hands-on administration of Linux and Windows operating systems, including hardening, patching, and vulnerability remediation
- Troubleshoot complex issues across infrastructure, applications, networks, and operating systems
- Deploy and manage container-based workloads (ECS, EKS, Docker)
- Automate operations using Infrastructure-as-Code (CloudFormation, Terraform), scripting (Python, Bash, PowerShell), and configuration management (Ansible)
- Implement and optimize GitLab CI/CD pipelines for operations-driven automation
- Support cloud security, IAM, encryption, and compliance standards
Requirements
Basic Qualifications
- 8+ years of experience in cloud operations, engineering, or SRE roles
- Strong hands-on expertise with AWS services (EC2, ECS, EKS, Lambda, ELB, S3, EFS, VPC, IAM)
- Solid experience with Linux and Windows operating systems, including hardening and patching
- Proficiency with scripting languages (Python, Bash, PowerShell) and configuration management tools (Ansible)
- Hands-on experience in container-based deployments (ECS, EKS, Docker)
- Proven ability in infrastructure and application troubleshooting
- Deep knowledge of SRE principles, including monitoring, incident management, and SLIs/SLOs/SLAs
- Strong expertise in GitLab CI/CD and Infrastructure-as-Code tools (CloudFormation, Terraform)
- Working knowledge of cloud security, IAM, and encryption practices
- Excellent problem-solving, debugging, and communication skills
Preferred Qualifications
- AWS certifications: Solutions Architect - Professional, DevOps Engineer - Professional, or SysOps Administrator - Associate
- Experience with observability and monitoring tools (CloudWatch, Grafana, Loki, Tempo, Prometheus)
- Familiarity with multi-cloud or hybrid-cloud operations (AWS and OCI)
- Experience managing high-scale, high-availability, mission-critical environments
- Track record of implementing automation, SRE practices, and operational process improvements