Role Description
The DevOps Engineer / Platform Engineer / Site Reliability Engineer (SRE) builds, maintains, and optimizes scalable infrastructure, CI/CD pipelines, cloud platforms, and reliability systems to ensure high availability, performance, and security across enterprise applications and services. The role focuses on infrastructure automation, cloud operations, system reliability, observability, and DevSecOps practices in support of continuous delivery and enterprise digital transformation.
Key responsibilities include:
- Designing, deploying, and managing cloud infrastructure, CI/CD pipelines, and platform engineering solutions across enterprise environments
- Monitoring system KPIs, uptime metrics, incident trends, infrastructure performance indicators, deployment success rates, and operational reliability outcomes
- Collaborating with software engineers, cybersecurity teams, QA engineers, product teams, and executive stakeholders on platform and operational initiatives
- Managing containerization, orchestration, infrastructure-as-code (IaC), cloud-native architectures, and automated deployment workflows
- Supporting incident response, root cause analysis, disaster recovery planning, backup management, and operational resilience initiatives
- Developing automation scripts, infrastructure monitoring solutions, observability dashboards, and self-healing operational systems
- Preparing operational reports, platform performance analysis, and strategic recommendations for engineering and management review
- Driving process improvement, infrastructure optimization, and operational scalability initiatives across DevOps and platform engineering functions
- Managing Kubernetes clusters, Docker environments, cloud platforms, CI/CD tools, logging systems, and operational governance technologies
- Supporting security integration, compliance validation, DevSecOps initiatives, and enterprise cloud modernization projects
- Leading stakeholder communication, incident escalation coordination, and cross-functional alignment activities during operational events
- Staying current with cloud-native technologies, AI-driven observability tools, automation frameworks, SRE methodologies, and infrastructure engineering best practices
Qualifications
- Bachelor’s degree in Computer Science, Information Technology, Software Engineering, or a related field
- Certifications in AWS, Azure, Google Cloud, Kubernetes, Docker, Terraform, Linux, or DevOps, or equivalent credentials, are highly advantageous
- 3–10 years of experience in DevOps engineering, platform engineering, cloud infrastructure, system administration, or site reliability engineering roles
- Strong understanding of CI/CD pipelines, infrastructure automation, cloud architecture, monitoring systems, and operational reliability methodologies
- Experience with Kubernetes, Docker, Jenkins, GitLab CI/CD, Terraform, Ansible, Prometheus, Grafana, Datadog, Splunk, or similar tools
- Familiarity with scripting languages such as Python, Bash, Go, or PowerShell is advantageous
- Experience with Linux systems, networking, cloud security, API management, and distributed systems architecture is beneficial
- Excellent analytical, troubleshooting, and strategic problem-solving skills
- Strong communication and stakeholder collaboration abilities
- Experience in SaaS, fintech, healthcare, e-commerce, gaming, cybersecurity, logistics, telecommunications, or multinational technology environments is an advantage
Key Competencies
- DevOps engineering and platform automation
- Site reliability engineering and operational resilience
- Cloud infrastructure and CI/CD pipeline management
- Kubernetes orchestration and infrastructure-as-code
- Monitoring systems and observability platforms
- Incident management and operational optimization
- Process improvement and infrastructure scalability
- Strategic planning and digital transformation