Job Responsibilities:
1. Infrastructure as Code (IaC) Automation: Utilize tools like Terraform and Ansible to automate cloud resource provisioning. Implement infrastructure changes as code, ensuring consistency and repeatability.
2. Kubernetes Cluster Management: Deploy and manage Kubernetes clusters for efficient container orchestration and microservices architectures. Optimize cluster performance and ensure high availability.
3. Monitoring, Logging, and Alerting: Implement robust monitoring, logging, and alerting solutions. Proactively identify and address performance bottlenecks and issues.
4. Collaboration with Product and Development Teams: Work closely with product and development teams to troubleshoot and correct errors promptly. Optimize systems to enhance operational efficiency.
5. Automation and Standardization: Continuously iterate on operational processes, moving towards automation and standardization. Improve overall service quality by streamlining workflows.
Job Requirements:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Minimum 5 years of experience as a DevOps/SRE Engineer, specializing in Kubernetes, Infrastructure as Code, and cloud-native tools.
- Familiarity with the Linux kernel, including practical experience with kernel networking, storage, file systems, memory management, the scheduler, and cgroups.
- Demonstrated understanding of Kubernetes and containerization technologies, including deploying and managing Kubernetes clusters.
- Senior-level Kubernetes operations experience, familiarity with Helm and Istio, and practical expertise in optimizing high-availability and disaster-recovery architectures.
- Strong troubleshooting skills for system-layer and network-layer performance issues and failures. Operational experience with open-source software such as etcd, ZooKeeper, Kafka, ELK, and Nginx/HAProxy is preferred.