Responsibilities:
1. Responsible for the daily operation and maintenance of cloud services in the Indian data center, including participating in on-call duties and promptly responding to and resolving system issues.
2. Demonstrates proficiency in English listening, speaking, reading, and writing, with effective English communication skills that enable smooth work-related communication and reporting.
3. Actively coordinates with R&D team members to quickly resolve live service issues, ensuring the consistent and reliable operation of cloud services.
4. Oversees online releases and changes, monitors service status, and implements disaster recovery and data backup strategies to ensure high system availability.
5. Performs regular, comprehensive analysis of online issues, produces analysis reports, and follows up on incident resolution to enhance system stability, availability, and performance.
6. Summarizes issues and lessons learned in maintenance work, and builds and maintains a knowledge base that provides valuable reference and support for the team.
7. Possesses strong logical thinking and business acumen to quickly understand customer complaints and proactively drive issue resolution to improve customer satisfaction.
8. Assists developers in troubleshooting online service issues, participates in developing emergency response plans, and engages in emergency drills to ensure system reliability and security.
Requirements:
1. A minimum of 5 years of experience in cloud system operation and maintenance, with a high level of professional expertise.
2. Proficient in Linux operating systems, with experience writing shell and Python scripts. Development experience is required to meet daily maintenance needs; proficiency in Java, C, C++, or Go is a plus.
3. Familiar with the usage and principles of Docker and Kubernetes, and capable of quickly identifying network issues in a live environment.
4. Experienced with common open-source monitoring tools such as ELK, Prometheus, Grafana, and Zabbix, and able to resolve common issues with these components.
5. Skilled in troubleshooting and optimizing distributed middleware, including Spring Boot, Spring Cloud, MQ, Redis, Dubbo, Netty, Nacos, Kafka, and RocketMQ.
6. Proficient in microservices architecture, including service governance, distributed tracing, and distributed transaction management.
7. Well-versed in MySQL and PostgreSQL databases, with practical experience in daily operations such as backup, recovery, and optimization.
8. Experienced with cloud platforms such as GCP, Huawei Cloud, Alibaba Cloud, Tencent Cloud, and AWS.
9. Displays a strong sense of responsibility, a proactive work attitude, excellent communication and coordination skills, and team spirit, along with a keen awareness of security risks.