Our client is one of the most trusted e-commerce platforms for fashion and luxury goods. Built on a community-driven model with rigorous product authentication, the company has grown into a unicorn and is now establishing its Southeast Asia headquarters in Singapore to power its next phase of global growth.
If you're an SRE / DevOps Engineer with 2-8 years of experience and a genuine interest in growing beyond your current domain, this role might be for you!
Key Responsibilities
- Design and implement robust, real-time monitoring and alerting systems to ensure continuous service availability and rapid detection of issues.
- Develop and manage a centralized dashboard that aggregates disaster metadata, historical trends, and communication links so that upper management can quickly assess infrastructure disruptions.
- Drive the implementation and testing of comprehensive Disaster Recovery strategies to minimize downtime and ensure business continuity.
- Collaborate with development teams to optimize the performance and resilience of our microservices architecture.
- Establish and maintain robust monitoring systems to significantly enhance performance visibility and debugging capabilities.
- Apply software engineering practices to automate operational tasks that reduce disaster recovery time and minimize operational costs.
Qualifications
- Bachelor’s degree in Computer Science or a related technical field (preferred).
- 2+ years of experience in systems operations or site reliability engineering.
- Proven expertise in establishing and maintaining monitoring systems with Prometheus and Grafana.
- Demonstrated experience in real-time monitoring and implementing effective Disaster Recovery solutions.
- Experience working with and optimizing systems built on a Microservices architecture.
- Strong analytical and problem-solving skills, with a focus on enhancing performance visibility and debugging.
- Ability to translate complex operational data into clear, actionable insights for a centralized management dashboard.
- Proficiency in a programming language (e.g., Java, Python, Go) for automation and tooling development.
Regrettably, only shortlisted candidates will be notified.
Please note that data provided is for recruitment purposes only.
Business Registration No.: 202004228R | License No.: 20S0118 | EA Registration No.: R1986587