What is Driffle?
Driffle is a vibrant and dynamic digital goods marketplace that serves as a connecting platform for gamers worldwide. We have carved a niche in the gaming industry by facilitating transactions between gamers and sellers on a global scale. At Driffle, we are more than just a marketplace; we are a community of passionate gamers dedicated to enhancing the gaming experience for everyone. Driffle serves customers in over 190 countries.
Job Description
We are seeking a highly skilled and motivated DevOps Engineer to join our dynamic team. As a DevOps Engineer, you will play a crucial role in designing, implementing, and maintaining the infrastructure and systems that power our organization's technology platforms. Your primary focus will be ensuring the reliability, scalability, and performance of our applications and services.
Responsibilities:
- System Design and Architecture: Collaborate with cross-functional teams to design and implement scalable, reliable, and efficient systems. Participate in system architecture discussions, provide recommendations, and drive improvements to meet business objectives.
- System Monitoring and Performance: Develop and implement robust monitoring systems to proactively identify and resolve performance bottlenecks, service disruptions, and other issues affecting system reliability. Continuously monitor system performance metrics and optimize resource utilization.
- Incident Response and Troubleshooting: Respond to and resolve production incidents in a timely manner, utilizing strong troubleshooting skills and collaborating with other teams. Conduct root cause analysis to prevent future incidents and implement corrective actions.
- Automation and Tooling: Develop automation tools and scripts to streamline deployment, configuration, and monitoring processes. Implement and maintain CI/CD pipelines to ensure efficient and reliable software delivery.
- Capacity Planning and Scalability: Work closely with development teams to forecast system capacity requirements and plan for scalability. Conduct performance testing and capacity analysis to ensure systems can handle increased loads and peak traffic.
- Security and Compliance: Implement and maintain security measures and best practices to protect our infrastructure and data. Stay up to date with the latest security vulnerabilities and apply necessary patches and upgrades.
- Collaboration and Documentation: Foster strong collaboration with cross-functional teams, including developers, operations, and QA. Document system configurations, processes, and procedures to facilitate knowledge sharing and ensure a smooth handover of responsibilities.
Qualifications and Skills:
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- Strong experience in a Site Reliability Engineering role or a similar capacity, managing large-scale, highly available production systems.
- Proficiency in programming and scripting languages (e.g., Python, Bash, Ruby).
- Deep understanding of Linux/Unix systems and networking concepts.
- Experience with cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes).
- Familiarity with infrastructure-as-code tools (e.g., Terraform) and configuration management tools (e.g., Ansible, Chef, Puppet).
- Knowledge of monitoring and logging tools (e.g., Prometheus, ELK stack) and incident management systems (e.g., PagerDuty).
- Strong problem-solving and analytical skills, with the ability to quickly identify and resolve complex technical issues.
- Excellent communication and collaboration skills, with the ability to work effectively in a team-oriented environment.
Join our team as a DevOps Engineer and contribute to the stability and performance of our technology infrastructure. Help us ensure an exceptional user experience for our customers while driving continuous improvements in system reliability and scalability.