Engineering Manager, DevOps
Job Summary:
We are looking for a passionate and experienced Engineering Manager to lead our engineering team in the Data Platform Department. You will be responsible for building and managing a high-performing engineering team, ensuring efficient deployment, operation, and scaling of our Conversational AI platform.
Responsibilities:
Technical Leadership:
Provide strong technical leadership to the engineering/DevOps team, ensuring the highest standards of excellence in development, infrastructure management, automation, and deployment processes.
Infrastructure:
- Kubernetes
- CentOS, Ubuntu
- Hypervisor / bare metal
- Load balancers, proxies, API gateway
Programming Languages: Python, Golang, Java
Team Leadership:
- Lead, mentor, and inspire a team of engineers, fostering a collaborative and innovative work environment.
- Set clear expectations, provide regular feedback, and support the professional development of team members.
Infrastructure Management:
- Oversee the design and maintenance of scalable and resilient infrastructure to support Conversational AI systems.
- Ensure high availability, reliability, and performance of the infrastructure.
Automation and Tooling:
- Drive the automation of manual processes, emphasizing the use of tools to enhance efficiency in development and deployment pipelines. Evaluate, select, and implement cutting-edge DevOps tools to improve workflow automation.
Continuous Integration/Continuous Deployment (CI/CD):
- Implement and manage CI/CD pipelines, working closely with development teams to optimize build and deployment processes. Foster a culture of continuous improvement, identifying opportunities to enhance the CI/CD pipeline.
Collaboration:
- Collaborate effectively with cross-functional teams, including software engineers, QA, and product managers, to integrate DevOps practices throughout the development lifecycle.
Cloud and On-Prem Experience:
- Possess hands-on experience with both GCP and on-premises environments.
- Ensure seamless integration and collaboration between cloud and on-premises infrastructure.
SRE (Site Reliability Engineering):
- Apply SRE principles to enhance system reliability, performance, and availability. Collaborate with the team to implement best practices for monitoring, incident response, and reliability engineering.
- Establish and maintain robust monitoring systems to proactively identify and address issues. Develop and implement incident response plans, participating in troubleshooting and resolution efforts.
- Proactively troubleshoot and resolve production issues, minimizing downtime and ensuring platform stability.
Metrics:
- Track and analyze key DevOps metrics, identifying areas for improvement and optimizing performance.
Minimum Qualifications:
- 5+ years of hands-on experience with Linux.
- 5+ years of programming experience with at least two of the following languages: Java, Scala, Python, Bash.
- 3+ years of working experience with Kubernetes.
- 4+ years of experience managing or leading a team.
- Hands-on experience with DevOps tools and technologies such as Jenkins, Git, and Chef.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Minimum 12 years of experience in engineering, with at least 3 years in a leadership role.
- Proven experience building and managing high-performing DevOps teams.
- Expertise in Kubernetes is a must.
- Strong understanding of DevOps principles and practices, including CI/CD, infrastructure as code, containerization, and monitoring.
- Experience with cloud platforms (GCP) and containerization technologies (e.g., Docker, Kubernetes).
- Excellent communication, collaboration, and interpersonal skills.
- Additional experience in SRE and high availability architectures is a plus.
- Ability to motivate and inspire team members to achieve ambitious goals.
- Passion for innovation and continuous improvement.
- Familiarity with Conversational AI technologies is a plus.