We are looking for an experienced DevOps leader to help take Aereo Cloud to the next level. This person will lead a key set of initiatives that require working closely with customers, product, and business stakeholders to improve the end-user experience, and will constantly strive for excellence to establish Aereo Cloud as the best cloud platform for drone-based GIS data. In addition, they will oversee the complete infrastructure to deliver the best integrated product experience for Aereo Cloud users, and will own and drive the team's engineering and technical strategy so that the platform not only meets current product requirements but also scales over the next 2-5 years.
Responsibilities
- Understand the vision and the bigger picture of Aereo Cloud and ensure the team fully understands and appreciates how their work fits into the larger scheme of things.
- Lead reliability engineering projects and drive them to closure.
- Write code and perform code reviews for best practices and code quality.
- Contribute to the design/architecture of the system.
- Automate processes, reduce toil, and find opportunities to improve the observability and availability of the platform.
- Supervise a team of DevOps Engineers, ensuring production applications are stable, reliable, and well-documented.
- Own end-to-end availability and performance of mission-critical services.
- Analyze and debug complex issues across tiers from front end to mid-tier to infrastructure.
- Practice sustainable incident response and blameless RCAs and postmortems.
- Grow and develop teams, drive conversations with the SREs on topics such as career development, and align their growth with the long-term vision and wider business needs.
- Participate in the end-to-end recruiting process, hiring and onboarding exceptional SRE talent.
- Define, measure, and own key metrics for the performance of your and your team's functional areas.
Requirements
- More than 12 years of experience handling systems for large-scale production environments and building DevOps/SRE teams.
- A self-starter, able to build, drive, and advocate for SRE solutions.
- Effective cross-functional collaboration skills to develop tools for secure, scalable, and reliable systems.
- Solid understanding of SRE concepts like SLAs, SLOs, SLIs, error budgets, MTTR, MTTD, etc.
- Experience with a variety of tools that help manage, understand, and debug large, complex distributed systems.
- Good programming experience (Python/Go).
- Hands-on experience with Kubernetes and Docker.
- Working knowledge of at least one of the major cloud platforms (AWS, Azure, GCP).
- Experience with monitoring and logging tools (e.g., Datadog, ELK, Prometheus, Grafana).
- Good knowledge of Unix systems, networking, web technologies, and databases.
- Expertise in troubleshooting issues and bugs.
- Experience writing, deploying, and debugging Terraform scripts.
- Incident Management experience coupled with effective communication skills.
This job was posted by Deepak Td from Aereo.