Sr. DevOps Engineer
Role
- Lead the DevOps and system administration team.
- Be an individual contributor, recognized as the expert the company looks to in all infrastructure-related areas
- Build an excellent team of dedicated experts that can manage the company’s growing internet and on-premises resources
- Be responsible for all technical infrastructure of the company
- Be the controller for all technical and other third-party cloud services subscribed to by the company
- Establish the DevOps department as an excellent service provider with high reliability and quick turnaround time.
- Prepare and implement the technical deployment architecture
- Provide troubleshooting support and assistance for production issues
Accountabilities
- Take responsibility and be accountable for the reliable operation of all production infrastructure, with uptime above 99.99%. Ensure uptime for other environments, such as Dev, QA, and Pre-production, as per the SLAs.
- Keep tight control of infrastructure costs, ensuring that optimal configurations are used and unused resources are released.
- Work across engineering teams to enable cost-effective, robust and secure delivery of platform services.
- Provide feedback to help the team understand technical trade-offs at the design and planning stage, particularly with respect to technical debt, performance, scalability and security.
- Perform the necessary platform development to enable and support the engineering team's goals as they evolve over the feature lifecycle, from defining requirements and specifications, through developing proofs of concept, to ultimately delivering solutions to end users. This includes:
  - Provisioning and maintaining cloud infrastructure and environments
  - Creating and maintaining CI/CD systems and pipelines
  - Supporting engineers with training, guidance and standardised protocols for the deployment of new services that apply the relevant best practices
  - Monitoring system health and managing events, alerting stakeholders when necessary
  - Working closely with QA to ensure that systems are engineered such that they are observable and auditable across all relevant dimensions.
- Ensure adequate backups are taken as required by the disaster recovery SLAs
- Ensure that restoration drills are performed at regular intervals to check preparedness for disaster recovery as well as the quality of the backups
- Ensure the company’s data and databases are well protected
- Maintain deployment architecture and infrastructure documentation
Key Skills And Experience
At a minimum, candidates must have experience with and an understanding of:
- Software development at different stages of product development, particularly from early-stage prototypes to minimum viable products
- Strong experience in infrastructure as code (IaC), software development, and continuous integration
- Proficiency in system administration and Linux administration
- CI/CD tools such as GitHub Actions, Jenkins, CircleCI, AWS CodeDeploy, AWS CodeBuild, etc.
- Experience with relational and NoSQL databases
- Proficiency with Terraform, Terragrunt, Ansible, and microservices architectures
- Working knowledge of database administration with MySQL and MongoDB
- Understanding of networking concepts and protocols
- Knowledge of containerization technologies such as Docker and Kubernetes
- Experience with cloud platforms such as AWS, DigitalOcean, Utho, Azure, or GCP
- Ability to troubleshoot and resolve issues in a timely manner
- Excellent communication and collaboration skills
- Relevant cloud certifications (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator) are a plus
Candidates will preferably also have:
- Extensive experience with data-intensive systems
- Experience working in start-up or R&D environments
Skills: infrastructure,aws,cloud,devops,ci,cd,data,databases,architecture