POSITION SUMMARY
As a Site Reliability Engineer (SRE), you will be responsible for creating standards and frameworks for resiliency, high availability, multi-region strategy, and disaster recovery (DR) strategy for applications and services hosted on AWS. You will also need to identify which applications or services do not adhere to these standards and plan their path to compliance. You will collaborate with development teams to ensure that reliability and resiliency best practices are in place.
RESPONSIBILITIES
- Own the overall system and framework design, working from client requirements.
- Set standards for resiliency, high availability, multi-region strategy, and DR strategy for applications and services on AWS and other cloud platforms
- Determine how best to monitor systems and respond when things go wrong, continually writing and refining response playbooks to reduce the time needed to resolve any incident
- Collaborate with development teams to optimize application performance and resiliency on AWS
- Work independently, manage requirements, and engage in design discussions with clients
EXPERIENCE AND REQUIRED SKILL SETS
- 5+ years of experience with AWS, CI/CD, and DevOps tools (Docker, Git, pipelines), including deployments on EKS and ECS
- Strong understanding of cloud-based architecture and cloud operations
- Strong knowledge of Terraform
- Working knowledge of infrastructure and application monitoring platforms such as Datadog, OpenSearch, and the ELK Stack
- 5+ years of experience defining resiliency strategies, processes, and checks in AWS
- Knowledge of Linux, shell scripting, and Python is preferred
- Excellent problem-solving skills and attention to detail.
- Ability to work independently as well as collaboratively in a team environment.
EDUCATION
Bachelor’s or master’s degree in Computer Science, Engineering, Software Engineering, or a related field.