Key Responsibilities:
Drive major incidents to resolution as Incident Commander and
participate in the post-mortem process.
Identify gaps within the current monitoring infrastructure and
create/enhance tools to address them.
Perform event root cause analysis using logs, tools, and available
documentation.
Contribute to documentation to improve resolution time.
Assess the impact of potential client or platform-wide incidents and
escalate to the appropriate teams.
Escalate problems that could pose risks in a timely manner.
Experience with logging/monitoring tools (Kibana, Grafana, Prometheus,
or any other standard monitoring tool).
Working understanding of web infrastructure (load balancers, application
servers).
Personal Characteristics:
Strong portfolio and excellent attitude.
Must be confident working in a team and
handling responsibilities individually.
Should be a good listener and articulate, with
good communication skills.
Ability to work with teams across organizational
boundaries, different cultures, and different time
zones in a virtual environment.
Delivery-oriented and able to work under strict
deadlines.
Key Requirements:
Bachelor’s/Master’s degree in Engineering, Computer Science (or equivalent experience)
3+ years of experience as a DevOps Engineer.
Terraform and CI/CD configuration, pipelines, and jobs.
Must have strong knowledge of Terraform.
Must have experience in Python.
Kubernetes understanding, CLI, service re-provisioning.
Must have experience creating Helm charts and performing common k8s maintenance tasks.
Minimum 3 years of experience working with containerized applications.
Should have hands-on experience with AWS services.
Solid understanding of system performance and monitoring.
Commitment to high service availability and quality of support for the customer.
Operating system (Linux) configuration, package management, startup, and troubleshooting.
Good communication skills with an ability to relay incident details expeditiously, concisely, and accurately.
Strong knowledge of infrastructure and application development toward operational support.
Highly motivated individual, self-starter, and quick learner.
Extensive knowledge of Incident/Change/Problem Management.
Ability to write simple to moderately complex scripts and programs for automation, tools, frameworks, dashboards, and alarms (Python,
TypeScript, Bash).