DevOps Analyst

Antal International • Full-time • Remote (Bengaluru, IN) • 5d ago

Responsibilities

● Manage the availability, latency and performance of mission-critical services, and build automation to prevent problem recurrence.

● Independently determine and develop architectural approaches and infrastructure solutions.

● Define the strategy and roadmap for CI/CD, application hosting, security and compliance standards and guidelines, and ensure adherence to them.

● Manage the incident response protocol and provide hands-on direction during service interruptions; assist with root cause analysis of service interruptions and maintain SLAs.

● Manage teams and guide them to achieve the above.

Basic Requirements

● 4 years of experience handling Linux systems at large scale.

● 4 years of hands-on experience with containers and container orchestration tools.

● 4 years of proven experience designing, building, supporting and observing large-scale distributed systems/services/infrastructure.

● Strong work ethic; a self-starter who demonstrates a high level of resilience.

● Highly goal-driven and able to work well in a fast-paced, team-oriented environment.

● Experience as a Site Reliability Engineer, with a focus on AWS.

● Shell/Ruby/Python scripting knowledge

● Strong writing and communication skills.

Preferred Qualifications

● Deep-rooted understanding of Linux systems, databases and networking concepts.

● In-depth knowledge of cloud services, including compute, storage, networking, databases, and security.

● Strong proficiency in infrastructure as code (IaC) concepts and tools, such as Terraform/CloudFormation, for automating infrastructure deployment.

● Familiarity with common web, application and database servers such as nginx and Postgres.

● Experience with queue systems like RabbitMQ/Kafka is a plus.

● Experience with monitoring and logging tools, such as CloudWatch, Cloud Monitoring, and the ELK/EFK stack.

● Proficiency with Kubernetes internal architecture and networking, and with containerized microservice architectural patterns.

● Strong experience in microservices architecture, API gateways, service mesh implementation, and instrumenting XaaC (infrastructure, software, network, policy and security as code) across global-scale systems.

● Hands-on experience in defining and driving disaster recovery across platforms.

● Ability to take technical deep dives into code, networking, operating systems and storage, while also being able to participate in executive strategy discussions.

● Experience building automation, auditing and other tooling for security, compliance and resource usage, and monitoring and improving processes for all deployments.

● Clear understanding of server-to-server interactions and best practices.