Developer - Cloud SRE & DevOps

Airbus • Full-time • Bengaluru, IN • 1w ago

Site Reliability Engineer (SRE) - Airline Sciences (DMA)

Designation: Developer - Cloud SRE & DevOps

Key Responsibilities:

● Design, implement, and maintain robust and scalable infrastructure on AWS to support our microservices-based applications and REST APIs.

● Work collaboratively with DevOps engineers on CI/CD pipelines for automated deployment, testing, and rollback of services.

● Monitor system performance, availability, and reliability using APM tools (preferably Kibana-based solutions), and establish effective alerting mechanisms.

● Proactively identify potential issues and bottlenecks through log analysis, performance metrics, and synthetic monitoring; implement preventative measures.

● Troubleshoot and resolve complex production incidents, performing root cause analysis (RCA) and implementing long-term solutions.

● Manage and optimize database performance, reliability, and scalability.

● Configure and maintain network infrastructure, including load balancers, firewalls, and proxies, ensuring secure and efficient traffic flow.

● Champion and implement infrastructure-as-code (IaC) practices.

● Containerize applications with Docker and manage container orchestration and registries.

● Collaborate closely with development teams to define service level objectives (SLOs), service level indicators (SLIs), and error budgets.

● Develop and maintain comprehensive documentation for system architecture, configurations, and operational procedures.

● Drive automation initiatives to reduce manual effort and improve system resilience.

● Contribute to capacity planning and performance tuning efforts.

Required Qualifications:

● Bachelor's degree in Computer Science, Engineering, or a related technical field.

● 4-6 years of experience in Site Reliability Engineering, DevOps, or a similar role.

● Proven hands-on experience with Amazon Web Services (AWS), including services like EC2, S3, RDS, VPC, IAM (Identity and Access Management), Lambda, and EKS/ECS.

● Strong understanding of, and practical experience with, microservices architecture and REST API principles.

● Proficiency in managing and troubleshooting relational and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB, Cassandra).

● Solid knowledge of networking concepts (TCP/IP, DNS, HTTP/S, VPNs) and experience with proxies (e.g., Nginx, HAProxy).

● Demonstrable experience with monitoring, logging, and alerting systems; experience with the ELK Stack (Elasticsearch, Logstash, Kibana) for APM and observability is strongly preferred.

● Hands-on experience with Docker containerization and orchestration (e.g., Kubernetes, Docker Swarm).

● Proficiency in at least one scripting or programming language (e.g., Python, Bash, Go).

● Experience with CI/CD tools (e.g., Jenkins, GitLab CI, AWS CodePipeline).

● Strong analytical and problem-solving skills with a proactive approach to identifying and resolving issues.

● Excellent communication and collaboration skills.