Project Role: DevOps Engineer
Project Role Description: Responsible for building and setting up new development tools and infrastructure, applying knowledge of continuous integration, delivery, and deployment (CI/CD), cloud technologies, container orchestration, and security. Builds and tests end-to-end CI/CD pipelines and ensures that systems are protected against security threats.
Must-Have Skills: DevOps
Good-to-Have Skills: Site Reliability Engineering
A minimum of 3 years of experience is required.
Educational Qualification: 15 years of full-time education
Summary:
This role involves managing production support for software applications and services, implementing automation, monitoring, and alerting tools, and ensuring the reliability, availability, and performance of critical systems and services.
Roles & Responsibilities:
- Incident Management: Lead incident management efforts, including incident response, root cause analysis, and post-incident reviews. Collaborate with cross-functional teams to minimize impact and restore services as quickly as possible. Implement preventive measures to avoid future incidents and drive continuous improvement in incident management processes.
- Site Reliability Engineering: Implement best practices in site reliability engineering, including system monitoring, alerting, capacity planning, performance optimization, and incident management. Collaborate with development teams to ensure application architectures are resilient and scalable, and drive the adoption of DevOps and SRE principles and practices.
- Deployment Management: Implement and manage the deployment process for software applications and services, including monthly release management of AADL products, change management, and rollback procedures. Drive continuous improvement in deployment processes and tools to increase efficiency and minimize risk.
- Monitoring and Alerting: Implement and maintain effective system monitoring and alerting tools to proactively detect and resolve issues. Define and track key performance indicators (KPIs) and service level objectives (SLOs) to measure system reliability, performance, and availability.
- Collaboration: Collaborate closely with development, operations, security, network, and other stakeholders to ensure smooth operations and timely resolution of issues. Foster strong relationships and effective communication channels to promote collaboration and coordination.
- Documentation: Maintain comprehensive documentation of deployment processes, system configurations, procedures, and incident reports. Ensure documentation is up to date, accurate, and accessible to relevant stakeholders.
Professional & Technical Skills:
- Must-Have Skills: Proficiency in DevOps.
- Good-to-Have Skills: Experience with Site Reliability Engineering.
- Strong understanding of continuous integration, delivery, and deployment (CI/CD) principles.
- Experience with cloud technologies such as AWS, Azure, or Google Cloud Platform.
- Knowledge of container orchestration tools like Kubernetes or Docker Swarm.
- Familiarity with infrastructure as code tools such as Terraform or CloudFormation.
- Experience with configuration management tools like Ansible or Puppet.
- Solid understanding of networking and security principles in a cloud environment.
- In-depth knowledge of relational database management systems (RDBMS) such as Oracle, Microsoft SQL Server, MySQL, or PostgreSQL.
- Knowledge of cloud computing platforms, preferably AWS, is a plus.
- Relevant certifications, such as AWS Certified DevOps Engineer, Certified Kubernetes Administrator (CKA), or Site Reliability Engineering (SRE) certifications, are desirable, as is expertise with Grafana.
- Strong technical skills in deployment processes and tools, such as release management, change management, and rollback procedures.
- Proficiency in scripting and automation using languages such as Python, Bash, or PowerShell.
- Solid understanding of DevOps principles, Agile methodologies, and ITIL practices.
- Strong technical skills in CI/CD tools and practices, such as Jenkins, Git, Docker, Kubernetes, and related technologies.
- Strong leadership skills with experience in managing and mentoring technical teams.
- Excellent problem-solving, analytical, and communication skills.
- Ability to work independently, prioritize tasks, and manage time effectively.
- Experience with incident management tools and processes, such as ITIL Incident Management, and familiarity with ITSM frameworks.
Additional Information:
- The candidate should have a minimum of 3 years of experience in DevOps.
- This position is based at our Gurugram office.
- 15 years of full-time education is required.