Job Description
Role: DevOps Engineer
We are seeking a highly skilled DevOps Engineer to join our dynamic team.
The ideal candidate will have strong expertise in automation, cloud infrastructure (with a focus on AWS and GenAI services), CI/CD, and containerization, along with a deep understanding of security best practices, monitoring, and system optimization.
This role requires a balance of technical proficiency, problem-solving, and collaboration skills to ensure smooth deployment, scalability, and reliability of applications and infrastructure.
Key Responsibilities
- Design, automate, and manage scalable, secure, and high-availability cloud infrastructure.
- Implement Infrastructure as Code (IaC) using tools like Terraform or CloudFormation.
- Develop and maintain CI/CD pipelines with Jenkins, GitLab CI, CircleCI, or AWS CodePipeline.
- Automate routine tasks using Python and shell scripting.
- Monitor and optimize system performance using Prometheus, Grafana, ELK stack, or AWS CloudWatch.
- Manage databases (MySQL, PostgreSQL, MongoDB, DynamoDB), including backup, recovery, and performance tuning.
- Deploy and manage web applications in production environments with Nginx, Apache, or similar servers.
- Ensure cloud, networking, and server security using IAM, VPC, security groups, and firewalls.
- Manage source control and team collaboration using Git and branching strategies.
- Work with containerization and orchestration technologies (Docker, Kubernetes, ECS).
- Implement disaster recovery, backup, and high-availability strategies.
- Troubleshoot incidents, perform root cause analysis, and implement preventive measures.
- Collaborate with cross-functional teams, ensuring effective communication and documentation.
Required Skills & Experience
Infrastructure as Code (IaC)
- Hands-on experience with Terraform, AWS CloudFormation, or similar.
- Proficient in automating infrastructure deployment and management tasks.
- Knowledge of configuration management tools (Ansible, Chef, Puppet).
Monitoring & Logging
- Experience with monitoring tools (Prometheus, Grafana, ELK Stack, AWS CloudWatch).
- Ability to set up alerts, dashboards, and audit logs for system health and performance.
Cloud Platforms (AWS required, including GenAI services experience)
- Strong knowledge of AWS services: EC2, S3, RDS, Lambda, Bedrock, OpenSearch, Knowledge Bases, IAM, VPC, CodeDeploy, CodePipeline, SQS, etc.
- Familiarity with cloud-native architectures and multi-cloud environments (a plus).
Scripting & Automation
- Python: scripting, automation, and Boto3 for AWS; familiarity with Flask/Django is a bonus.
- Shell scripting: strong skills in Bash or similar for deployment and system automation.
Database Management
- Experience with MySQL, PostgreSQL, MongoDB, DynamoDB.
- Backup, recovery, performance tuning, and database security best practices.
Web Application Deployment & Server Management
- Experience with production deployments and web/application servers (Nginx, Apache).
- Knowledge of reverse proxies, SSL/TLS setup, and security hardening.
Security & Networking
- Cloud security best practices, IAM management, firewalls, and VPC configurations.
- Strong understanding of TCP/IP, DNS, HTTP/HTTPS, and load balancer setups.
CI/CD & Version Control
- Proficient in Git workflows (GitFlow, trunk-based) for multi-team management.
- Experience with CI/CD pipelines (Jenkins, GitLab CI, AWS CodePipeline).
- Knowledge of containerization (Docker) and orchestration (Kubernetes, ECS).
High Availability & Scaling
- Load balancing strategies (AWS ELB, HAProxy), failover planning.
- Auto-scaling in cloud platforms and performance optimization.
Backup, Recovery & Incident Response
- Implementation of disaster recovery, redundancy strategies, and system resilience.
- Troubleshooting, root cause analysis, and preventive measures.
Collaboration & Project Management
- Strong communication and documentation skills.
- Ability to collaborate across teams and explain technical concepts to non-technical stakeholders.
- Familiarity with Agile methodologies (Scrum, Kanban) and tools (Jira, Trello) is a plus.
Preferred/Optional Skills
- Dockerfile and Docker Compose creation for multi-container applications.
- Serverless architecture with AWS Lambda, SQS, SNS.
- Project management and task prioritization.