Overview
We are looking for a skilled DevOps Engineer to join our growing team. In this role, you will work at the intersection of development, operations, and system administration to build, deploy, and maintain scalable, reliable infrastructure. Your focus will be on automating manual processes, enhancing continuous integration and delivery, and ensuring high availability and performance of production systems.
You will collaborate with engineers across teams to improve the software development lifecycle, reduce friction in deployments, and enhance system reliability. This is a fantastic opportunity to join a fast-paced environment and make a direct impact on the company’s success.
Key Responsibilities
- Infrastructure Management:
- Design, deploy, and manage scalable and resilient infrastructure in cloud environments (AWS, Google Cloud, Azure).
- Use Infrastructure as Code (IaC) tools like Terraform, CloudFormation, or Pulumi to automate infrastructure provisioning and management.
- Maintain and improve cloud-based and on-premises infrastructure for production and development environments.
- CI/CD Pipeline Development:
- Build and manage robust Continuous Integration/Continuous Deployment (CI/CD) pipelines to streamline software deployment.
- Implement automation for testing, building, and deploying applications using tools like Jenkins, GitLab CI, CircleCI, or Travis CI.
- Containerization and Orchestration:
- Design and manage containerized environments using Docker.
- Orchestrate containers with tools like Kubernetes or Docker Swarm to ensure application scalability and high availability.
- Monitoring and Logging:
- Implement monitoring and logging systems using tools like Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), or Datadog to track system performance, troubleshoot issues, and ensure system uptime.
- Set up automated alerts for performance bottlenecks, system failures, or resource utilization spikes.
- Automation & Scripting:
- Write scripts in Python, Bash, or similar languages to automate manual tasks and improve operational efficiency.
- Automate repetitive tasks such as system configurations, application deployments, and patch management.
- Security and Compliance:
- Implement security best practices across infrastructure, CI/CD pipelines, and deployments.
- Work with the security team to ensure applications meet compliance standards and best practices.
- Perform regular security assessments and vulnerability scanning.
- Collaboration & Support:
- Work closely with development teams to ensure smooth code deployment and integration.
- Provide support for production systems, identify root causes of incidents, and work to prevent recurrences.
- Participate in on-call rotation to support systems during critical incidents.
Required Skills & Qualifications
- Experience:
- 3+ years of experience as a DevOps Engineer, Site Reliability Engineer (SRE), or in a similar role with a focus on automation and systems management.
- Proven experience managing cloud infrastructure (AWS, Google Cloud, or Azure).
- Strong experience with CI/CD tools like Jenkins, GitLab CI, or CircleCI.
- Solid experience with Docker and container orchestration tools like Kubernetes or OpenShift.
- Technical Skills:
- Strong scripting skills in Python, Bash, or another shell language for automation and troubleshooting.
- Experience with Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation).
- Experience with configuration management tools like Ansible, Chef, or Puppet.
- Familiarity with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or Datadog.
- Experience with version control systems like Git.
- Knowledge of Systems:
- Knowledge of Linux/Unix administration and server management.
- Experience with database management, particularly in scaling SQL and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB).
- Soft Skills:
- Strong problem-solving skills and the ability to troubleshoot complex production issues.
- Excellent communication skills to work effectively with development, operations, and other teams.
- Self-motivated and able to work both independently and as part of a team.
- Ability to manage multiple tasks and prioritize effectively in a fast-paced environment.
Preferred Qualifications
- Certification in cloud technologies (e.g., AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect, Azure Administrator).
- Experience with serverless architectures (e.g., AWS Lambda).
- Familiarity with Agile development methodologies and project management and collaboration tools like Jira or Confluence.
- Knowledge of infrastructure security practices (e.g., firewalls, VPNs, identity management).