At our new healthcare company, we enable better, smarter, safer healthcare to improve lives. As a new company with a long legacy of creating breakthrough solutions for our customers’ toughest challenges, we pioneer game-changing innovations at the intersection of health, material and data science that change patients' lives for the better while enabling healthcare professionals to perform at their best. Because people, and their wellbeing, are at the heart of every scientific advancement we pursue.
We partner closely with the brightest minds in healthcare to ensure that every solution we create melds the latest technology with compassion and empathy. Because here, we never stop solving for you.
The Impact You Will Make in this Role
As a DevOps Engineer / Site Reliability Engineer (SRE), you will have the opportunity to collaborate with dynamic teams to ensure the reliability, scalability, and performance of our infrastructure and applications. In this role, you will make an impact by:
Designing and implementing CI/CD pipelines to automate software deployment and infrastructure management processes, improving development speed and reliability.
Managing cloud infrastructure (AWS, Azure, Google Cloud) to ensure optimal performance, cost-efficiency, and scalability.
Building and maintaining monitoring, logging, and alerting systems to proactively identify and resolve system issues, ensuring high availability.
Collaborating with development teams to optimize system performance and automate operational workflows using Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation.
Ensuring systems' security and compliance by applying best practices in patch management, vulnerability scanning, and incident response.
Improving system reliability and uptime through disaster recovery planning, automation of system deployments, and system health checks.
Performing root cause analysis (RCA) for system outages or performance issues, and implementing preventive measures for future incidents.
Providing technical guidance to team members and stakeholders on best practices for scaling, automation, and reliability.
Skills and Expertise
To set you up for success in this role from day one, we require (at a minimum) the following qualifications:
Bachelor’s Degree or higher in Computer Science, Engineering, or a related field (or equivalent experience).
3+ years of experience in a DevOps or Site Reliability Engineering role, ideally in a cloud environment.
Strong proficiency with cloud platforms (AWS, Azure, Google Cloud) and containerization tools like Docker and Kubernetes.
Extensive hands-on experience with Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Ansible.
Familiarity with CI/CD tools (Jenkins, GitLab CI, CircleCI) and version control systems such as Git.
Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack, Splunk, etc.).
Experience in scripting languages (Python, Bash, Go).
Strong understanding of networking, load balancing, and security protocols.
Additional qualifications that could help you succeed even further in this role include:
Familiarity with Agile methodologies and working in an Agile environment.
Knowledge of DevSecOps practices and security automation.
Experience with automated testing for infrastructure and system reliability.
Problem-solving skills for troubleshooting complex production issues under pressure.
Understanding of cost optimization practices in cloud environments to reduce overall infrastructure expense.
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Information Technology and Engineering
Industries
Medical Equipment Manufacturing