Responsibilities
- Serve as an advocate for quality practices, including the development of automated testing, to improve business processes.
- Act as a critical part of a multi-team effort to deliver, manage and maintain configuration automation to meet business needs.
- Create and maintain configuration standards for software and infrastructure.
- Manage CI & CD tools and pipelines as a partner to development and QA teams.
- Develop and socialize operational standards for teams throughout engineering.
- Recommend, develop and implement system enhancements that improve performance and reliability, including installation, upgrades and patching, monitoring, problem resolution, configuration management and security.
- Oversee critical incidents and major system escalations from initiation to resolution.
- Create mechanisms/architectures that enable fault tolerance and rapid recovery from failure.
- Participate in a rotating on-call escalation service.
- Perform capacity planning and chaos engineering.
- Communicate clearly, both verbally and in writing.
Qualifications
- Bachelor’s degree in a technical field, or equivalent experience
- 1 to 3 years’ experience in an operational environment, preferred
Technical Requirements
- Experience with Linux operating systems in production and development environments
- Experience in network and server engineering
- Experience with automation/configuration management such as Ansible, Chef, Puppet or equivalent
- Experience with workflow and data pipeline management tools such as Airflow and/or Luigi
- Expertise in the latest cloud compute, load balancing and scaling, storage, networking, security, and virtualization technologies with cloud providers such as GCP (preferred), AWS and/or Azure.
- Demonstrated experience installing, operating, and troubleshooting a variety of open-source technologies
- Experience with relational and non-relational databases
- Practical experience developing software or meeting operational needs with code and scripting (Bash, Python, Perl, Ruby, and/or Java)
- Experience with software quality principles and associated tools for testing and analysis.
- Knowledge of CI & CD practices and supporting tools (Jenkins, Bamboo, or similar)
- Experience with IaC Technologies such as Terraform, CloudFormation or Pulumi
- Experience with PaaS technologies such as containers, container orchestration and scheduling, service registration / discovery and monitoring (Docker, Kubernetes, etc.)
- Load, scalability, systems, or performance testing experience
- Observability and monitoring expertise, including the ability to analyze data and identify the root cause of system and infrastructure issues.
Minimum Qualifications
- Bachelor’s degree in Engineering, Information Systems, Computer Science, or a related field.