Description
The Site Reliability Engineer (SRE) bridges the gap between technical customer support and high-level infrastructure management. This role is responsible for supporting a proprietary, Linux-based and AWS-hosted Platform-as-a-Service offering, ensuring both operational excellence and outstanding customer satisfaction. Initially focused on front-line technical product support, the SRE will have opportunities to grow into more advanced positions and responsibilities. The ideal candidate will possess the technical expertise and interpersonal skills necessary to engage directly with end users, resolving complex issues while continuously improving platform reliability and performance.
The Site Reliability Engineer focuses on providing front-line technical product support, troubleshooting customer issues, and building foundational expertise in platform operations and automation tools. The Site Reliability Engineer II demonstrates the same technical capability while also taking on advanced infrastructure responsibilities, leading infrastructure initiatives, and mentoring junior team members.
Responsibilities:
● Serve as the front-line technical resource for troubleshooting and resolving customer issues related to the Company's Linux-based AWS platform.
● Provide exceptional technical support to internal and external stakeholders, ensuring timely resolution of issues within established SLAs.
● Document and escalate complex issues to senior technical resources as needed while striving to independently resolve more advanced issues over time.
● Monitor and respond to technical incidents, identify root causes, and collaborate with internal teams to implement long-term solutions.
● Write and maintain knowledge base articles and training materials for end users and internal teams.
● As responsibilities grow, manage and maintain infrastructure with automation tools such as Terraform, Ansible, CloudFormation, and Chef.
● Act as a subject matter expert during client deployment, implementation, and migration projects.
● Collaborate closely with the Product, Quality Assurance, Engineering, and Operations teams to ensure alignment and a seamless user experience.
● Document product use cases, enhancements, and bug fixes; advocate for product improvement based on user feedback.
● Participate in on-call rotations to provide 24/7 operational support.
● Maintain strong relationships with customers and stakeholders, striving for exceptional satisfaction and engagement.
● As expertise develops, become familiar with the secure use and management of the AWS control plane to ensure compliance with security and data privacy standards.
● Participate in the design, deployment, and maintenance of CI/CD pipelines to support seamless application development and deployment.
● Oversee the availability and performance of production and development environments, ensuring alignment with SLAs and industry best practices.
Requirements
● 3+ years of experience in technical customer support or service desk environments, with a focus on technical product support.
● 5+ years of experience in cloud computing and infrastructure management.
● Strong knowledge of Amazon Web Services (AWS), including its services for containerized applications (EKS, ECS, ECR, Elastic Beanstalk).
● Proficiency in Linux administration, including user management, software installation, and file system management.
● Familiarity with networking concepts and DNS.
● Hands-on experience with CI/CD tools and processes.
● Proficiency with version control tools (Git, SVN).
● Excellent oral and written English communication skills with a customer-centric perspective.
● Strong troubleshooting and critical thinking skills.
● Ability to work both independently and collaboratively in a team environment.
● High attention to detail and organizational skills.
● Proficiency in at least one programming language.
Preferred Qualifications:
● AWS certification (any level).
● 4-year college degree in a technical or quantitative science field, or equivalent work experience.
● Experience supporting end users in a service desk or technical customer support environment.
● Familiarity with virtualized infrastructure management and security best practices.
Benefits
Location: Onsite