Key Responsibilities:
1. Operational Support:
• Act as an operations engineer to identify areas for improvement in documentation, monitoring, end-to-end verification, and troubleshooting.
• Collaborate with existing engineering teams to provide insights and recommendations for optimization.
• Act as part of an operations team whose primary focus is identifying where documentation is lacking or overly technical.
2. Process Improvement:
• Analyze and prioritize gaps identified during the operational phase.
• Take ownership of resolving these gaps, ensuring continuous improvements in system reliability and operational efficiency.
• Investigate issues with the service and diagnose the root cause using tooling, metrics, and logs.
3. Documentation & Training:
• Contribute to creating and updating documentation, following initial guidance and training from the current engineering team.
• Assist in improving documentation standards with feedback from internal teams.
• Identify critical problems when they occur and take decisive action to resolve them, following existing documentation.
• Review and audit existing documentation to identify gaps, inconsistencies, or overly technical content.
• Collaborate with cross-functional teams to gather input and help ensure accurate, up-to-date documentation.
• Ensure documentation is easy to understand, accurate, and accessible for both technical and non-technical audiences.
• Review and contribute to training materials for both internal and external stakeholders.
• Contribute to improving the documentation process and content structure.
4. Test Integration with Monitoring:
• Integrate manual and automated testing with internal monitoring systems to enable real-time feedback and performance tracking.
• Ensure that monitoring tools capture relevant test data and results.
Required Qualifications:
• Experience with DevOps and Site Reliability Engineering (SRE) practices and microservice architectures.
• Familiarity with monitoring and observability tools such as Grafana, Kibana, and Prometheus.
• Strong writing and editing skills, with the ability to convey complex technical concepts in a simple and understandable manner.
• Ability to work independently and collaboratively with various teams.
• Knowledge of software development methodologies and of continuous integration and continuous deployment (CI/CD) processes.
• Experience with other DevOps or SRE tools such as Jenkins, Docker, Kubernetes, or Terraform.
• Familiarity with cloud infrastructure platforms such as AWS, Azure, or GCP.
• Experience writing for both technical and non-technical audiences, including developers, system administrators, and end-users.