Job Overview
We are seeking a skilled Cloud Observability Engineer with at least 5 years of experience with Splunk Enterprise and Splunk Cloud. This position is pivotal to the evolution of our IT observability platform, ensuring comprehensive real-time monitoring across our infrastructure, applications, microservices, and user interfaces.
Key Responsibilities
- Develop and enhance a data-driven observability platform for real-time monitoring.
- Proactively identify errors, latencies, and performance issues before they affect users and customers.
- Assign ownership of problems and errors in applications and microservices to appropriate teams.
- Implement automated processes that use application logs, metrics, and traces for ongoing performance assessment.
- Collaborate with cross-functional teams to pinpoint and resolve issues using Splunk and other monitoring tools.
- Engage with UNIX, Linux, and Windows server administration teams to troubleshoot configuration challenges.
- Provide training and support for users of the Splunk platform and associated monitoring components.
- Use collaboration tools to document workflows and share knowledge efficiently.
Required Skills
- Comprehensive understanding of Cloud Application Performance Monitoring and Observability.
- Proficiency with Splunk Enterprise and Splunk Cloud, plus experience with SignalFx and OpenTelemetry.
- Skilled in integrating infrastructure monitoring solutions with cloud environments and container orchestration platforms, such as OpenShift and Kubernetes.
- Ability to write regex for field extractions and formulate complex queries in Splunk.
- Experience with scripting languages, including Python and shell scripting, as well as tools like Ansible.
- Hands-on support experience with both Windows and Linux systems in large enterprise settings.
- Excellent communication and presentation abilities.
Qualifications
- Minimum of 5 years of experience with Splunk Enterprise and Splunk Cloud.
- At least 2 years of experience with SignalFx and OpenTelemetry.
- 3 years of experience in UNIX/Windows engineering.
- Proficiency in scripting (Python, shell, Ansible) for automation and integration tasks.
- Experience in designing and supporting platforms with high availability and multi-site configurations.
Career Growth Opportunities
Joining our team offers significant opportunities for professional development in a rapidly evolving field. You will be part of a culture that fosters continuous learning and collaboration, where your contributions directly impact our operational excellence and service delivery.
Company Culture And Values
We pride ourselves on our innovative approach and commitment to excellence. Our team is dedicated to creating a diverse and inclusive work environment where every member is valued. We encourage open communication, teamwork, and a passion for solving complex challenges.
Networking And Professional Opportunities
As a member of our team, you will have the chance to engage with industry experts, participate in professional development programs, and expand your network through collaboration on cutting-edge projects.
Employment Type: Full-Time