Cloud Engineer – Observability & Monitoring (Azure, Splunk, AKS)
Location: Hybrid – Plano, TX
Salary Range: $100,000 – $120,000
Work Authorization: U.S. citizens and green card holders only (no sponsorship available)
Overview:
We are seeking a highly skilled Cloud Engineer with deep expertise in observability, monitoring, and performance optimization within Azure environments. In this role, you will design, implement, and evolve enterprise-grade monitoring solutions for distributed microservices running on Azure Kubernetes Service (AKS). Using Splunk, Istio, and other observability tools, you will ensure the performance, scalability, and reliability of mission-critical applications.
The ideal candidate has a strong background in DevOps, Site Reliability Engineering (SRE), and cloud operations, with a proven ability to design resilient monitoring strategies, automate incident response, and drive operational excellence across teams.
Key Responsibilities:
- Design, implement, and maintain observability solutions for distributed microservices on AKS
- Build Splunk dashboards, alerts, and advanced analytics for application and infrastructure monitoring
- Integrate Istio service mesh telemetry, tracing, and logging into observability frameworks
- Apply Twistlock (Prisma Cloud) policies for container and workload security monitoring
- Monitor and optimize Azure-native services, including API Management (APIM), Cosmos DB, SQL Server, and networking
- Implement observability as code using Terraform for scalable deployments
- Enhance CI/CD pipelines in Azure DevOps (AzDO) with integrated monitoring and alerting hooks
- Use Azure Chaos Studio to validate system resilience and feed improvements into monitoring strategy
- Support automated API and performance testing using Karate Labs in conjunction with observability tools
- Define and track SLAs, SLOs, and SLIs in partnership with development, security, and operations teams
- Participate in incident response, root cause analysis, and continuous improvement initiatives
Required Qualifications:
- Hands-on experience in DevOps, SRE, or Cloud Operations roles
- Strong proficiency with Splunk in microservices-based environments
- Expertise in Azure Kubernetes Service (AKS) and Istio
- Strong knowledge of Azure services and architecture
- Experience with Twistlock (Prisma Cloud), Terraform, and Azure DevOps (AzDO) pipelines
- Familiarity with Azure Chaos Studio and Karate Labs
- Strong scripting and automation skills (e.g., PowerShell, Python, or Bash)
Preferred Qualifications:
- Azure certifications (e.g., AZ-400, AZ-104, AZ-305)
- Experience with Prometheus, Grafana, or OpenTelemetry
- Knowledge of DevSecOps practices and secure CI/CD pipeline design