Join Zendesk as an Engineering Manager, SRE - Observability, where you'll lead a talented team in architecting and evolving enterprise-grade monitoring systems. Your expertise will drive proactive reliability engineering, enhancing system observability and performance across the organization.
Key Responsibilities
Recruit, mentor, and retain top engineering talent specializing in observability and reliability engineering
Directly contribute to the design and implementation of observability solutions
Own and evolve the end-to-end observability stack and operational processes
Partner with SRE, DevOps, and platform teams to integrate observability tooling
Lead roadmap planning for observability infrastructure and tooling
Establish best practices for instrumentation, data collection, and incident response workflows
Identify gaps and weaknesses in monitoring coverage and performance
Collaborate cross-functionally to influence observability adoption and innovation
Foster a culture of continuous learning and technical craftsmanship
Communicate technical strategy and progress to stakeholders
Required Qualifications
3+ years of people management experience leading engineering teams
Deep domain expertise in observability, with hands-on experience in tools such as Datadog, Grafana, and Loki
Significant experience working in or managing engineering teams within large-scale enterprise companies
Proven ability to hire, mentor, and retain high-performing engineers
Strong collaboration skills to influence cross-functional teams in large engineering organizations
Experience with distributed systems and cloud environments (AWS, Kubernetes)
Preferred Qualifications
Background leading observability-focused teams
Hands-on experience operating telemetry systems for large-scale Kubernetes and AWS workloads
Passion for innovation, continuous learning, and championing a growth mindset
Experience managing geographically distributed teams