Join Zendesk as an Engineering Manager specializing in Observability, where you will lead a skilled team to enhance system reliability and monitoring in a cloud-native environment. Your expertise will empower engineering teams and drive proactive reliability engineering at scale.
Key Responsibilities
Recruit, mentor, and retain top engineering talent specializing in observability and reliability engineering
Contribute to the design and implementation of observability solutions
Own and evolve the end-to-end observability stack and operational processes
Partner with SRE, DevOps, and platform teams to integrate observability tooling
Lead roadmap planning for observability infrastructure and tooling
Establish best practices for instrumentation, data collection, alerting thresholds, and incident response workflows
Identify gaps and weaknesses in monitoring coverage and performance
Collaborate cross-functionally to influence observability adoption and innovation
Foster a culture of continuous learning and technical craftsmanship within the team
Communicate technical strategy, progress, risks, and impact effectively
Required Qualifications
3+ years of people management experience leading engineering teams
Deep domain expertise in Observability, with hands-on experience in tools such as Datadog, Grafana, and Loki
Significant experience working in or managing engineering teams within large-scale enterprise companies
Proven ability to hire, mentor, and retain high-performing engineers
Strong collaboration skills to influence cross-functional teams in large engineering organizations
Experience with distributed systems and cloud environments (AWS, Kubernetes)
Preferred Qualifications
Background leading Observability-focused teams
Hands-on experience operating telemetry systems for large-scale Kubernetes and AWS workloads
Passion for innovation, continuous learning, and championing a growth mindset
Experience managing geographically distributed teams