$175,000 - $220,000 / year + Bonus

The insurance industry runs on Vertafore. We equip agencies, MGAs, and carriers with the core digital systems, specialized AI, and data-driven foundation to eliminate distribution drag across the insurance lifecycle, spanning sales, servicing, and back-office operations.

Underpinned by unmatched speed and performance power, we are the trusted backbone that’s taking the insurance industry from friction to flow with Distribution Velocity – speed, performance, and trust – to drive growth at scale.

With over 95% of the top agencies and insurers and 50% of industry compliance transactions running through Vertafore, we lead at the intersection of innovation and trust, giving insurance professionals the confidence to transform and win in the AI era.

Our reach is global, with headquarters in Denver, Colorado, and offices across the U.S., Canada, and India.

The Director, Site Reliability Engineering (SRE) will lead reliability, performance, and observability initiatives for a portfolio of Vertafore products. This role owns SLIs/SLOs, incident response, automation, and CI/CD practices for assigned product families. The Director will manage multiple teams and collaborate with Product Development, Architecture, Cloud Operations, Information Security, and other SRE leaders to ensure operational excellence. This role is responsible for bridging the gap between development and operations by applying a software engineering mindset to system administration. You will own the lifecycle of services, from inception and design through deployment, operation, and refinement.

Key Responsibilities

• Product Reliability Leadership

o Define and enforce SLIs/SLOs for a subset of Vertafore flagship products.

o Drive observability strategy across application and infrastructure layers.

• Release Engineering & Toil Reduction

o Oversee CI/CD pipelines for product deployments using tools such as GitLab, Jenkins, Ansible, and LaunchDarkly.

o Monitor and cap "toil" (manual, repetitive operational work) at 50% of the team's time using automation and AI tools, ensuring the team spends the remaining time on project work that scales the system.

• Error Budget Management

o Manage error budgets to balance the velocity of feature releases with the stability of the platform, ensuring clear consequences when budgets are exhausted.

• Incident Management

o Define and participate in 24x7 on-call rotations for assigned products; ensure rapid resolution and blameless postmortems.

• Cross-Functional Collaboration

o Partner with Cloud Ops on capacity planning, OS patching (app tier), and load balancing (ALB, F5).

o Align reliability goals with product roadmaps and customer SLAs.

• Team Leadership

o Manage a group of managers and engineers, and mentor teams on automation, observability, and reliability best practices.

Director, Site Reliability Engineering
