Klaviyo
Lead Site Reliability Engineer – Observability
Job Description
At Klaviyo, we value the unique backgrounds, experiences and perspectives each Klaviyo (we call ourselves Klaviyos) brings to our workplace each and every day. We believe everyone deserves a fair shot at success and appreciate the experiences each person brings beyond the traditional job requirements. If you’re a close but not exact match with the description, we hope you’ll still consider applying. Want to learn more about life at Klaviyo? Visit careers.klaviyo.com to see how we empower creators to own their own destiny.
Site Reliability Engineering (SRE) is what you get when you treat system operations as a software engineering problem. The mission of the Site Reliability Engineering team is to ensure uninterrupted service for Klaviyo customers and act as a force multiplier for Klaviyo product teams to deliver better software faster. The SRE team builds foundational backend services as well as tooling and automation to allow product teams to release and scale their software reliably and predictably. Lead SREs are team players who embed themselves within product teams as needed to advance the architecture and performance of software systems and train their peers in topics such as debugging distributed systems, building self-healing applications and eking out every drop of performance possible. As a Lead Site Reliability Engineer, you will own the ways we solve problems for our customers and make a big impact on the productivity of our product engineering teams. Klaviyo is growing fast and we have openings for all skill levels across all of our teams. Learn more about our engineering culture at https://klaviyo.tech
How You'll Make a Difference
- Ship foundational services to enable Klaviyo engineering to move faster with confidence
- Design and develop systems and processes that enable highly available and scalable systems
- Uncover and advocate for preventative, upstream solutions with internal stakeholders
- Own the technical vision and roadmap for your area, working with stakeholders to solve pain points and deliver value to engineering
- Achieve breakthroughs in systems throughput by identifying and eliminating bottlenecks
- Leverage technology such as Python, AWS, Django, Kubernetes, Bash, Terraform, MySQL, Redis, and PostgreSQL to advance Klaviyo’s platform
- Champion best practices by actively collaborating with other teams in a culture that values whiteboarding and technical design review
- Contribute to the company in multiple areas, constantly pushing yourself to be a better engineer and to level up all of your peers within your team and within Klaviyo
- Design, write and deliver software to dramatically improve the availability, scalability, latency, and efficiency of Klaviyo’s services
- Participate in periodic on-call duties with a focus on solving issues when they are discovered, preventing recurrences and minimizing alert fatigue
- Implement architectural improvements to achieve breakthrough results in Klaviyo systems’ operational scalability and reliability
- Work hand-in-hand with product-facing engineers and other SREs to ship impactful code
- Perform quantitative analysis to understand and scale Klaviyo systems
- Evangelize Site Reliability best practices across the engineering organization
Who You Are
- 10+ years of solid experience in the SRE/DevOps field
- BA or BS degree in Computer Science, related field, or equivalent experience
- Ability to handle yourself in outage situations and to drive failures to root cause analysis and prevention of future issues
- Understanding of Linux (we run Ubuntu) and all layers of the networking stack
- Experience working on an engineering team building software
- Experience writing code using best practices in a language such as Python, Ruby, Go, etc.
Massachusetts Applicants:
It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.
Our salary range reflects the cost of labor across various U.S. geographic markets. The range displayed below reflects the minimum and maximum target salaries for the position across all our US locations. The base salary offered for this position is determined by several factors, including the applicant’s job-related skills, relevant experience, education or training, and work location.
In addition to base salary, our total compensation package may include participation in the company’s annual cash bonus plan, variable compensation (OTE) for sales and customer success roles, equity, sign-on payments, and a comprehensive range of health, welfare, and wellbeing benefits based on eligibility. Please visit Klaviyo Rewards to find out more about our Total Rewards package.
Your recruiter can provide more details about the specific salary/OTE range for your preferred location during the hiring process.
Get to Know Klaviyo
We’re Klaviyo (pronounced clay-vee-oh). We empower creators to own their destiny by making first-party data accessible and actionable like never before. We see limitless potential for the technology we’re developing to nurture personalized experiences in ecommerce and beyond. To reach our goals, we need our own crew of remarkable creators—ambitious and collaborative teammates who stay focused on our north star: delighting our customers. If you’re ready to do the best work of your career, where you’ll be welcomed as your whole self from day one and supported with generous benefits, we hope you’ll join us.
Klaviyo is committed to a policy of equal opportunity and non-discrimination. We do not discriminate on the basis of race, ethnicity, citizenship, national origin, color, religion or religious creed, age, sex (including pregnancy), gender identity, sexual orientation, physical or mental disability, veteran or active military status, marital status, criminal record, genetics, retaliation, sexual harassment or any other characteristic protected by applicable law.