Now, more than ever, the Toast team is committed to our customers. We're taking steps to help restaurants navigate these unprecedented times with technology, resources, and community. Our focus is on building the restaurant platform that helps restaurants adapt, take control, and get back to what they do best: building the businesses they love. And because our technology is purpose-built for restaurants, by restaurant people, restaurants can trust that we'll deliver on their needs for today while investing in experiences that will power their restaurant of the future.
At Toast, our Site Reliability Engineers (SREs) are responsible for keeping all customer-facing services and other Toast production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound software engineering principles, operational discipline, and mature automation to our environments and our codebase. Our decisions are based on instrumentation and continuous observability, as well as on predictions and capacity planning.
About this roll* (Responsibilities)
- Automate collection and analysis of metrics from distributed systems to assist in performance tuning and fault finding
- Create sustainable systems and services through automation, triage & feedback
- Build strong partnerships with development teams to improve services through rigorous testing and release procedures
- Lead system design consulting, platform management, and capacity planning
- Balance feature development speed and reliability with well-defined service level objectives
- Lead sustainable incident response and blameless postmortems
Do you have the right ingredients*? (Requirements)
- Polyglot technologist/generalist with a thirst for learning
- Deep understanding of cloud and microservice architecture, and the JVM
- Experience with tools such as APM, Terraform, Ansible, GitHub, Jenkins, Docker
- Experience developing software or software projects, ideally in Java
- Broad industry experience, including at least 8 years in engineering and a recent in-depth focus on SRE and/or DevOps roles
- Effective leadership and communication skills, with the ability to provide technical leadership on large-scale projects
*Bread puns encouraged but not required
As part of our commitment to the health and safety of our employees and their families, all individuals entering our US workspaces are required to provide proof of full vaccination against COVID-19 unless they have an approved medical or religious accommodation.