Everbridge: Site Reliability Engineer
25 Corporate Dr., 4th Fl
Burlington, MA 01803

Description

About the Position:

Are you motivated by an incredible sense of purpose in doing work that helps keep people safe and businesses running daily, with results that regularly make headlines? Are you passionate about innovating on the industry’s cutting edge to develop solid architecture principles, operability guidelines, progressive scaling methodologies, and other sophisticated techniques to reliably operate critical technology infrastructure at scale? Do you have an insatiable appetite for rooting out inefficiency, automating away toil, and proactively eliminating problems before they occur? If so, this position is a perfect opportunity for you to join the Everbridge Site Reliability Engineering team in a hands-on role driving the design, implementation, and operation of our global platforms.

About the Team:

As the Everbridge Site Reliability Engineering team, we are responsible for ensuring overall service quality and availability of Everbridge's solutions. The technology platforms that we support automate the international delivery of critical information to help keep people safe and businesses running.

We are a 24x7x365 distributed team that can do our job anytime, anywhere on the planet with an Internet connection. Our holistic understanding of OSI layers 0 through 8 allows us to effectively maintain a heterogeneous blend of worldwide public and private cloud services where lives and livelihoods are at stake in the event of failures. We are dedicated, passionate people who are committed to internal/external customer service and doing the right thing.

 

Job Duties:

  • Keep people safe and businesses running.
  • Own operational availability, security, scalability, efficiency, monitoring, instrumentation, and overall service reliability of Everbridge's solutions.
  • Collaborate across Agile teams with Architects, Developers, Quality, Data, Security, and other Operations engineers on designing and implementing highly reliable solutions.
  • Embrace Site Reliability Engineering principles of proactivity, automation, cross-functional collaboration, data-driven decision making, and fast+safe failing to continually improve our technology and culture.
  • Enhance our infrastructure, tooling, and processes to extend operability as a self-service function for other groups in the engineering value stream.
  • Participate in a rotating on-call schedule to troubleshoot and resolve production escalations from our 24x7x365 NOC.
  • Have fun while we work hard to make a difference.

 

Minimum Qualifications:

  • Previous experience contributing in a production Site Reliability, DevOps, SaaS/Technical Operations, or NOC environment
  • Dedicated commitment to technical excellence and quality customer service
  • Ability to write code in at least one programming language (e.g. Python, Perl, Java, Ruby, Go)
  • Comfort using Git for practical configuration data and code management
  • Expertise with cloud compute IaaS/abstracted PaaS solutions (AWS Solutions Architect or equivalent) and hybrid/on-premises private compute environments (VMware Certified Professional or equivalent)
  • Deep knowledge in one of these disciplines forms the central pillar of your T-shaped skill set:
      • Network architecture and operation with an emphasis on: application load balancing at local and global scales (ALB/ELB/Route 53), IPv4 routing and dynamic routing protocols (OSPF, BGP), VPN, and network security best practices
      • Automation framework orchestration, configuration management, and software-defined infrastructure management techniques (SaltStack preferred; others such as Puppet, Chef, or Ansible also acceptable)
      • Large scale production UNIX/Linux operating system, application, and security maintenance in an online service provider environment (Ubuntu and Debian GNU/Linux preferred)
  • US Citizenship and ability to pass a Federal drug screening

 

Preferred Qualifications:

  • Infrastructure/application monitoring and alerting solutions (Datadog, Elastic BELK/X-Pack, Prometheus, Nagios, Cacti, Graphite/Grafana, InfluxDB, OpenTSDB, Splunk, Graylog, etc.)
  • Application virtualization, containerization, and service-oriented-architecture technologies (Nomad & rest of HashiCorp suite, Docker, Kubernetes, Mesos, CoreOS/rkt)
  • Email transport software and deliverability management concepts (Postfix/Sendmail and derivative commercial MTAs, SPF, DomainKeys/DKIM, DMARC, IP reputation)
  • VoIP (FreeSWITCH or Asterisk w/ SIP) and/or TDM telephony infrastructure
  • Cisco IOS/NX-OS, Juniper JUNOS, and related hardware device and virtual appliance families (Cisco Catalyst/Nexus/ISR/ASR, Juniper routing/switching/firewall platforms, Brocade Vyatta)
  • RDBMS, NoSQL, and hybrid data tier platforms (MongoDB, Elasticsearch, Postgres, MySQL, Riak, Cassandra, HBase, etc.)
  • SIEM, HIDS/NIDS, and related infrastructure tooling required to maintain positive control over security
  • Practical knowledge of BGP traffic engineering, DDoS mitigation, and active threat defense techniques
  • Continuous integration and deployment/delivery pipelines in a release engineering context
  • Performance measurement and tuning methodology for capacity planning and bottleneck hunting

About Us

Our team makes a difference during the most difficult times and challenging situations. Our people are dedicated to solving problems. Our software was built to save lives. Our unifying mission is to keep people safe and businesses running.

Headquartered in the great cities of Boston and Los Angeles, with operations across the world, our team of 750+ dedicated employees supports more than 4,200 global customers every day in their most crucial moments. During public safety threats such as active shooter situations, terrorist attacks, or severe weather conditions, as well as during critical business events like IT outages or cyber-attacks, customers rely on our SaaS-based platform to quickly and reliably aggregate and assess threat data, locate employees and first responders, automate pre-defined communications processes, and track progress on those response plans.

Our culture is all about “Making a Difference,” and we are proud to serve:

  • 9 of the 10 largest U.S. cities
  • 8 of the 10 largest U.S.-based investment banks
  • 7 of the top 10 U.S. technology and telecom companies
  • 25 of the 25 busiest North American airports
  • 7 of the 10 largest U.S. healthcare systems
  • 6 of the 10 largest U.S. retailers

As we continue to grow and transform the field of critical event management, we need passionate, committed individuals to help us carry out our mission. If you think you have what it takes to make a difference, apply to be a part of our award-winning team.

Everbridge is an Equal Opportunity/Affirmative Action Employer. All qualified Applicants will receive consideration for employment without regard to race, creed, color, religion, or sex including sexual orientation and gender identity, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.

 

Full-time

Employee Testimonials

Shane Garoutte
GM + VP, Tech Ops at Everbridge

"Tech companies that save lives are rare. A couple weeks ago, we learned that messages sent through Everbridge helped save a child. On the intrinsic value scale, that’s hard to top. That’s what I was looking for, and what many people who come to Everbridge are looking for. I wanted to be able to tell my kids I’m doing something that makes a difference."

Shaili Kapoor
Software Engineer

"If someone is looking to join Everbridge, I think the most important thing to know is that you need to be a team player and to take initiative. If you want to work somewhere that’s really collaborative, without any office politics, then I think this is the perfect place. People are really approachable. There are no egos getting in the way."

Ben Potter
Implementation Specialist

"The military is an organization of comradery and brotherhood, and that comes with a lot of accountability. Similarly, at work it’s important to understand your role in the bigger picture and how your success is measured; then you have to hold yourself accountable to that."

Kerry McDonough
Implementation Specialist

"There’s this positive energy, this excitement, in every room I walk into. Everyone is enthusiastic and generous, not only with each other, but with customers. I’ve worked places where, when people talk with a customer, their main goal is to finish the call. Here, people want to talk with customers. I think it goes along with our work culture. Everyone’s excited to be at a young, rapidly growing company, so everyone really goes above and beyond. People truly care about what they’re doing."