Acquia: Manager, Engineering
Acquia is the open source digital experience company. We provide the world's most ambitious brands with technology (built around Drupal) that allows them to embrace innovation and create customer moments that matter. At Acquia we believe in the power of community and collaboration - giving our customers the freedom to build tomorrow on their terms.
We are Acquia. We are building for the future of the web, and we want you to be a part of it. Headquartered in the US, we have been named one of North America's fastest-growing software companies by Deloitte and Inc. Magazine, rated a leader by the analyst community, and named one of the Best Places to Work by the Boston Business Journal.

Our architecture group is looking for a Principal Site Reliability Engineer (a.k.a. Site Reliability Architect) to create frameworks that combine engineering and application development to drive operational stability. You will work with engineering and platform teams that develop and support some of the latest technologies focused on the public cloud and containers.

About you...

As an experienced SRE professional, you work with the core teams, combining software practices and engineering to strengthen application and system reliability as well as operational support. Your hands-on knowledge of system design, application development, testing, and operational stability helps transform how teams operate so that they deliver high-quality products. You enjoy being instrumental in establishing best practices and tooling to automate operational processes.

About the SRA role...

  • This role can be filled in our Boston, MA or Portland, OR office
  • Architect a new common framework to establish an SRE Model across multiple teams
  • Develop new processes to prevent problem recurrence and automate responses to all non-exceptional service conditions
  • Enhance SLO trending and centralized reporting
  • Identify opportunities to improve architecture/engineering practices
  • Mentor staff to replace manual processes with automation
  • Coach teams to enhance incident response handling
  • Collaborate across all levels of the organization to drive the SRE model

The ideal candidate has...

  • Bachelor's degree in Management Information Systems, Computer Science, Software Engineering, Technology, or another related field of study
  • 5+ years of experience as a Site Reliability Engineer
  • Ability to apply a systematic approach to solve problems with a sense of ownership and focus
  • Effective communication skills, with the ability to articulate technical details to different, sometimes non-technical, audiences
  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
  • Advanced experience in supporting enterprise container-based platforms
  • Experience architecting, developing, or maintaining cloud solutions in public cloud environments (AWS/GCP)
  • CI/CD deployment pipeline experience (Jenkins, Ansible)
  • Familiarity with REST API design
  • Experience with DevOps container and orchestration tools (Kubernetes, Docker, Puppet, etc.)
  • Deep knowledge of AWS
  • Good knowledge of Python, Go, or similar languages
  • Experience with Configuration Management systems
  • Knowledge of Unix/Linux-based systems and experience troubleshooting applications running on these systems
  • Experience with the software lifecycle, including design, implementation, and delivery
  • Agile environment experience

Acquia is an equal opportunity (EEO) employer. We hire without regard to age, color, disability, gender (including gender identity), marital status, national origin, race, religion, sex, sexual orientation, veteran status, or any other status protected by applicable law.