CloudZero

CloudOps Engineer

Boston, MA
May 8, 2025

Job Description

About the Role:
We’re looking for a CloudOps Engineer to join our fast-growing CloudOps team focused on Developer Experience, SRE, and FinOps. In this role, you’ll be responsible for the reliability, performance, and observability of CloudZero’s infrastructure — empowering engineering teams to ship features that help customers understand and optimize their cloud spend.

CloudZero processes billions of events daily across AWS, Azure, and GCP. Our customers rely on real-time, accurate cost data to make business-critical decisions, and any instability in our system impacts their planning. Built entirely on a unique serverless architecture (no EC2 instances or containers), our platform demands infrastructure that scales gracefully, fails predictably, and recovers automatically.

The problems are interesting: handling massive data volumes efficiently, ensuring sub-second query performance across terabytes of data, and scaling systems to support customers spending millions monthly — all in a modern, event-driven environment.

You Will:

  • Infrastructure as Code everything. Design and maintain Pulumi modules that provision reliable, cost-efficient cloud resources. No clicking through consoles.

  • Build observability into everything. Instrument systems so that failures surface quickly and debugging happens with data, not guesswork. You’ll know about problems before customers do.

  • Automate the boring stuff. Deployments, scaling, backups, limit changes: if humans are doing it repeatedly, you’ll build systems that do it instead.

  • Partner with product engineering. Help teams design resilient services, review architectures for operational complexity, and build deployment pipelines that enable safe and fast shipping.

  • Optimize for cost and performance. CloudZero’s business is helping others optimize cloud costs. We should be exemplars of efficient cloud usage ourselves.

Requirements:

  • 3–5+ years of experience building and operating distributed systems in AWS

  • Strong skills in Python, Infrastructure as Code (e.g., Pulumi or Terraform), and Kubernetes

  • Hands-on experience with monitoring tools such as Prometheus or Datadog

  • Proven ability to debug production issues under pressure

  • Values thoughtful, reliable system design over reactive “hero” efforts

  • Balances automation intelligently — builds solutions to real problems, not automation for its own sake

  • Able to clearly explain complex technical issues to non-technical stakeholders

  • Strong documentation habits to support long-term team clarity and system stability

  • Excited to take ownership of infrastructure and solve operational challenges at scale

Equal Opportunity Employer
CloudZero is an equal opportunity employer and values diversity. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status or disability status. All job offers are contingent upon the candidate passing background and reference checks.

CloudZero is unable to sponsor employment visas now or in the future. This role requires current U.S. work authorization without the need for sponsorship.