As a Site Reliability Engineer on our Platform Operations team, you will apply your expertise to ensure that Numerated's innovative SaaS products are built on reliable, scalable, resilient, and secure cloud infrastructure. You will help bridge operations and product development by applying an operations mindset to our software engineering and vice versa. You will also play a key role in maintaining and evolving our operations and information security practices, and in helping to ensure that our infrastructure meets the demands of our fast-paced, dynamic organization.
You bring experience with data center and cloud-native architectures. Your knowledge of site operations and software development, along with an understanding of the role that infosec plays in infrastructure, will make you successful.
In this role you will be all-in on tactics and strategy alike: managing, maintaining, monitoring, and supporting the day-to-day operations of our cloud computing presence, anticipating our future infrastructure needs, and designing and implementing elegant solutions that meet them.
Essential Responsibilities /
- Design and develop tools to automate cloud and datacenter platform management.
- Partner with key stakeholders as a platform champion for cloud-native systems, and coach others on how to use platform capabilities effectively.
- Engage with development teams throughout the SDLC to help develop software for reliability.
- Collaborate across the organization to improve operations, efficiency and customer experience.
- Develop and maintain automation for routine management processes.
- Develop and maintain monitoring, diagnostics, and debugging tooling to improve detection of and response to application and infrastructure issues.
- Maintain appropriate controls and documentation to support compliance initiatives.
- Ensure adherence to security and compliance controls.
- Work with software architects and developers to design and implement cloud solutions.
- Drive innovation through the ongoing evaluation, design, and implementation of new technology.
- Provide continuous feedback to the Product, Engineering, and Cloud Operations teams.
- Analyze and troubleshoot infrastructure issues, identify their root causes, and implement improvements to prevent their recurrence.
- Administer and manage SaaS applications and infrastructure across AWS and legacy datacenters.
- Respond to incidents and contribute to retrospectives and post-mortems.
Education Requirements /
- Bachelor's degree in Computer Science or another IT-related field is preferred
- Certification Requirements: AWS Certification a plus
Work Experience Requirements /
- 2+ years building and maintaining AWS infrastructure and/or AWS-hosted applications
- AWS networking and routing technologies (VPC, security groups, Route53, ELB)
- AWS security technology and practices (KMS, SSM, Secrets Manager, encryption, IAM)
- Familiarity with database technologies and maintenance (RDS, Aurora, PostgreSQL, Redis)
- Solid experience with Linux system administration
- Experience with Infrastructure as Code (e.g., Terraform, Ansible, CloudFormation)
- Working knowledge of modern scripting languages, Python preferred
- Experience with monitoring and alerting capabilities using tools like Datadog
- Familiarity with Agile/Scrum methodologies
- Exceptional communication and interpersonal skills, with an ability to extract, translate, and communicate meaningful information to management and peers
- Strong technical documentation skills for both workflows and support documentation