Symbotic
New Grad - Robotics Reliability Engineer
Job Description
What we need
Symbotic is seeking a Bot Reliability (Systems) Engineer to drive the reliability, performance, and scalability of our autonomous warehouse platform powered by Symbotic’s mobile robots. This is a high-impact, hands-on engineering role focused on solving complex system-level challenges across large-scale robotic fleets deployed at customer sites.
This role sits at the intersection of robotics software, hardware integration, and operational performance. The primary objective is to diagnose, resolve, and prevent system-level issues, ensuring our robotic systems operate reliably and consistently meet customer performance KPIs.
We are looking for a technically strong, data-driven engineer who thrives in complex, real-world environments and can translate ambiguous system behaviors into structured analysis and actionable engineering improvements.
What you'll do
- Fleet-Scale System Reliability
  - Identify, triage, and root-cause system-level issues impacting large-scale robotic fleets.
  - Drive improvements in system reliability, availability, and performance across thousands of deployed robots.
  - Define and monitor system performance guardrails tied to customer KPIs (throughput, error rates, recovery time, uptime).
  - Partner with field teams to debug and resolve production issues in live environments.
- End-to-End Systems Debugging & Integration
  - Work across robotics software, hardware, controls, perception, and infrastructure to diagnose complex system interactions.
  - Debug issues spanning embedded systems, distributed services, real-time control loops, and operational workflows.
  - Collaborate with cross-functional teams to drive fixes and long-term solutions.
  - Contribute to system design improvements that enhance robustness, fault tolerance, and scalability.
- Data-Driven Performance Optimization
  - Analyze robot logs, telemetry, and diagnostics data to identify failure modes and performance bottlenecks.
  - Build and use tools (SQL, Python, dashboards) to investigate trends and validate hypotheses.
  - Develop mechanisms for regression detection, failure trend analysis, and performance monitoring.
  - Drive continuous improvement through structured experiments and data-backed decisions.
- Operational Excellence & Continuous Improvement
  - Own reliability metrics and contribute to improving system observability and debuggability.
  - Document failure modes, learnings, and standard operating procedures for issue resolution.
  - Support release validation and help ensure changes meet reliability and performance expectations.
  - Act as a technical escalation point for complex system issues.
What you'll need
- Bachelor's or Master's degree in Computer Engineering, Robotics, Mechanical Engineering, or a related field.
- Experience in robotics, automation, or complex distributed systems engineering.
- Strong systems engineering mindset with experience in robotics control software, real-time systems, and hardware-software integration.
- Demonstrated experience in structured root-cause analysis and failure investigation.
- Proficiency in data analysis and scripting (Python, SQL, or similar).
- Experience working with logs, telemetry systems, and large-scale operational data.
- Familiarity with Linux environments and version control systems (Git).
- Experience working in production environments with deployed systems (not just lab prototypes).
- Strong problem-solving skills and ability to work across ambiguous, cross-functional system boundaries.
- Experience in Agile development environments.
Our environment
- Up to 10% travel may be required. Employees must have a valid driver’s license and the ability to drive and/or fly to client and other customer locations.
- Employees are responsible for holding a personal credit card and managing their own expenses, which are reimbursed on a bi-weekly basis.