Principal Product Manager
Nvidia · Santa Clara, CA
About this role
Nvidia is hiring a principal-level Product Manager based in Santa Clara, CA. The posting calls out experience with DevOps, MLOps, distributed systems, and AI agents.
- Role: Product Manager
- Function: Product
- Level: Principal
- Track: Individual contributor
- Employment: Full-time
- Location: Santa Clara, CA
- Posted: May 14, 2026
Job description
From Nvidia careers
NVIDIA is driving a vision for AI factories that convert tokens to intelligence at scale to power the AI demands of tomorrow. Maintaining AI infrastructure at scale takes more than human involvement; it demands smart automation. The orchestration engine for AI factory break-fix runs live in production at DGX Cloud. As the Product Manager leading all aspects of resilient automation at AI Factory, you will manage break-fix automation. You will develop the product strategy, improve operator experience, and guide the roadmap for professionals. You will build a scalable, reliable product, on a strong engineering foundation, that NVIDIA Cloud Partners depend on to uphold their SLAs. This is your chance to shape how AI factories self-heal!
What You’ll Be Doing:
- Take full responsibility for the strategic direction and roadmap of the break-fix automation system spanning multiple vendors, technologies, and CSPs.
- Define automation confidence thresholds, blocking issue criteria, and human-in-the-loop intervention points that balance speed with operational safety.
- Build the operator UX for repair queues, workflow transparency, and audit trails, ensuring on-call engineers have the context they need to act quickly and confidently.
- Drive the integration between failure attribution and automated repair actions, following through from detection to resolution.
- Define repair SLOs and own the metrics framework for time-to-drain, time-to-healthy, and overall fleet availability.
This is an excerpt of the full job description on Nvidia careers.