Senior Software Engineer - NVLink Rack Scale Stability and Reliability
Nvidia · Santa Clara, CA
About this role
Nvidia is hiring a senior-level Software Engineer based in Santa Clara, CA. The posting calls out experience with networking, embedded systems, Python, and C.
- Role: Software Engineer
- Function: Software engineering
- Level: Senior
- Track: Individual contributor
- Employment: Full-time
- Location: Santa Clara, CA
- Posted: May 28, 2026
Job description
From Nvidia careers
NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars. NVIDIA is looking for phenomenal people like you to help us accelerate the next wave of artificial intelligence.
We are looking for highly motivated Senior Software Engineers to join our Fabric Networking team with a targeted focus on NVLink Rack-Scale Systems Stability & Reliability. In this role, you will partner closely with architects and developers building our next-generation NVLink and NVSwitch systems, helping transform first-of-their-kind platforms into stable, reliable, and volume production-ready systems. You will work on complex system-level challenges spanning resiliency, diagnostics, recovery, and large-scale AI infrastructure, contributing directly to the software foundation powering next-generation datacenter deployments.
What you will be doing:
- Drive platform bringup, feature enablement, end-to-end software validation, and debug for next-generation NVLink-based GPU and rack-scale systems.
- Develop tools, diagnostics, automation, and infrastructure for system validation, regression testing, and fleet support.
- Lead reliability and MTBI validation through stress testing, telemetry analysis, failure injection, and issue resolution.
This is an excerpt of the full job description on Nvidia careers.