Senior Software Engineer, Cloud-Native Stack – CSP Engagements
Nvidia · Santa Clara, CA
About this role
Nvidia is hiring a senior-level Software Engineer based in Santa Clara, CA. The posting calls out experience with Terraform, Ansible, Helm, and Python.
- Role: Software Engineer
- Function: Software engineering
- Level: Senior
- Track: Individual contributor
- Employment: Full-time
- Location: Santa Clara, CA
- Posted: May 15, 2026
Job description
We are developing advanced multi-rack, multi-tenant AI/ML datacenters with NVIDIA GB200 and upcoming GB300 GPUs. NVIDIA seeks a Senior Software Engineer for our CSP (Cloud Service Provider) Engagements team to focus on the cloud-native stack for datacenter products like GB200. In this role, you will define customer workflows, prototype stack enhancements, and debug the toughest Kubernetes and Slurm issues in multi-rack, multi-tenant AI datacenters, tackling complex scheduling challenges across racks, tenants, and clouds.
What you’ll be doing:
Perform deep-dive debugging of multi-rack, multi-tenant clusters: scheduler behavior, container runtime issues, device-plugin crashes, RDMA/IB fabric anomalies, etc.
Gather customer requirements and prototype feature extensions for Kubernetes operators, Slurm plugins, and custom microservices that expose new GPU capabilities.
Drive joint architecture reviews and “whiteboard” sessions with CSP and internal platform teams; convert findings into RFCs and upstream pull requests.
Create reproducible testbeds (Helm/Ansible/Terraform) that mirror customer environments; automate validation and benchmark suites.
Deliver technical collateral (design docs, how-to guides, demo scripts) and present at customer on-sites, KubeCon, and SlurmUG.
Collaborate with AE, FAE, and Solution Architect teams to deliver integrated customer solutions and technical documentation.
What we need to see:
Strong source-level expertise in Kubernetes internals (scheduler, CRI/CNI/CSI, operators) and Slurm (federation, power-save, plugins).
This is an excerpt; the full job description is available on Nvidia careers.