Principal Software Engineer, DGX Cloud Production Engineering
Nvidia · Santa Clara, CA
About this role
Nvidia is hiring a principal-level Software Engineer based in Santa Clara, CA. The posting calls out experience with Python, Kubernetes, Linux, and distributed systems.
- Role: Software Engineer
- Function: software engineering
- Level: principal
- Track: Tech leadership
- Employment: Full-time
- Location: Santa Clara, CA
- Posted: May 18, 2026
Job description
From Nvidia careers
NVIDIA DGX Cloud is scaling GPU infrastructure across internal, partner, and cloud environments. We are looking for Principal Software Engineers to help shape the technical direction for production engineering, Kubernetes-based operations, automation, and reliability across large-scale GPU clusters.
This role is for senior technical leaders who can define architecture, lead through influence, build critical systems, and turn ambiguous infrastructure problems into durable software and operating models.
What you’ll be doing:
- Define and execute the technical strategy for DGX Cloud cluster operations, building the automation, GitOps, and Day 2 reliability needed to operate large-scale GPU clusters across NVIDIA Cloud Partners (NCPs) and on-prem environments.
- Lead design and implementation of systems for cluster lifecycle, validation, repair, upgrades, observability, and readiness.
- Establish patterns for Kubernetes-based GPU cluster operations across partner and on-prem environments.
- Identify and eliminate operational toil through software, APIs, automation, and agent-assisted workflows.
- Set technical standards for production readiness, SLOs, incident response, handoff gates, and operational acceptance.
- Mentor engineers and influence platform, infrastructure, storage, networking, security, and workload teams.
What we need to see:
- 15+ years of experience building and operating large-scale distributed systems or cloud infrastructure.
- Deep experience with Kubernetes, Linux, infrastructure automation, and production operations.
- Strong programming experience in Go, Python, or similar.
This is an excerpt. Read the full job description on Nvidia careers →