Senior Production Engineer - DGX Cloud
Nvidia · Remote (United States)
About this role
Nvidia is hiring a senior-level Site Reliability Engineer in the software engineering function as a remote position. The posting calls out experience with Python, data structures, DevOps, and Kubernetes.
- Role: Site Reliability Engineer
- Function: software engineering
- Level: senior
- Track: Individual contributor
- Employment: Full-time
- Location: Remote (United States)
- Work mode: Remote
- Posted: May 18, 2026
Job description
From Nvidia careers
NVIDIA is hiring experienced Senior Production Engineers to help scale up its AI Infrastructure. We expect you to have significant experience with site reliability principles and techniques including reliability assessments, incident management processes, production system observability, monitoring and alerting, automated deployments and toil elimination. We view Production Engineering as a software engineering discipline and expect significant contributions to our codebase. We welcome out-of-the-box thinkers who can provide new ideas with strong execution bias. Expect to be constantly challenged, improving, and evolving for the better. You will help advance NVIDIA's capacity to build and deploy leading infrastructure solutions for a broad range of AI-based applications. If you're creative, passionate about Production Engineering, and love having fun, please apply today!
For two decades, we have pioneered visual computing, the art and science of computer graphics. With the invention of the GPU - the engine of modern visual computing - the field has expanded to encompass video games, movie production, product design, medical diagnosis and scientific research. Today, we stand at the beginning of the next era, the AI computing era, ignited by a new computing model, GPU deep learning.
What you will be doing: