About this role

Nvidia is hiring a senior-level Site Reliability Engineer in its software engineering function, based in Santa Clara, CA. The posting calls out experience with Python, Kubernetes, Docker, and OpenStack.

Role: Site Reliability Engineer
Function: software engineering
Level: senior
Track: Individual contributor
Employment: Full-time
Location: Santa Clara, CA
Posted: May 18, 2026

Job description

from Nvidia careers

Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline for designing, building, and maintaining large-scale production systems with high efficiency and availability, using a combination of software and systems engineering practices. It is a highly specialized discipline that demands knowledge across systems, networking, coding, databases, capacity management, continuous delivery and deployment, and open-source cloud-enabling technologies such as Kubernetes and OpenStack. SRE at NVIDIA ensures that our internal and external-facing GPU cloud services run with the maximum reliability and uptime promised to users, while enabling developers to make changes to existing systems through careful preparation and planning, keeping an eye on capacity, latency, and performance. SRE is also a mindset and a set of engineering approaches for running and optimizing production systems. Much of our software development focuses on eliminating manual work through automation, performance tuning, and improving the efficiency of production systems.

This is an excerpt of the full job description posted on Nvidia careers.

Senior Site Reliability Engineer - Observability and Telemetry Platform