About this role

Nvidia is hiring a senior-level Systems Engineer in the operations function, based in Santa Clara, CA. The posting calls for experience with Docker, Linux, and observability, along with 5+ years of relevant work. Listed education preference: a bachelor's degree or equivalent.

Role: Systems Engineer
Function: operations
Level: senior
Track: Individual contributor
Employment: Full-time
Location: Santa Clara, CA
Experience: 5+ years
Education: Bachelor's degree
Posted: Apr 20, 2026

AI Summary

Manage and optimize large-scale job scheduling systems (LSF, Slurm) across multi-site EDA compute infrastructure. Drive performance improvements, automation, and reliability through problem-solving across scheduler, OS, and workload layers. Requires 5+ years of Linux infrastructure operations experience and hands-on scheduler tuning expertise.


More roles at Nvidia

Manager, System Test Engineering

Taipei, Taiwan · manager

Python Bash Testing

Senior Board Test Engineer

Santa Clara, CA · senior

Python Bash Testing

System Level Test Engineer

Santa Clara, CA · mid

Python Linux Testing

ATE Test Development Engineer

Santa Clara, CA · mid

Python C Testing

Senior Debug System Engineer, Datacenter

Santa Clara, CA · senior

Embedded Systems

Job description

from Nvidia careers

As a member of the Hardware Infrastructure EDA Compute team, you will optimize, scale, and support workload scheduling systems that directly impact design velocity and infrastructure efficiency. Success in this role requires both operational precision and the ability to develop and support forward-looking resource management solutions that address evolving compute demands. Beyond day-to-day operations, the role drives improvements in observability, service reliability, and automation, ensuring the EDA compute environment remains resilient, measurable, and aligned with long-term engineering demands.

What you'll be doing:

Manage, scale, and optimize job scheduling systems (LSF, Slurm, etc.) in a large-scale, multi-site environment supporting EDA and other compute-intensive workloads
Analyze scheduler and infrastructure performance data to identify systemic bottlenecks and drive measurable improvements in utilization, throughput, and turnaround time
Lead problem solving across scheduler, OS, and workload layers, ensuring timely resolution of service-impacting issues
Identify recurring operational challenges and implement targeted automation or process improvements to reduce manual effort and prevent repeat incidents
Help define and track reliable metrics and SLOs for service performance and reliability, partnering with customers to ensure expectations are realistic and measurable
Contribute to operational standards, documentation, and best practices to improve consistency across sites
Partner directly with customer teams to clarify requirements, translate technical tradeoffs, and drive issues to closure
This is an excerpt; the full job description is available on Nvidia careers.


Senior HPC and LSF Operations Engineer
