Senior Systems Engineer · Operations · Individual contributor · 5+ yrs · Bachelor's · Posted Apr 20, 2026

About this role

Nvidia is hiring a senior-level Systems Engineer in the operations function, based in Santa Clara, CA. The posting calls for experience with Docker, Linux, and observability, plus 5+ years of relevant work. Listed education preference: a bachelor's degree or equivalent.

Role: Systems Engineer
Function: Operations
Level: Senior
Track: Individual contributor
Employment: Full-time
Location: Santa Clara, CA
Experience: 5+ years
Education: Bachelor's degree
Posted: Apr 20, 2026

AI Summary
Manage and optimize large-scale job scheduling systems (LSF, Slurm) across multi-site EDA compute infrastructure. Drive performance improvements, automation, and reliability through problem-solving across the scheduler, OS, and workload layers. Requires 5+ years of Linux infrastructure operations experience and hands-on scheduler tuning expertise.

Job description

from Nvidia careers

As a member of the Hardware Infrastructure EDA Compute team, you will optimize, scale, and support workload scheduling systems that directly impact design velocity and infrastructure efficiency. Success in this role requires both operational precision and the ability to develop and support forward-looking resource management solutions that address evolving compute demands. Beyond day-to-day operations, the role drives improvements in observability, service reliability, and automation, ensuring the EDA compute environment remains resilient, measurable, and aligned with long-term engineering demands.


What you'll be doing:

  • Manage, scale, and optimize job scheduling systems (LSF, Slurm, etc.) in a large-scale, multi-site environment supporting EDA and other compute-intensive workloads

  • Analyze scheduler and infrastructure performance data to identify systemic bottlenecks and drive measurable improvements in utilization, throughput, and turnaround time

  • Lead problem solving across scheduler, OS, and workload layers, ensuring timely resolution of service-impacting issues

  • Identify recurring operational challenges and implement targeted automation or process improvements to reduce manual effort and prevent repeat incidents

  • Help define and track reliable metrics and SLOs for service performance and reliability, partnering with customers to ensure expectations are realistic and measurable

  • Contribute to operational standards, documentation, and best practices to improve consistency across sites

  • Partner directly with customer teams to clarify requirements, translate technical tradeoffs, and drive issues to closure

This is an excerpt of the full job description on Nvidia careers.