Senior HPC and LSF Operations Engineer
Nvidia · Santa Clara, CA
About this role
Nvidia is hiring a senior Systems Engineer for its operations function, based in Santa Clara, CA. The posting calls out experience with Docker, Linux, and observability, plus 5+ years of relevant work. Listed education preference: a bachelor's degree or equivalent.
- Role: Systems Engineer
- Function: Operations
- Level: Senior
- Track: Individual contributor
- Employment: Full-time
- Location: Santa Clara, CA
- Experience: 5+ years
- Education: Bachelor's degree
- Posted: Apr 20, 2026
Job description
from Nvidia careers
As a member of the Hardware Infrastructure EDA Compute team, you will optimize, scale, and support workload scheduling systems that directly impact design velocity and infrastructure efficiency. Success in this role requires both operational precision and the ability to develop and support forward-looking resource management solutions that address evolving compute demands. Beyond day-to-day operations, the role drives improvements in observability, service reliability, and automation, ensuring the EDA compute environment remains resilient, measurable, and aligned with long-term engineering demands.
What you'll be doing:
- Manage, scale, and optimize job scheduling systems (LSF, Slurm, etc.) in a large-scale, multi-site environment supporting EDA and other compute-intensive workloads
- Analyze scheduler and infrastructure performance data to identify systemic bottlenecks and drive measurable improvements in utilization, throughput, and turnaround time
- Lead problem solving across scheduler, OS, and workload layers, ensuring timely resolution of service-impacting issues
- Identify recurring operational challenges and implement targeted automation or process improvements to reduce manual effort and prevent repeat incidents
- Help define and track reliable metrics and SLOs for service performance and reliability, partnering with customers to ensure expectations are realistic and measurable
- Contribute to operational standards, documentation, and best practices to improve consistency across sites
- Partner directly with customer teams to clarify requirements, translate technical tradeoffs, and drive issues to closure
This is an excerpt. The full job description is on Nvidia careers.