Senior Performance Architect, Nemotron
Nvidia · Santa Clara, CA
About this role
Nvidia is hiring a Principal Engineer in the software engineering function, based in Santa Clara, CA. The posting calls out experience with Python, CUDA, PyTorch, and LLMs.
- Role: Principal Engineer
- Function: software engineering
- Level: principal
- Track: Tech leadership
- Employment: Full-time
- Location: Santa Clara, CA
- Posted: May 19, 2026
Job description
From Nvidia careers
We are now looking for a Senior Performance Architect for Nemotron! At NVIDIA, we are redefining the future of AI systems through deep model–system–hardware co-design. We are looking for a forward-thinking Nemotron Performance Architect to shape the next generation of Nemotron models through performance modeling, analysis, and forward projections. In this role, you will predict before we build, developing high-fidelity models to evaluate how architectural choices translate into real-world deployment efficiency. You will ensure that future models achieve Pareto-optimal trade-offs across accuracy, throughput, and interactivity on target platforms.
Recent efforts such as LatentMoE architectures and the Nemotron Super model exemplify the kind of performance-driven co-design you will help advance—where modeling insights directly shape model architecture and system efficiency at scale. This role sits at the center of Generative AI evolution, partnering across research, framework development, compiler, and hardware teams to guide decisions that determine how efficiently intelligence scales in production.
What You’ll Be Doing:
- Develop high-fidelity analytical performance models to prototype emerging algorithmic techniques and hardware optimizations that drive model-hardware co-design for the Nemotron family of models.
- Prioritize features to guide the future software and hardware roadmap based on detailed performance modeling and analysis.
- Model the end-to-end performance impact of emerging GenAI workflows, such as Speculative Decoding, Agentic Pipelines, Inference-time compute scaling, and RL, to understand future datacenter needs.
This is an excerpt. Read the full job description on Nvidia careers →