Principal Staff Software Developer – AI/ML Performance Validation & Systems Testing
AMD · Markham, Canada · Engineering
About this role
AMD is hiring a principal-level QA Engineer in its software engineering function, based in Markham, Canada. The posting calls out experience with PyTorch, vLLM, Python, and testing.
- Role: QA Engineer
- Function: Software engineering
- Level: Principal
- Track: Individual contributor
- Location: Markham, Canada
- Department: Engineering
- Posted: May 15, 2026
Job description
From AMD Careers
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
About the Role
We are seeking a Principal Software Quality Engineer to serve as the senior technical leader for ROCm software validation across compute workloads and server-class systems. In this individual-contributor leadership role, you will define how AMD proves ROCm is ready to ship — from unit and component testing, through full-stack workload validation, to multi-node system-level qualification on AMD Instinct™ GPU platforms. You will set the technical direction for validation strategy, build and evolve the test infrastructure that gates every ROCm release, and personally drive the hardest debugging, characterization, and qualification problems. Your work directly determines the quality bar experienced by hyperscalers, OEMs, sovereign-AI customers, and the open-source community running ROCm in production.