Triton Compiler/GPU Kernel Performance Engineer
AMD · Shanghai, China · Engineering
About this role
AMD is hiring a senior-level Embedded Software Engineer in the software engineering function based in Shanghai, China. The posting calls out experience with CUDA, Linux, Deep Learning, and Distributed Systems.
- Role: Embedded Software Engineer
- Function: software engineering
- Level: senior
- Track: Individual contributor
- Location: Shanghai, China
- Department: Engineering
- Posted: Mar 18, 2026
Job description
from AMD careers
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
Kernel Performance Architect
At AMD, we build the compute engines that power AI, high-performance computing, and next-generation data centers. Our GPU platforms drive breakthroughs across machine learning, scientific computing, and large-scale distributed systems.
We are looking for a Kernel Performance Architect who can bridge hardware, compiler, runtime, and application layers to define and drive end-to-end performance strategy for AI workloads on AMD GPUs.
This role is not just about writing fast kernels — it is about understanding why they are fast, predicting performance behavior across architectures, and shaping the abstractions that make performance portable.