LLM Serving Engineer (Cloud AI Engineering), Senior / Staff Engineer
Qualcomm · San Diego, CA | Markham
About this role
Qualcomm is hiring a senior-level AI Engineer in the machine learning function, based in San Diego, CA or Markham. The posting calls out experience with Python, CUDA, PyTorch, and LLMs. Compensation is listed at $158,400–$237,600 per year.
- Role: AI Engineer
- Function: machine learning
- Level: senior
- Track: Tech leadership
- Location: San Diego, CA | Markham
- Posted: Apr 9, 2026
Job description
from Qualcomm careers
Company:
Qualcomm Technologies, Inc.
Job Area:
Engineering Group, Engineering Group > Machine Learning Engineering
General Summary:
LLM Serving Engineer (Cloud AI Engineering)
Qualcomm is utilizing its traditional strengths in digital wireless technologies to play a central role in the evolution of Cloud AI. We are investing in several supporting technologies including Deep Learning. The Qualcomm Cloud AI team is developing hardware and software solutions for Inference Acceleration.
We are hiring LLM Serving Engineers at multiple levels to join our dynamic, collaborative team. This role spans the full product lifecycle—from cutting-edge research and development to commercial deployment—and demands strategic thinking, strong execution, and excellent communication skills.
This role involves the following activities:
* Build a scalable LLM inference platform using inference techniques such as disaggregated serving, KV cache management, advanced parallelism, speculative algorithms, model optimization, and specialized kernels.
* Contribute to the development of LLM serving packages (e.g. vLLM, SGLang, TGI, Triton Inference Server, Dynamo, llm-d).
* Work closely with customers to drive solutions by collaborating with internal compiler, firmware and platform teams.
* Work at the forefront of GenAI by understanding advanced algorithms (e.g. attention mechanisms, MoEs) and numerics to identify new optimization opportunities.
* Drive efficient serving through smart autoscaling, load balancing and routing.