Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training
Amazon · Cupertino, CA · Software Development
About this role
Amazon is hiring a senior-level Software Engineer based in Cupertino, CA. The posting calls out experience with Python, AWS, TensorFlow, and PyTorch. Compensation is listed at $193,300–$261,500 per year.
- Role: Software Engineer
- Function: software engineering
- Level: senior
- Track: Individual contributor
- Employment: Full-time
- Location: Cupertino, CA
- Department: Software Development
- Posted: Jun 20, 2025
Job description
From Amazon careers
Annapurna Labs designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
AWS Neuron is the complete software stack for AWS Trainium (Trn1/Trn2) and Inferentia (Inf1/Inf2), our cloud-scale Machine Learning accelerators. This role is for a Senior Machine Learning Engineer on the Distributed Training team for AWS Neuron, responsible for development, enablement, and performance tuning of a wide variety of ML model families, including massive-scale Large Language Models (LLMs) such as GPT and Llama, as well as Stable Diffusion, Vision Transformers (ViT), and many more.
The ML Distributed Training team works side by side with chip architects, compiler engineers, and runtime engineers to create, build, and tune distributed training solutions with Trainium instances. Experience with training these large models using Python is a must. FSDP (Fully Sharded Data Parallel), DeepSpeed, NeMo, and other distributed training libraries are central to this work, and extending them to Neuron-based systems is key.
Key job responsibilities…