Software Development Engineer — CI/CD, Trainium Manufacturing Test Infrastructure
Amazon · Cupertino, CA · Software Development
About this role
Amazon is hiring a mid-level Software Engineer based in Cupertino, CA. The posting calls out experience with Python, TypeScript, Java, and Rust. Compensation is listed at $127,100–$185,000 per year.
- Role: Software Engineer
- Function: Software engineering
- Level: Mid
- Track: Individual contributor
- Employment: Full-time
- Location: Cupertino, CA
- Department: Software Development
- Posted: May 18, 2026
Job description
From Amazon careers

The Manufacturing Infrastructure Release Team within Annapurna ML builds and operates the software platform that orchestrates hardware testing and validation across multiple Trainium manufacturing sites worldwide. Our platform deploys containerized microservices to AWS Outposts at manufacturing partner factories, enabling component-level testing, card/board validation, server-level testing, and rack-level testing at scale. We directly enable the manufacturing ramp of AWS's custom AI training chips.

We are looking for a Software Development Engineer to own and evolve the CI/CD infrastructure that delivers software to Trainium manufacturing sites worldwide. You will build and maintain deployment pipelines that push tested, validated code to production Outpost environments across multiple manufacturing partners. Your work directly impacts how fast Trainium servers move from factory floor to customer: every hour of pipeline latency is lost customer revenue.

Key job responsibilities
- Design, build, and maintain CI/CD pipelines (AWS CDK, Pipelines) that deploy containerized services to AWS Outposts at global manufacturing sites
- Extend the manufacturing infrastructure platform (TypeScript CDK, Python microservices) to support new workflows for Trainium accelerator cards, baseboards, and rack-level integration
- Build integration test frameworks and canary systems that validate service health across all production sites before and after deployments
- Develop automated alarming,…