Infrastructure Engineer · senior · software engineering · IC · 5+ yrs
$160,000 – $230,000
USD per year

About this role

Together AI is hiring a senior-level Infrastructure Engineer in its software engineering function, based in San Francisco, CA. The posting calls for experience with Go, CUDA, AWS, and GCP, and 5+ years of relevant work. Compensation is listed at $160,000–$230,000 per year.

Role: Infrastructure Engineer
Function: software engineering
Level: senior
Track: Individual contributor
Employment: Full-time
Location: San Francisco, CA
Experience: 5+ years
Department: Engineering
AI Summary
Design and build high-performance backend services for AI cloud infrastructure, managing hardware virtualization, storage provisioning, and Kubernetes/Slurm clusters. Requires 5+ years of backend development in Golang, deep Kubernetes and VM/hypervisor expertise, and experience with distributed microservice architectures.

Job description

from Together AI careers

About the Role

Together AI is building the AI Acceleration Cloud, an end-to-end platform for the full generative AI lifecycle, combining the fastest LLM inference engine with state-of-the-art AI cloud infrastructure.

As a Senior AI Infrastructure Engineer, you will play a key role in building the next-generation AI cloud platform: a highly available, global, blazing-fast cloud infrastructure that virtualizes cutting-edge ML hardware (GB200s/GB300s, BlueField DPUs) and provides state-of-the-art ML practitioners with self-serve AI cloud services, such as on-demand and managed Kubernetes and Slurm clusters. This platform serves both our internal SaaS products (inference, fine-tuning) and our external cloud customers, spanning dozens of data centers across the world.

Responsibilities

  • Design, build, and maintain performant, secure, and highly available backend services/operators that run in our data centers and automate hardware management, such as InfiniBand partitioning, in-DC parallel storage provisioning, and VM provisioning.
  • Design and build out the IaaS software layer for a new GB200 data center with thousands of GPUs.
  • Work on a global multi-exabyte high-performance object store, serving massive datasets for pretraining.
  • Build advanced observability stacks for our customers with automated node lifecycle management for fault-tolerant distributed pretraining.
  • Perform architecture and research work for decentralized AI workloads.
  • Work on the core, open-source Together AI platform.

This is an excerpt. The full job description is available on Together AI careers.