AI Engineer · machine learning · mid · individual contributor · 3+ years
$160,000 – $230,000
USD per year

About this role

Together AI is hiring a mid-level AI Engineer in its machine learning function, with the role listed for San Francisco and Singapore. The posting calls for experience with Python, CUDA, Kubernetes, and PyTorch, and at least 3 years of relevant work. Compensation is listed at $160,000–$230,000 per year.

Role: AI Engineer
Function: machine learning
Level: mid
Track: Individual contributor
Employment: Full-time
Location: San Francisco, Singapore
Experience: 3+ years
Department: Research
AI Summary
Design and optimize distributed inference engines for LLMs and multimodal models, focusing on low-latency, high-throughput serving. Requires 3+ years in deep learning inference frameworks or distributed systems, expertise in LLM inference frameworks (TensorRT-LLM, vLLM, etc.), and GPU programming or compiler knowledge.

Job description

from Together AI careers

About the Role

At Together.ai, we are building state-of-the-art infrastructure to enable efficient and scalable inference for large language models (LLMs). Our mission is to optimize inference frameworks, algorithms, and infrastructure, pushing the boundaries of performance, scalability, and cost-efficiency.

We are seeking an Inference Frameworks and Optimization Engineer to design, develop, and optimize distributed inference engines that support multimodal and language models at scale. This role will focus on low-latency, high-throughput inference, GPU/accelerator optimizations, and software-hardware co-design, ensuring efficient large-scale deployment of LLMs and vision models.

This role offers a unique opportunity to shape the future of LLM inference infrastructure, ensuring scalable, high-performance AI deployment across a diverse range of applications. If you're passionate about pushing the boundaries of AI inference, we’d love to hear from you!

Responsibilities

Inference Framework Development and Optimization

  • Design and develop a fault-tolerant, high-concurrency distributed inference engine for text, image, and multimodal generation models.
  • Implement and optimize distributed inference strategies, including Mixture of Experts (MoE) parallelism, tensor parallelism, and pipeline parallelism, for high-performance serving.
  • Apply CUDA graph optimizations, TensorRT/TRT-LLM graph optimizations, PyTorch-based compilation (torch.compile), and speculative decoding to improve efficiency and scalability.

Software-Hardware Co-Design and AI Infrastructure

  • Collaborate with hardware teams on performance bottleneck analysis and co-optimize inference performance for GPUs, TPUs, or custom accelerators.

This is an excerpt; the full job description is on the Together AI careers site.