Staff Engineer · Software Engineering · Tech Leadership · 5+ yrs · Posted Dec 18, 2024

$150,000 – $350,000 USD per year

Skills

CUDA Docker PyTorch Linux Machine Learning vLLM

AI Summary

Staff Engineer optimizing ML systems for scale at Modal's AI infrastructure platform. Requires 5+ years of experience writing high-performance code and expertise with PyTorch, inference engines (vLLM or TensorRT), CUDA, and GPU performance engineering. The focus is on improving throughput and latency for language and diffusion models.

About Us:

Modal provides the infrastructure foundation for AI teams. With instant GPU access, sub-second container startups, and native storage, Modal makes it simple to train models, run batch jobs, and serve low-latency inference. We have thousands of customers who rely on us for production AI workloads, including Lovable, Scale AI, Substack, and Suno.

We're a fast-growing team based out of NYC, SF, and Stockholm. We've hit 9-figure ARR and recently raised a Series B at a $1.1B valuation. Our investors include Lux Capital, Redpoint Ventures, Amplify Partners, and Elad Gil.

Working at Modal means joining one of the fastest-growing AI infrastructure organizations at an early stage, with many opportunities to grow within the company. Our team includes creators of popular open-source projects (e.g. Seaborn, Luigi), academic researchers, international olympiad medalists, and experienced engineering and product leaders with decades of experience.

The Role:

We are looking for strong engineers with experience in making ML systems performant at scale. If you are interested in contributing to open-source projects and Modal’s container runtime to push language and diffusion models towards higher throughput and lower latency, we’d love to hear from you!

Requirements:

5+ years of experience writing high-quality, high-performance code.
Experience working with torch, high-level ML frameworks, and inference engines (vLLM or TensorRT).
Familiarity with Nvidia GPU architecture and CUDA.
Experience with ML performance engineering (tell us a story about boosting GPU performance: debugging SM occupancy issues, rewriting an algorithm to be compute-bound, eliminating host overhead, etc.).
Nice-to-have: familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).
Ability to work in person in our NYC, San Francisco, or Stockholm office.

Member of Technical Staff - ML Performance
