About this role

OpenAI is hiring a mid-level ML Platform Engineer in its machine learning function, based in San Francisco, CA (hybrid). The posting calls out experience with Python, Machine Learning, Distributed Systems, and Observability. Compensation is listed at $205,000–$445,000 per year.

Role: ML Platform Engineer
Function: machine learning
Level: mid
Track: Individual contributor
Employment: Full-time
Location: San Francisco, CA
Work mode: Hybrid
Department: Scaling
Posted: Oct 29, 2025

AI Summary

Optimize training throughput for OpenAI's internal ML framework while enabling researcher experimentation. Profile and optimize distributed training systems across massive GPU clusters. Requires strong software engineering in Python, deep systems knowledge, and passion for performance optimization.

More roles at OpenAI

Research Engineer, Applied AI Engineering

San Francisco, CA · mid

PyTorch · LLMs · Deep Learning

Software Engineer, Financial Engineering

San Francisco, CA · mid

API Development · Performance Optimization · OpenAI

Security Engineer, Application Security

San Francisco, CA · mid

Python · Java · Security

Software Engineer, Real Time

Seattle, WA · mid

Machine Learning · Python · Kubernetes

Software Engineer, Integrity Foundations

San Francisco, CA · mid

Python · Azure · Kubernetes

Job description

from OpenAI careers

About the Team

Training Runtime designs the core distributed machine-learning training runtime that powers everything from early research experiments to frontier-scale model runs. With a dual mandate to accelerate researchers and enable frontier scale, we’re building a unified, modular runtime that meets researchers where they are and moves with them up the scaling curve.

Our work focuses on three pillars: high-performance, asynchronous, zero-copy tensor and optimizer-state-aware data movement; performant, high-uptime, fault-tolerant training frameworks (training loop, state management, resilient checkpointing, deterministic orchestration, and observability); and distributed process management for long-lived, job-specific and user-provided processes.

We integrate proven large-scale capabilities into a composable, developer-facing runtime so teams can iterate quickly and run reliably at any scale, partnering closely with model-stack, research, and platform teams. Success for us is measured by raising both training throughput (how fast models train) and researcher throughput (how fast ideas become experiments and products).

About the Role

As a Training: ML Framework Engineer, you will work on improving the training throughput of our internal training framework while enabling researchers to experiment with new ideas. This requires good engineering (for example, designing, implementing, and optimizing state-of-the-art AI models), writing bug-free machine learning code (surprisingly difficult!), and acquiring deep knowledge of the performance of supercomputers. In all the projects this role pursues, the ultimate goal is to push the field forward.

This is an excerpt. Read the full job description on OpenAI careers →
