ML Platform Engineer · Machine learning · Mid-level · Individual contributor · Hybrid · Posted Oct 29, 2025
$205,000 – $445,000 USD per year

About this role

OpenAI is hiring a mid-level ML Platform Engineer in its machine learning function, based in San Francisco, CA (hybrid). The posting calls for experience with Python, machine learning, distributed systems, and observability. Compensation is listed at $205,000–$445,000 USD per year.

Role: ML Platform Engineer
Function: Machine learning
Level: Mid
Track: Individual contributor
Employment: Full-time
Location: San Francisco, CA
Work mode: Hybrid
Department: Scaling
Posted: Oct 29, 2025

AI Summary
Optimize training throughput for OpenAI's internal ML framework while enabling researcher experimentation. Profile and optimize distributed training systems across massive GPU clusters. Requires strong software engineering skills in Python, deep systems knowledge, and a passion for performance optimization.

Job description

from OpenAI careers

About the Team

Training Runtime designs the core distributed machine-learning training runtime that powers everything from early research experiments to frontier-scale model runs. With a dual mandate to accelerate researchers and enable frontier scale, we’re building a unified, modular runtime that meets researchers where they are and moves with them up the scaling curve.

Our work focuses on three pillars: high-performance, asynchronous, zero-copy tensor and optimizer-state-aware data movement; performant, high-uptime, fault-tolerant training frameworks (training loop, state management, resilient checkpointing, deterministic orchestration, and observability); and distributed process management for long-lived, job-specific and user-provided processes.

We integrate proven large-scale capabilities into a composable, developer-facing runtime so teams can iterate quickly and run reliably at any scale, partnering closely with model-stack, research, and platform teams. Success for us is measured by raising both training throughput (how fast models train) and researcher throughput (how fast ideas become experiments and products).

About the Role

As a Training: ML Framework Engineer, you will work on improving the training throughput for our internal training framework, while enabling researchers to experiment with new ideas. This requires good engineering (for example, designing, implementing, and optimizing state-of-the-art AI models), writing bug-free machine learning code (surprisingly difficult!), and acquiring deep knowledge of the performance of supercomputers. In all the projects this role pursues, the ultimate goal is to push the field forward.

This is an excerpt; the full job description is available on OpenAI careers.