Software Engineer (mid-level, individual contributor) · Posted Nov 7, 2024
$230,000 – $490,000
USD per year

About this role

OpenAI is hiring a mid-level Software Engineer based in San Francisco, CA. The posting calls out experience with Kubernetes, Linux, distributed systems, and embedded systems. Compensation is listed at $230,000–$490,000 per year.

Role: Software Engineer
Function: software engineering
Level: mid
Track: Individual contributor
Employment: Full-time
Location: San Francisco, CA
Department: Scaling
Posted: Nov 7, 2024
AI Summary
Operate and scale massive Kubernetes clusters for frontier model training at hyperscale datacenters. Build automation for bare-metal provisioning, cluster lifecycle management, and infrastructure abstractions. Requires deep Kubernetes experience, strong programming skills (Python/Go), and comfort with bare-metal Linux and GPU hardware in high-availability environments.

Job description

from OpenAI careers

About the Team

The Frontier Systems team at OpenAI builds, launches, and supports the largest supercomputers in the world, which OpenAI uses for its most cutting-edge model training.

We take data center designs, turn them into real, working systems, and build any software needed to run large-scale frontier model training.

Our mission is to bring up and stabilize these hyperscale supercomputers, and to keep them reliable and efficient while frontier models are being trained.

About the Role

We are looking for engineers to operate the next generation of compute clusters that power OpenAI’s frontier research.

This role blends distributed systems engineering with hands-on infrastructure work in our largest data centers. You will scale Kubernetes clusters to massive size, automate bare-metal bring-up, and build the software layer that hides the complexity of a very large number of nodes spread across multiple data centers.

You will work at the intersection of hardware and software, where speed and reliability are critical. Expect to manage fast-moving operations, quickly diagnose and fix issues when things are on fire, and continuously raise the bar for automation and uptime.

In this role, you will:

  • Spin up and scale large Kubernetes clusters, including automation for provisioning, bootstrapping, and cluster lifecycle management

  • Build software abstractions that unify multiple clusters and present a seamless interface to training workloads

    This is an excerpt. Read the full job description on OpenAI careers →