Software Engineer, Frontier Systems - Power Management
OpenAI · San Francisco, CA · Scaling
About this role
OpenAI is hiring a mid-level Systems Engineer in the operations function, based in San Francisco, CA. The posting calls for experience with Python, SQL, Bash, and pandas, and 7+ years of relevant work. Compensation is listed at $295,000–$445,000 per year.
- Role: Systems Engineer
- Function: Operations
- Level: Mid
- Track: Individual contributor
- Employment: Full-time
- Location: San Francisco, CA
- Experience: 7+ years
- Department: Scaling
- Posted: Oct 31, 2024
Job description
from OpenAI careers
About the Team
The Frontier Systems team at OpenAI builds, launches, and supports the largest supercomputers in the world, which OpenAI uses for its most cutting-edge model training.
We take data center designs, turn them into real, working systems, and build any software needed to run large-scale frontier model training.
Our mission is to bring up and stabilize these hyperscale supercomputers and keep them reliable and efficient while frontier models are training.
About the Role
As a Software Engineer on the Frontier Systems team focused on power management, you will work on critical infrastructure to support cutting-edge research. Large-scale supercomputers consume substantial amounts of power, so managing that power efficiently is key to maximizing computational capacity. This role is critical to ensuring that our research supercomputing infrastructure runs smoothly while maintaining reliability and grid-level power stability.
Our team empowers strong engineers with a high degree of autonomy and ownership, as well as the ability to effect change. This role requires a keen focus on comprehensive system-level investigations and the development of automated solutions. We want people who go deep on problems, investigate as thoroughly as possible, and build automation for detection and remediation at scale.
In this role, you will: