Operations Engineer, HPC Networking
CoreWeave · Livingston, NJ | New York City, NY | Sunnyvale, CA | Bellevue, WA · Technology
About this role
CoreWeave is hiring a mid-level Systems Engineer in the operations function, based in Livingston, NJ; New York City, NY; Sunnyvale, CA; or Bellevue, WA. The posting calls out experience with Python, C, Bash, and Spring. Compensation is listed at $90,000–$110,000 per year.
- Role: Systems Engineer
- Function: operations
- Level: mid
- Track: Individual contributor
- Employment: Full-time
- Location: Livingston, NJ | New York City, NY | Sunnyvale, CA | Bellevue, WA
- Department: Technology
Job description
from CoreWeave careers
About the Role
At CoreWeave, we are seeking a dedicated and detail-oriented Operations Engineer to join our HPC Networking team. HPC Networking at CoreWeave is tasked with developing and operating some of the largest InfiniBand fabrics, powering industry-leading AI workloads.
What You’ll Do
In this role, you will support the deployment, monitoring, troubleshooting, and maintenance of large-scale InfiniBand fabrics, ensuring their stability and performance. The ideal candidate will have a strong operations mindset, effective collaboration skills, and the ability to solve complex issues in a dynamic environment.
- Regularly monitor the performance and health of InfiniBand fabrics, including switches, host adapters, and nodes.
- Investigate and resolve operational issues within InfiniBand fabrics, such as network connectivity problems and performance bottlenecks.
- Assist with the installation and operational bring-up of large InfiniBand fabrics in collaboration with onsite personnel and customer teams.