Senior Site Reliability Engineer · Software engineering · Individual contributor · Posted May 8, 2026
$267,000 – $401,000
USD per year

About this role

Lambda is hiring a senior-level Site Reliability Engineer in its software engineering function, based in the San Francisco office. The posting calls out experience with Terraform, Ansible, Serverless, and Prometheus. Compensation is listed at $267,000–$401,000 per year.

Role
Site Reliability Engineer
Function
software engineering
Level
senior
Track
Individual contributor
Employment
Full-time
Location
San Francisco Office
Department
Data Center Business
Posted
May 8, 2026

Job description

from Lambda careers

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serving tens of thousands of customers. Our customers range from AI researchers to enterprises and hyperscalers. Lambda's mission is to make compute as ubiquitous as electricity and give everyone the power of superintelligence. One person, one GPU.

If you'd like to build the world's best AI cloud, join us.

*Note: This position requires in-office presence at our San Francisco, San Jose, or Bellevue, WA location 4 days per week; Lambda’s designated work-from-home day is currently Tuesday.

Engineering at Lambda is responsible for building and scaling our cloud offering. Our scope includes the Lambda website, cloud APIs and systems, as well as internal tooling for system deployment, management, and maintenance.

What You’ll Do

  • Deploy and operate observability platforms for logging, metrics, and distributed tracing.

  • Automate the deployment and operation of these observability systems.

  • Set up monitoring for modern AI/HPC cluster infrastructure.

  • Develop platform software to make observability adoptable and improve product reliability.

  • Lead members of other engineering teams in developing solutions to their monitoring challenges.

You

  • Have 8+ years of experience in software engineering, with 3+ years in Go

  • Have 5+ years of experience in Site Reliability Engineering practices

This is an excerpt; the full job description is available on Lambda careers.