Staff Applied Researcher, AI Quality
GitHub · United States · Engineering
About this role
GitHub is hiring a senior-level Research Scientist in its machine learning function; the position is remote. The posting calls for experience with Python, TypeScript, LLMs, and Git, plus 4+ years of relevant work. Listed education preference: a bachelor's degree or equivalent.
- Role: Research Scientist
- Function: machine learning
- Level: senior
- Track: Individual contributor
- Location: United States
- Work mode: Remote
- Experience: 4+ years
- Education: Bachelor's degree
- Department: Engineering
- Posted: Mar 13, 2026
Job description
From GitHub careers
GitHub is the world’s leading platform for agentic software development, powered by Copilot to build, scale, and deliver secure software. Over 180 million developers, including more than 90% of the Fortune 100 companies, use GitHub to collaborate, and more than 77,000 organisations have adopted GitHub Copilot.
Locations
In this role you can work remotely from anywhere in the United States.
Overview
At GitHub, we’re building the next generation of AI‑powered developer experiences. We’re looking for a Staff Applied Researcher with deep expertise in Large Language Model (LLM) evaluation and LLM agents, strong engineering instincts, and a bias for action to help shape the future of GitHub Copilot and our AI platform.
This is a high‑impact role where you will design evaluation systems that directly influence how millions of developers experience AI every day.
Responsibilities
Lead Model Quality & Evaluation
Design next‑generation evaluation frameworks for code generation, reasoning, safety, multimodal tasks, and agentic workflows.
Develop scalable automatic metrics, LLM‑judge systems, reward models, and human‑in‑the‑loop evaluation pipelines.
Establish high‑signal, repeatable methodologies that influence product decisions across GitHub AI.
Drive Applied Research & Engineering
Build and optimize evaluation tooling, datasets, benchmarking systems, and experimentation pipelines.
Create and onboard new benchmarks covering the hardest tasks for coding agents.
This is an excerpt; the full job description is available on GitHub careers.