Senior · Machine Learning · Research Scientist · IC · 4+ yrs · Bachelor's · Remote · Posted Mar 13, 2026

Skills

Python, TypeScript, LLMs, Git, Machine Learning, A/B Testing, AI Agents

About this role

GitHub is hiring a senior-level Research Scientist in its machine learning function for a remote position. The posting calls for experience with Python, TypeScript, LLMs, and Git, plus 4+ years of relevant work. Listed education preference: a bachelor's degree or equivalent.

Role: Research Scientist
Function: machine learning
Level: senior
Track: Individual contributor
Location: United States
Work mode: Remote
Experience: 4+ years
Education: Bachelor's degree
Department: Engineering
Posted: Mar 13, 2026

AI Summary

Design and build next-generation LLM evaluation frameworks for code generation, reasoning, and agentic workflows at GitHub Copilot. Lead model quality strategy, develop scalable metrics and benchmarking systems, and mentor researchers. Requires deep LLM evaluation expertise, strong engineering instincts, and 4-8+ years of data science experience, depending on degree level.

More roles at GitHub

Senior Software Engineer

United States · senior

Python JavaScript Java

Principal Software Engineering Manager

United States · manager

Python JavaScript Java

Principal Software Engineering Manager

United Kingdom · manager

Python JavaScript Java

Staff Software Engineer

United States · staff

Python JavaScript TypeScript

Job description

from GitHub careers

About GitHub

GitHub is the world’s leading platform for agentic software development — powered by Copilot to build, scale, and deliver secure software. Over 180 million developers, including more than 90% of the Fortune 100 companies, use GitHub to collaborate, and more than 77,000 organisations have adopted GitHub Copilot.

Locations

This role is remote within the United States.

Overview

At GitHub, we’re building the next generation of AI‑powered developer experiences. We’re looking for a Staff Applied Researcher with deep expertise in Large Language Model (LLM) evaluation, LLM agents, strong engineering instincts, and a bias for action to help shape the future of GitHub Copilot and our AI platform.

This is a high‑impact role where you will design evaluation systems that directly influence how millions of developers experience AI every day.

Responsibilities

Lead Model Quality & Evaluation

Design next‑generation evaluation frameworks for code generation, reasoning, safety, multimodal tasks, and agentic workflows.
Develop scalable automatic metrics, LLM‑judge systems, reward models, and human‑in‑the‑loop evaluation pipelines.
Establish high‑signal, repeatable methodologies that influence product decisions across GitHub AI.

Drive Applied Research & Engineering

Build and optimize evaluation tooling, datasets, benchmarking systems, and experimentation pipelines.
Create and onboard new benchmarks covering the hardest tasks for coding agents.

This is an excerpt of the full job description on GitHub careers.

Staff Applied Researcher, AI Quality
