Research Engineer, Model Evaluations
Anthropic · Remote | San Francisco, CA | New York City, NY · AI Research & Engineering
About this role
Anthropic is hiring a mid-level Research Scientist in the machine learning function. The position is remote, with San Francisco, CA and New York City, NY also listed as locations. The posting calls out experience with Python, LLMs, reinforcement learning, and distributed systems. Compensation is listed at $320,000–$485,000 per year.
- Role: Research Scientist
- Function: machine learning
- Level: mid
- Track: Individual contributor
- Employment: Full-time
- Location: Remote | San Francisco, CA | New York City, NY
- Work mode: Remote
- Department: AI Research & Engineering
Job description
from Anthropic careers
About Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About the role
We're looking for Research Engineers to build the evaluations that tell us — and the world — what Claude can actually do. Your work will turn ambiguous notions of "intelligence" into clear, defensible metrics that researchers, leadership, and the public can rely on.
You'll design and implement evaluations across the full spectrum of Claude's capabilities and personality, and build the infrastructure that runs them reliably at scale. You'll partner closely with researchers throughout the lifecycle of a new capability — from defining what to measure, to running the eval against live training checkpoints, to interpreting the results. The goal is to make Anthropic the leader in extremely well-characterized AI systems, with performance that is exhaustively measured and validated across the tasks that matter.
Key responsibilities
- Design and run new evaluations of Claude's capabilities — reasoning, agentic behavior, knowledge, safety properties — and produce visualizations that make the results legible to researchers and decision-makers