Annotation Data Scientist, Evaluation Integrity (Siri)
Apple · Cambridge, MA · Machine Learning and AI
About this role
Apple is hiring a mid-level Data Annotator in the operations function based in Cambridge, MA. The posting calls out experience with Python, SQL, Spark, and pandas.
- Role: Data Annotator
- Function: operations
- Level: mid
- Track: Individual contributor
- Location: Cambridge, MA
- Department: Machine Learning and AI
- Posted: May 19, 2026
Job description
From Apple careers
Play a part in the ongoing revolution in human-computer interaction. Siri is evolving, and the way we evaluate it has to evolve with it. Join the Evaluation Integrity team to help build the trusted quality signal behind every Siri release.
Within the Siri evaluation organization, the Human Evaluation sub-team is responsible for answering the question: can we trust our evals? We do that by designing human-in-the-loop (HITL) annotation tasks that scrutinize every moving part of an agentic evaluation — the simulated user agent, the conversation it has with Siri, and the automated evaluators that grade the exchange. This role sits at the intersection of data science, human annotation engineering, and evaluation methodology, and is instrumental in turning human judgment into a rigorous, reproducible signal that directly informs pre-ship model and product decisions.
As an Annotation Data Scientist on the Evaluation Integrity team, you will design and run HITL annotation projects that evaluate the quality and authenticity of agentic user personae, the validity of agent-to-agent conversations, and the reliability of LLM-as-judge and rule-based evaluators against Siri's product specifications. You will own annotation initiatives end-to-end: from rubric design and tooling, through annotator calibration, to data science analysis that turns annotator judgments into actionable signal for modeling, planning, and product teams.