Solver–Tutor Gap
Quick Answer
The solver–tutor gap is the empirically observed divergence between a language model's ability to solve a domain problem and its ability to teach a learner to solve it. Formalized by Macina et al. in MathTutorBench (EMNLP 2025), the term names the structural fact that subject competence — final-answer correctness on benchmarks like MATH or GSM8K — does not entail pedagogical competence such as diagnosing mistakes, withholding answers, or scaffolding the next move.
Solver–Tutor Gap
The solver–tutor gap is the divergence between a language model's ability to solve a domain problem and its ability to teach a learner to solve it. Formalized by Macina et al. in MathTutorBench (EMNLP 2025), it names a structural fact about pedagogical evaluation: subject competence (final-answer correctness on benchmarks like MATH or GSM8K) does not entail pedagogical competence. Teaching requires identifying a learner's mistake, locating where it occurred, scaffolding the next move, withholding answers, and sustaining coherent multi-turn dialogue. MathTutorBench reports that solving and teaching skill can even trade off, depending on how a tutor is specialized.
TutorBench (Srinivasa et al., 2025) reports that no frontier LLM exceeds 56% on its rubric-based tutoring criteria despite strong subject performance. This is a concrete instance of the gap: teams that select tutor models on solving benchmarks alone systematically over-pick models prone to premature answer-giving and weak diagnosis.
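The selection failure above can be made concrete with a small sketch. This is not MathTutorBench or TutorBench code; the model names and scores are hypothetical illustrations of how ranking candidates by solver accuracy alone picks a different model than ranking by a tutoring rubric would.

```python
# Hypothetical illustration of the solver-tutor gap in model selection.
# All names and scores are invented, not benchmark results.

# (solver_accuracy, tutoring_rubric_score) per candidate model
candidates = {
    "model_a": (0.92, 0.41),  # strong solver, weak tutor
    "model_b": (0.85, 0.55),  # slightly weaker solver, better tutor
    "model_c": (0.78, 0.52),
}

def solver_tutor_gap(scores):
    """Per-model difference between solving and teaching skill."""
    return {name: solve - teach for name, (solve, teach) in scores.items()}

# Selecting on solving alone picks model_a; a tutoring rubric picks model_b.
best_solver = max(candidates, key=lambda m: candidates[m][0])
best_tutor = max(candidates, key=lambda m: candidates[m][1])

print(solver_tutor_gap(candidates))
print(best_solver, best_tutor)
```

The point of the sketch is that the two rankings disagree whenever the gap varies across models, which is exactly the situation the benchmarks report.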
See also
- Generative AI tutors and personalized adaptive learning systems — the source paper framing the solver–tutor gap as the organizing failure mode for tutor evaluation.