Solver–Tutor Gap
Quick Answer
The solver–tutor gap is the empirically observed divergence between a language model's ability to solve a domain problem and its ability to teach a learner to solve it. Formalized by Macina et al. in MathTutorBench (EMNLP 2025), the term names the structural fact that subject competence — final-answer correctness on benchmarks like MATH or GSM8K — does not entail pedagogical competence such as diagnosing mistakes, withholding answers, or scaffolding the next move.
Solver–Tutor Gap
The solver–tutor gap is the divergence between a language model's ability to solve a domain problem and its ability to teach a learner to solve it. Formalized by Macina et al. in MathTutorBench (EMNLP 2025), it names a structural fact about pedagogical evaluation: subject competence (final-answer correctness on benchmarks like MATH or GSM8K) does not entail pedagogical competence. Teaching requires identifying a learner's mistake, locating where it occurred, scaffolding the next move, withholding answers, and sustaining coherent multi-turn dialogue. MathTutorBench reports that solving and teaching skill can even trade off, depending on how a tutor is specialized.
TutorBench (Srinivasa et al., 2025) reports that no frontier LLM exceeds 56% on its rubric-based tutoring criteria despite strong subject performance. This is a concrete instance of the gap: teams that select tutor models on solving benchmarks alone systematically over-pick models prone to premature answer-giving and weak diagnosis.
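The selection failure above can be made concrete with a small sketch. This is not MathTutorBench or TutorBench code; the model names and scores are hypothetical illustrations of how ranking candidates by solver accuracy alone picks a different model than ranking by a tutoring rubric would.

```python
# Hypothetical illustration of the solver-tutor gap in model selection.
# All names and scores are invented, not benchmark results.

# (solver_accuracy, tutoring_rubric_score) per candidate model
candidates = {
    "model_a": (0.92, 0.41),  # strong solver, weak tutor
    "model_b": (0.85, 0.55),  # slightly weaker solver, better tutor
    "model_c": (0.78, 0.52),
}

def solver_tutor_gap(scores):
    """Per-model difference between solving and teaching skill."""
    return {name: solve - teach for name, (solve, teach) in scores.items()}

# Selecting on solving alone picks model_a; a tutoring rubric picks model_b.
best_solver = max(candidates, key=lambda m: candidates[m][0])
best_tutor = max(candidates, key=lambda m: candidates[m][1])

print(solver_tutor_gap(candidates))
print(best_solver, best_tutor)
```

The point of the sketch is that the two rankings disagree whenever the gap varies across models, which is exactly the situation the benchmarks report.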
See also
- Generative AI tutors and personalized adaptive learning systems — the source paper framing the solver–tutor gap as the organizing failure mode for tutor evaluation.