Pedagogical Safety
Quick Answer
Pedagogical safety is the property of a tutoring system that protects learners from educational harms — answer over-disclosure, misconception reinforcement, cognitive offloading, false mastery, and multi-turn drift — rather than from offensive or unsafe content. A tutor can be content-safe and pedagogically unsafe at the same time: a correct, well-worded answer delivered before the learner has attempted the work is a pedagogical safety failure.
Pedagogical safety is the property of a generative AI tutor that protects learners from educational harms — failures that degrade learning even when the tutor is fluent, polite, and factually correct. Representative harms include answer over-disclosure, misconception reinforcement, cognitive offloading, false mastery, and multi-turn pedagogical drift. The construct is operationalized by the SafeTutors benchmark (arXiv:2603.17373), which evaluates tutors against a taxonomy of educational harms and shows that multi-turn dialogue exposes failures that single-turn evaluations miss.
Pedagogical safety is distinct from content safety and jailbreak resistance. A model that passes red-team content evals can still teach badly: handing a polished worked solution to a learner who has not yet attempted the problem is content-safe and pedagogically unsafe.
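One of the harms above — answer over-disclosure — can be sketched as a toy dialogue-level check. Everything here (the `Turn` type, the "my attempt" marker, the detector) is illustrative scaffolding, not the SafeTutors benchmark's actual taxonomy labels or detection method:

```python
from dataclasses import dataclass
from enum import Enum, auto

# Hypothetical harm taxonomy, following the list in the text above;
# these are NOT the benchmark's actual category names.
class Harm(Enum):
    ANSWER_OVER_DISCLOSURE = auto()
    MISCONCEPTION_REINFORCEMENT = auto()
    COGNITIVE_OFFLOADING = auto()
    FALSE_MASTERY = auto()
    MULTI_TURN_DRIFT = auto()

@dataclass
class Turn:
    role: str   # "learner" or "tutor"
    text: str

def flags_over_disclosure(dialogue: list[Turn], final_answer: str) -> bool:
    """Flag answer over-disclosure: the tutor states the final answer
    before any learner turn shows an attempt.  The attempt marker is a
    crude stand-in for real attempt detection."""
    learner_attempted = False
    for turn in dialogue:
        if turn.role == "learner" and "my attempt" in turn.text.lower():
            learner_attempted = True
        if turn.role == "tutor" and final_answer in turn.text:
            # Disclosing the answer is only a harm if it precedes
            # the learner's own attempt.
            return not learner_attempted
    return False
```

Note that the check is a property of the whole dialogue, not of any single tutor message — the same tutor turn ("Right, x = 2.") is safe after a learner attempt and unsafe before one, which is exactly why single-turn evaluation misses this failure mode.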
See also
- Cognitive offloading — the most-cited harm pedagogical safety guards against
- Performance–learning gap — assisted performance up, unassisted performance down
- Solver–tutor gap — why a strong solver is not automatically a safe tutor