Pedagogical Safety
Quick Answer
Pedagogical safety is the property of a tutoring system that protects learners from educational harms — answer over-disclosure, misconception reinforcement, cognitive offloading, false mastery, and multi-turn drift — rather than from offensive or unsafe content. A tutor can be content-safe and pedagogically unsafe at the same time: a correct, well-worded answer delivered before the learner has attempted the work is a pedagogical safety failure.
Pedagogical safety is the property of a generative AI tutor that protects learners from educational harms — failures that degrade learning even when the tutor is fluent, polite, and factually correct. Representative harms include answer over-disclosure, misconception reinforcement, cognitive offloading, false mastery, and multi-turn pedagogical drift. The construct is operationalized by the SafeTutors benchmark (arXiv:2603.17373), which evaluates tutors against a taxonomy of educational harms and shows that multi-turn dialogue exposes failures that single-turn evaluations miss.
Pedagogical safety is distinct from content safety and jailbreak resistance. A model that passes red-team content evals can still teach badly: handing a polished worked solution to a learner who has not yet attempted the problem is content-safe and pedagogically unsafe.
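One of the harms above — answer over-disclosure — can be sketched as a toy dialogue-level check. Everything here (the `Turn` type, the "my attempt" marker, the detector) is illustrative scaffolding, not the SafeTutors benchmark's actual taxonomy labels or detection method:

```python
from dataclasses import dataclass
from enum import Enum, auto

# Hypothetical harm taxonomy, following the list in the text above;
# these are NOT the benchmark's actual category names.
class Harm(Enum):
    ANSWER_OVER_DISCLOSURE = auto()
    MISCONCEPTION_REINFORCEMENT = auto()
    COGNITIVE_OFFLOADING = auto()
    FALSE_MASTERY = auto()
    MULTI_TURN_DRIFT = auto()

@dataclass
class Turn:
    role: str   # "learner" or "tutor"
    text: str

def flags_over_disclosure(dialogue: list[Turn], final_answer: str) -> bool:
    """Flag answer over-disclosure: the tutor states the final answer
    before any learner turn shows an attempt.  The attempt marker is a
    crude stand-in for real attempt detection."""
    learner_attempted = False
    for turn in dialogue:
        if turn.role == "learner" and "my attempt" in turn.text.lower():
            learner_attempted = True
        if turn.role == "tutor" and final_answer in turn.text:
            # Disclosing the answer is only a harm if it precedes
            # the learner's own attempt.
            return not learner_attempted
    return False
```

Note that the check is a property of the whole dialogue, not of any single tutor message — the same tutor turn ("Right, x = 2.") is safe after a learner attempt and unsafe before one, which is exactly why single-turn evaluation misses this failure mode.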
See also
- Cognitive offloading — the most-cited harm pedagogical safety guards against
- Performance–learning gap — assisted performance up, unassisted performance down
- Solver–tutor gap — why a strong solver is not automatically a safe tutor