Glitch Token
Quick Answer
A glitch token is a vocabulary entry whose presence in a prompt disproportionately triggers anomalous model output — incoherence, unexplained refusals, truncation, loops, or silent corruption. The usual root cause is under-training: tokens that exist in the tokenizer's vocabulary but appear rarely or never in the pretraining corpus receive few gradient updates, leaving their embeddings in a poorly-conditioned region of representation space. Also known as under-trained tokens or magikarp tokens.
A glitch token is a vocabulary entry that, when present in a prompt, disproportionately triggers anomalous model output — incoherent text, unexplained refusals, truncations, infinite loops, or silent data corruption. The most common root cause is under-training: the tokenizer is trained on one corpus and the model on another, so tokens that exist in the vocabulary but appear rarely or never during pretraining receive few meaningful gradient updates, and their embeddings end up in a poorly-conditioned region of representation space. The term was sharpened in Land and Bartolo's "Fishing for Magikarp" (EMNLP 2024), which also gave the field the alias "magikarp token."
The mental model is simple: tokenization is part of the model, and a token the model never properly learned is a hole in its input space. For the longer treatment — failure modes, detection, and why these tokens distort safety evaluations — see what are glitch tokens.
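One practical consequence of under-training is that the embedding rows of rarely-seen tokens drift very little from their initialization, so they tend to sit unusually close to each other and to the mean of the embedding matrix. The sketch below uses that signature as a simple detection heuristic on a synthetic embedding matrix; it is a minimal illustration of the idea, not the actual procedure from "Fishing for Magikarp", and the function name and toy data are invented for this example.

```python
import numpy as np

def flag_candidate_glitch_tokens(embeddings, k=10):
    """Return the k token ids whose embeddings lie closest to the
    mean embedding. Under-trained tokens often cluster there, since
    rows that receive few gradient updates barely move from their
    shared initialization. A heuristic sketch only."""
    mean_vec = embeddings.mean(axis=0)
    # L2 distance of each token's embedding from the vocabulary mean
    dists = np.linalg.norm(embeddings - mean_vec, axis=1)
    # smallest distances first: the k most suspicious token ids
    return np.argsort(dists)[:k]

# Toy demo: 1000 "well-trained" rows, plus 5 rows (ids 100-104)
# planted almost exactly at the mean, mimicking under-trained tokens.
rng = np.random.default_rng(0)
emb = rng.normal(0.0, 1.0, size=(1000, 64))
emb[100:105] = emb.mean(axis=0) + rng.normal(0.0, 0.01, size=(5, 64))
print(sorted(flag_candidate_glitch_tokens(emb, k=5).tolist()))
```

In a real setting this kind of heuristic only produces candidates; confirming a glitch token still requires prompting the model with it and checking for anomalous behavior, since some legitimate tokens (e.g. unused special tokens) also sit near the mean.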
See also
- what are glitch tokens — full explainer covering failure modes and detection
- Glitcher — gradient-based discovery on Llama-family tokenizers
- Glitcher 2 — full-vocabulary census and ASR validation