Glitch Token
Quick Answer
A glitch token is a vocabulary entry whose presence in a prompt disproportionately triggers anomalous model output — incoherence, unexplained refusals, truncation, loops, or silent corruption. The usual root cause is under-training: tokens that exist in the tokenizer's vocabulary but appear rarely or never in the pretraining corpus receive few gradient updates, leaving their embeddings in a poorly-conditioned region of representation space. Also known as under-trained tokens or magikarp tokens.
A glitch token is a vocabulary entry that, when present in a prompt, disproportionately triggers anomalous model output — incoherent text, unexplained refusals, truncations, infinite loops, or silent data corruption. The most common root cause is under-training: the tokenizer is trained on one corpus and the model on another, so tokens that exist in the vocabulary but appear rarely or never during pretraining receive few meaningful gradient updates, and their embeddings end up in a poorly-conditioned region of representation space. The term was sharpened in Land and Bartolo's "Fishing for Magikarp" (EMNLP 2024), which also gave the field the alias "magikarp token."
The mental model is simple: tokenization is part of the model, and a token the model never properly learned is a hole in its input space. For the longer treatment — failure modes, detection, and why these tokens distort safety evaluations — see what are glitch tokens.
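One practical consequence of under-training is that the embedding rows of rarely-seen tokens drift very little from their initialization, so they tend to sit unusually close to each other and to the mean of the embedding matrix. The sketch below uses that signature as a simple detection heuristic on a synthetic embedding matrix; it is a minimal illustration of the idea, not the actual procedure from "Fishing for Magikarp", and the function name and toy data are invented for this example.

```python
import numpy as np

def flag_candidate_glitch_tokens(embeddings, k=10):
    """Return the k token ids whose embeddings lie closest to the
    mean embedding. Under-trained tokens often cluster there, since
    rows that receive few gradient updates barely move from their
    shared initialization. A heuristic sketch only."""
    mean_vec = embeddings.mean(axis=0)
    # L2 distance of each token's embedding from the vocabulary mean
    dists = np.linalg.norm(embeddings - mean_vec, axis=1)
    # smallest distances first: the k most suspicious token ids
    return np.argsort(dists)[:k]

# Toy demo: 1000 "well-trained" rows, plus 5 rows (ids 100-104)
# planted almost exactly at the mean, mimicking under-trained tokens.
rng = np.random.default_rng(0)
emb = rng.normal(0.0, 1.0, size=(1000, 64))
emb[100:105] = emb.mean(axis=0) + rng.normal(0.0, 0.01, size=(5, 64))
print(sorted(flag_candidate_glitch_tokens(emb, k=5).tolist()))
```

In a real setting this kind of heuristic only produces candidates; confirming a glitch token still requires prompting the model with it and checking for anomalous behavior, since some legitimate tokens (e.g. unused special tokens) also sit near the mean.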
See also
- what are glitch tokens — full explainer covering failure modes and detection
- Glitcher — gradient-based discovery on Llama-family tokenizers
- Glitcher 2 — full-vocabulary census and ASR validation