Memory Poisoning

Memory poisoning is a prompt-injection variant in which an attacker causes malicious instructions or false facts to be written into an agent's persistent memory or shared retrieval store, so the payload is replayed to the agent — or to peer agents — on a later turn as trusted context. The distinguishing property versus generic indirect prompt injection is persistence and later reactivation: the compromise fires after the original untrusted source has left the conversation, often in a different session or against an agent that never saw the source. Provenance is typically lost when summarization writes derived content into memory, which is what makes the recalled payload look trustworthy.

The temporal gap is what makes this class hard to debug: recurrences appear after the apparent fix because the poisoned record outlives the patched turn. See multi-agent prompt injection for the broader taxonomy.

Memory Poisoning

Memory Poisoning

See also

Derived From

Related Work