Multi-Agent Prompt Injection

Multi-agent prompt injection — also known as second-hand prompt injection or agent-to-agent prompt injection — is a class of indirect prompt injection that emerges once LLM-driven agents collaborate. An attacker plants instructions inside content an upstream agent will summarize, paraphrase, or relay. By the time those instructions reach a planner or tool-using agent downstream, their attacker provenance has been laundered into ordinary-looking peer-agent text, and the planner acts on them under the operator's credentials.

The contrast with single-agent indirect injection is the laundering hop: an intermediate agent rewrites attacker text into peer-agent text, so audit logs record the resulting tool calls without the provenance step that would have flagged them. See what is multi-agent prompt injection for the full mental model.

Multi-Agent Prompt Injection

Multi-Agent Prompt Injection

See also

Derived From

Related Work