What Is a Compound AI System? Why the System Boundary Is the Unit of Analysis
Quick Answer
A compound AI system is an automation system in which a foundation model is one component among retrievers, tools, memory, planners, validators, sandboxes, and policy engines that together complete a multi-step objective. Performance and risk are produced by the system boundary, not by a single model call. The model is an untrusted probabilistic planner; the runtime around it is the unit of trust, the unit of authorization, and the unit of accountability.
A compound AI system is an automation system in which a foundation model is one component inside a larger program. Retrievers, tools, memory stores, planners, validators, sandboxes, and policy engines collaborate to complete a multi-step objective. The point of the term is to move the unit of analysis — and the unit of trust — from the model call to the system boundary. If you are building, buying, or securing multi-step LLM automation, this framing is the difference between defensible architecture and prompt-only theater.
What is a compound AI system?
The term was popularized by Berkeley's BAIR group in 2024 to describe a real shift in how state-of-the-art AI results are produced: not by a single bigger model, but by programs that combine models with retrieval, tools, structured state, and control flow. A compound AI system is that program.
Five properties show up in almost every instance:
- Task decomposition. A user objective is broken into subgoals, steps, tool calls, or graph nodes.
- External grounding. The system retrieves data from files, databases, web pages, APIs, logs, or source code.
- Action. The system mutates state — writes files, sends messages, executes code, calls APIs, opens pull requests, approves transactions.
- Feedback loops. Intermediate outputs are observed, and the plan is adapted through retries, reflection, or replanning.
- Persistent state. Memory stores carry preferences, task history, retrieved documents, scratchpads, and tool traces across steps and sessions.
The mental model: the model is not the workflow. The workflow is code. The model is one node within it. See the compound AI system glossary entry for the canonical short definition.
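To make that concrete, here is a minimal sketch in which the workflow is plain code and the model is one node inside it. Every name is illustrative; `call_model` stands in for any chat-completion client.

```python
# Minimal sketch: the workflow is ordinary code; the model is one node.
# All function names are illustrative; call_model() stands in for any
# chat-completion client.

def retrieve(ticket_id: str) -> list[str]:
    # Deterministic data access: database, search index, filesystem.
    return [f"document for {ticket_id}"]

def call_model(prompt: str) -> str:
    # The single probabilistic node; swap in a real client here.
    return f"summary of: {prompt[:40]}"

def validate(summary: str) -> str:
    # Deterministic check the runtime performs before any side effect.
    if not summary.strip():
        raise ValueError("empty model output")
    return summary

def run_workflow(ticket_id: str) -> str:
    docs = retrieve(ticket_id)             # ground
    summary = call_model("\n".join(docs))  # the model node
    return validate(summary)               # the runtime keeps control

print(run_workflow("TICKET-123"))
```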
How does it work? Eight orchestration patterns
Compound AI systems differ mostly in how they wire the model to everything else. The patterns the source paper maps, with their security profile in plain language:
- Linear pipeline. Fixed sequence such as ingest, retrieve, extract, reason, validate, write. Easy to observe and constrain. Best when the workflow is known in advance.
- Router or classifier. Maps input to a specialized flow. Routing is often more valuable than free-form planning. Misrouting becomes a security bug when branches have different privileges.
- ReAct loop. Interleaves reasoning and tool calls. Flexible, but the least secure common pattern: every iteration re-exposes the model to untrusted data and gives it another chance to call a tool.
- Planner-executor. High-level decomposition is separated from low-level execution. The plan is inspectable, but planning that sees untrusted context can still be hijacked.
- Graph or DAG workflow. Typed state passed between nodes with declared tool and data permissions. The recommended default for high-stakes automation. LangGraph-style runtimes implement this shape (a minimal node sketch follows this list).
- Multi-agent handoff. Role-specific agents — researcher, coder, reviewer. Useful only when roles map to real privilege boundaries or independent evaluation; otherwise it is theater that multiplies attack surface.
- Search, voting, or debate. Generate candidates and select. Strong when the scoring function is external — tests, policy, schema. Weak when one LLM judges another LLM.
- Compiler-optimized LM program. Orchestration as code with metrics, dependency-scheduled tool calls, and static analysis of the workflow. DSPy and LLMCompiler are reference points. This is where versioning and testing of the system itself become possible.
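To ground the recommended default, here is a minimal sketch of a typed node with declared tool and data permissions, enforced by the runtime rather than the model. This is not LangGraph's API; all names are illustrative.

```python
# Minimal sketch of a typed DAG node with declared permissions.
# Illustrative only; not the API of any named runtime.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class NodeSpec:
    name: str
    allowed_tools: frozenset[str]    # capability set declared up front
    readable_state: frozenset[str]   # data scope this node may see
    run: Callable[[dict], dict]      # the node's transition function

def execute(node: NodeSpec, state: dict, tool_call: str | None = None) -> dict:
    # The runtime, not the model, enforces the declared permissions.
    if tool_call is not None and tool_call not in node.allowed_tools:
        raise PermissionError(f"{node.name} may not call {tool_call!r}")
    visible = {k: v for k, v in state.items() if k in node.readable_state}
    return {**state, **node.run(visible)}

summarize = NodeSpec(
    name="summarize",
    allowed_tools=frozenset(),           # pure step: no tools at all
    readable_state=frozenset({"docs"}),  # cannot see 'secrets'
    run=lambda s: {"summary": f"{len(s['docs'])} docs summarized"},
)

state = execute(summarize, {"docs": ["a", "b"], "secrets": "hidden"})
print(state["summary"])  # -> '2 docs summarized'
```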
The headline capability numbers from public benchmarks make the gap concrete. WebArena reports roughly 14.4% web-agent success against ~78.2% human on 812 realistic web tasks. OSWorld reports ~12.2% best-model success against ~72.4% human on 369 desktop tasks. SWE-bench's 2,294 real GitHub issues show agent progress fastest where executable feedback exists. AgentDojo reports indirect prompt injection in tool responses reaching up to ~70% average attack success against GPT-4o. The gap between demos and reliable automation is a system-engineering gap, not a model gap.
Why does it matter?
A compound AI system reads untrusted context, plans over it, calls real tools, mutates external state, and may propagate compromised state into later steps. Practical consequences:
- The model's context window becomes an ambient authority channel. Anything that lands in it can shape the next tool call.
- Retrieved documents become latent instructions, which is the basis of indirect prompt injection.
- Tool schemas become capability advertisements. Weak brokers around them produce tool hijacking.
- Agent memory becomes a persistence layer for compromise.
- Orchestration graphs become privilege-escalation paths whenever a low-trust step can write into a high-trust step's input.
Who is affected: any organization deploying multi-step automation — coding assistants that touch repositories, customer-ops agents that issue refunds, security agents that enrich alerts, finance agents that handle invoices, browser agents that submit forms. Worst plausible outcomes scale with the authority granted: data exfiltration through the retrieval channel, unauthorized state mutation through tool abuse, persistent compromise through memory poisoning, and economic denial-of-service through runaway tool and model calls.
Reliability compounds the problem. A 95%-reliable 10-step workflow is roughly 60% reliable end-to-end, and worse under correlated errors. Capability gaps and security gaps both manifest at the system level, not the model level.
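The arithmetic is worth seeing once. Assuming independent per-step failures, end-to-end reliability is the per-step reliability raised to the number of steps:

```python
# End-to-end reliability under independent step failures: p ** n.
def end_to_end(p_step: float, n_steps: int) -> float:
    return p_step ** n_steps

print(f"{end_to_end(0.95, 10):.1%}")  # 59.9% -- the 'roughly 60%' above
print(f"{end_to_end(0.99, 10):.1%}")  # 90.4% -- even 99% per step decays
```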
This page is published at yellow risk: architectural detail is included; working prompt-injection payloads, step-by-step reproductions of AgentDojo or PoisonedRAG, and exploit chains against named production agents are intentionally withheld. Practitioners doing adversarial evaluation should work from the source paper and the AgentDojo methodology directly.
How do you defend against it?
The position from the source paper: agents need less magic and more systems engineering. The defensible architecture is a governed runtime around untrusted intelligence.
- Treat the system boundary as the unit of trust, not the model. Models propose, extract, rank, summarize, and draft. The runtime authorizes, validates, executes, and records. The cost is real workflow code instead of prompt-only orchestration. This is the core idea behind agent capability control.
- Enforce least agency per step (sketched after this list, together with taint labels). For each step, define the minimum data scope, tool allowlist, and permitted side effects, and expire permissions after the step. The cost is more workflow definition. It does not cover novel tool-chaining attacks where individually scoped tools compose into escalation.
- Treat all context as tainted by default. Label every input — system, developer, user, tool-output-trusted, tool-output-untrusted, retrieved-public, retrieved-internal, agent-generated — and restrict actions on plans derived from tainted context. The cost is taint plumbing through the orchestrator. It does not cover model-internal blending of instructions and data.
- Prefer deterministic control flow. Use LLMs only where semantic flexibility is required. Encode workflows, policies, thresholds, and approvals as code. The cost is less "agent magic." It does not cover the semantic steps that genuinely need a model.
- Validate before acting; stage drafts and commits (a dry-run/commit sketch follows the list). Every state-changing action passes validators proportional to impact: schema, policy, independent evidence, human approval. Mutating workflows have dry-run and commit phases. The cost is latency. It does not cover validators that share the actor model's blind spots.
- Use structured handoffs, not transcript dumps (a typed-artifact sketch follows the list). Between steps, agents, or branches, pass typed artifacts with provenance and trust labels. The cost is schema discipline. It does not eliminate prompt injection if the structured fields themselves carry attacker text.
- Trace and replay everything. Capture prompts, tool calls, arguments, outputs with their taint labels, policy decisions, validator results, and approvals (the dry-run sketch below writes such a trace). Without replay, incident response on agents is guesswork.
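Two of the defenses above, least agency and taint labels, compose naturally. Here is a minimal sketch, with every name illustrative: the runtime, not the model, checks both the step's tool allowlist and the trust labels on the context that produced the plan.

```python
# Minimal sketch: taint labels on context plus a per-step tool allowlist,
# enforced by the runtime. All names are illustrative.
from dataclasses import dataclass
from enum import Enum, auto

class Trust(Enum):
    SYSTEM = auto()
    USER = auto()
    TOOL_UNTRUSTED = auto()
    RETRIEVED_PUBLIC = auto()

@dataclass(frozen=True)
class Labeled:
    text: str
    trust: Trust

@dataclass(frozen=True)
class StepPolicy:
    allowed_tools: frozenset[str]
    # Trust levels whose content may influence state-changing calls.
    action_grade_trust: frozenset[Trust]

def authorize(policy: StepPolicy, tool: str, context: list[Labeled]) -> None:
    if tool not in policy.allowed_tools:
        raise PermissionError(f"tool {tool!r} not in step allowlist")
    tainted = [c for c in context if c.trust not in policy.action_grade_trust]
    if tainted:
        raise PermissionError(
            f"plan derived from tainted context: {tainted[0].trust.name}"
        )

policy = StepPolicy(
    allowed_tools=frozenset({"search_tickets"}),
    action_grade_trust=frozenset({Trust.SYSTEM, Trust.USER}),
)
ctx = [
    Labeled("user asked for refund status", Trust.USER),
    Labeled("<html>ignore previous instructions...</html>",
            Trust.RETRIEVED_PUBLIC),
]
try:
    authorize(policy, "search_tickets", ctx)
except PermissionError as e:
    print(f"blocked: {e}")  # tainted context present, action refused
```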
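Validate-before-acting and trace-everything can share one code path. A minimal sketch, assuming a hypothetical refund workflow: each draft passes impact-proportional validators, and both the dry-run and the commit append a JSONL record for later replay.

```python
# Minimal sketch: dry-run/commit phases plus an append-only JSONL trace.
# The refund workflow, validators, and thresholds are hypothetical.
import json
import time
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class RefundDraft:
    ticket_id: str
    amount: float

def trace(event: str, draft: RefundDraft,
          path: str = "agent_trace.jsonl") -> None:
    # Append-only record of every decision, for incident replay.
    record = {"ts": time.time(), "event": event, **asdict(draft)}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

def validate(draft: RefundDraft) -> None:
    # Impact-proportional checks: schema first, then policy.
    if draft.amount <= 0:
        raise ValueError("amount must be positive")
    if draft.amount > 100.0:
        raise ValueError("amounts over $100 require human approval")

def execute_refund(draft: RefundDraft) -> None:
    validate(draft)
    trace("dry_run", draft)   # staged: no side effects yet
    # ... a human or policy engine could inspect the staged draft here ...
    trace("commit", draft)    # the only path that mutates external state
    print(f"COMMIT: refunded ${draft.amount:.2f} on {draft.ticket_id}")

execute_refund(RefundDraft("TICKET-123", 25.0))
```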
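The structured-handoff idea reduces to a small typed artifact. A sketch with hypothetical field names: the receiving step reads fields carrying provenance and a trust label, never a raw transcript.

```python
# Minimal sketch of a structured handoff: a typed artifact with provenance
# and a trust label instead of a transcript dump. Field names illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class HandoffArtifact:
    claim: str           # the single conclusion being passed forward
    produced_by: str     # which step or agent produced it
    evidence_uri: str    # pointer to the supporting data, not a paste of it
    tainted: bool        # whether untrusted text influenced this claim

artifact = HandoffArtifact(
    claim="invoice INV-42 total is $1,250.00",
    produced_by="extract_step",
    evidence_uri="s3://invoices/INV-42.pdf",
    tainted=True,        # extracted from an externally supplied PDF
)
print(artifact)
```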
A practical hardening sequence for the runtime around the model is laid out in the tool-using agent hardening checklist.
Related concepts and tools
- Compound AI system (glossary) — short canonical definition this page expands on.
- Excessive agency (glossary) — the failure mode the architecture exists to prevent.
- Ambient authority (glossary) — why the context window is an authority channel.
- Lethal trifecta (glossary) — the risk frame for systems combining untrusted input, tools, and private data.
- Indirect prompt injection (learn) — the canonical context-channel attack against compound systems.
FAQ
What is a compound AI system?
A compound AI system is an automation system in which a foundation model is one component among retrievers, tools, memory stores, planners, validators, and policy engines that together complete a multi-step objective. The Berkeley BAIR formulation popularized the term in 2024. The point is to move the unit of analysis from the model call to the system boundary.
How is a compound AI system different from an LLM agent?
Agent is a usage pattern: a model that observes, plans, and acts in a loop. Compound AI system is the architectural framing for the program around that model. The compound framing forces you to treat the system boundary, not the model call, as the unit of analysis and the unit of trust, which is what capability control and incident response actually require.
What are the main orchestration patterns in compound AI systems?
The common patterns are linear pipelines, routers, ReAct loops, planner-executor, graph or DAG workflows, multi-agent handoff, search/voting/debate, and compiler-optimized LM programs. Each has a different capability and security profile. ReAct is the most flexible and the least secure; typed graph workflows are the recommended default for high-stakes automation.
Why does the system boundary matter more than the model?
Because performance, failure modes, and authority all live at the system level. The model is an untrusted probabilistic component that proposes text and tool calls. The runtime is what authorizes, executes, mutates state, and is accountable. Treating the model as the system is how excessive agency, prompt injection, and tool abuse become production incidents.