What Compound AI Systems Mean for Security and Engineering Leaders
Quick Answer
Agentic AI deployments are compound systems — orchestrated programs in which a language model plans, retrieves, and calls real tools that mutate real state. Public benchmarks through 2025 show these systems solve only 12–15% of realistic tasks while suffering up to 70% attack success from content embedded in their inputs. Compound AI systems fail and get exploited at the system boundary, not the model call, so governance has to constrain agency, mediate tools, and treat retrieved content as untrusted data.
Key Takeaway
Compound AI systems fail and get exploited at the system boundary, not the model call, so governance has to constrain agency, mediate tools, and treat retrieved content as untrusted data.
Agentic AI is being sponsored on a cost case faster than the reliability and security case has been made. The systems being deployed are not chatbots; they are compound programs in which a model plans over retrieved content, calls real tools, and mutates real state. Public benchmarks through 2025 show these systems solve only 12–15% of realistic web and desktop tasks while suffering up to 70% attack success when adversarial content is embedded in tool responses. The governance implication runs through everything below: these systems fail and get exploited at the system boundary, not the model call, so the job is to constrain agency, mediate tools, and treat retrieved content as untrusted data.
What this means for your organization
When an agent reads untrusted content — a customer email, a support ticket, a web page, a retrieved document, the response from a tool it just called — and can also call tools that mutate state, it has the structural ingredients for unauthorized action under adversarial input. AgentDojo measured up to 70% attack success against GPT-4o from indirect prompt injection in tool responses. PoisonedRAG reached 90% attack success after planting five malicious documents in a corpus of millions.
Translated to business outcomes: revenue leakage from wrong actions (incorrect refunds, mis-routed payments, unauthorized account changes), data exfiltration through retrieved or tool-returned content, audit exposure when actions cannot be replayed for a regulator, and brand exposure when an agent does something publicly visible and wrong. OWASP's 2025 LLM Top 10 names prompt injection and excessive agency as the dominant application risks for exactly these reasons.
The full mental model lives in the companion explainer, What is a compound AI system; this brief stays at the architectural level. The shift for a non-technical sponsor is straightforward: stop evaluating "the model" and start evaluating the system around the model.
What to ask your team
Which of our agentic workflows can both read untrusted content and call tools that mutate state, and where are those two paths separated?
For each agent in production, what is the smallest set of tools, data scopes, and spend it actually needs, and what expires those permissions at step boundaries?
When an agent takes an irreversible action, which deterministic policy authorizes it, and is that policy code or a prompt?
If an agent does something wrong tomorrow, can we replay the exact sequence of prompts, tool calls, arguments, and approvals for an incident responder?
What good looks like
Models are not principals. Credentials live with tools mediated by a broker, not with the model. Every action is attributable to a human or service principal that can be audited, suspended, or revoked.
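As a minimal sketch of the broker pattern, assuming a hypothetical `ToolBroker` class and a toy audit print standing in for a real logging pipeline: the model only ever emits a tool name and arguments, while secrets and attribution stay on the broker's side.

```python
from typing import Any, Callable

class ToolBroker:
    """Mediates every tool call. Credentials live here, server-side;
    the model never sees a secret, only tool names and arguments."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}
        self._secrets: dict[str, str] = {}

    def register(self, name: str, fn: Callable[..., Any], secret: str) -> None:
        self._tools[name] = fn
        self._secrets[name] = secret

    def call(self, principal_id: str, name: str, **args: Any) -> Any:
        """Every action is attributed to a human or service principal
        that can be audited, suspended, or revoked."""
        if name not in self._tools:
            raise PermissionError(f"unregistered tool: {name!r}")
        # Attribution happens here, not in the model.
        print(f"audit: principal={principal_id} tool={name} args={args}")
        return self._tools[name](secret=self._secrets[name], **args)
```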
Least agency, not just least privilege. Each step gets the minimum tools, data scopes, step budget, recipients, and spend required for that step. Permissions expire at step boundaries rather than persisting for the lifetime of the agent.
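A sketch of what a step-scoped grant can look like; the `StepGrant` shape, the plan-step dict, and the `check` helper are illustrative names, and a real system would mint and revoke these through the broker rather than in application code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StepGrant:
    """Permissions scoped to a single step; discarded when the step ends."""
    step_id: str
    allowed_tools: frozenset[str]
    data_scopes: frozenset[str]        # e.g. {"tickets:read"}
    max_spend_cents: int
    allowed_recipients: frozenset[str]

def grant_for(step_id: str, plan_step: dict) -> StepGrant:
    """Mint the minimum grant the plan step declares, nothing more."""
    return StepGrant(
        step_id=step_id,
        allowed_tools=frozenset(plan_step["tools"]),
        data_scopes=frozenset(plan_step["scopes"]),
        max_spend_cents=plan_step.get("spend_cents", 0),
        allowed_recipients=frozenset(plan_step.get("recipients", [])),
    )

def check(grant: StepGrant, tool: str, spend_cents: int = 0) -> None:
    """Deterministic check run before every tool call in the step."""
    if tool not in grant.allowed_tools:
        raise PermissionError(f"{tool!r} not granted in step {grant.step_id}")
    if spend_cents > grant.max_spend_cents:
        raise PermissionError("spend exceeds step budget")
```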
Tool outputs are data, not instructions. Untrusted content carries trust labels through the system. Planning and execution contexts are separated. Actions derived from untrusted content require explicit approval before they execute against the world.
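One way to make trust labels concrete, as an illustrative sketch with hypothetical `Trust` and `Labeled` types rather than any particular framework's API:

```python
from dataclasses import dataclass
from enum import Enum

class Trust(Enum):
    TRUSTED = "trusted"        # authored by us or a verified principal
    UNTRUSTED = "untrusted"    # email bodies, web pages, tool responses

@dataclass(frozen=True)
class Labeled:
    """Content plus the trust label it carries through the system."""
    text: str
    trust: Trust

def requires_approval(action_inputs: list[Labeled]) -> bool:
    """Any action whose arguments derive from untrusted content is held
    for explicit human approval before it executes against the world."""
    return any(item.trust is Trust.UNTRUSTED for item in action_inputs)

# A refund amount parsed out of a customer email is untrusted, so the
# resulting refund call is queued for approval rather than executed.
email_body = Labeled("please refund $4,900 to acct 7741", Trust.UNTRUSTED)
assert requires_approval([email_body])
```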
Deterministic policy gates irreversible actions. Refund limits, allowed payment recipients, allowed SQL operations, and separation-of-duties rules are code the orchestrator must call before a mutating tool runs. Models recommend; deterministic systems authorize.
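For example, a refund gate might look like the following sketch; the `authorize_refund` function, the limit, and the account names are illustrative, not a prescribed policy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RefundRequest:
    amount_cents: int
    recipient: str
    requested_by: str
    approved_by: str | None  # second principal, for separation of duties

REFUND_LIMIT_CENTS = 50_000
ALLOWED_RECIPIENTS = {"acct_primary", "acct_escrow"}

def authorize_refund(req: RefundRequest) -> bool:
    """Deterministic gate the orchestrator must call before the refund
    tool runs. The model can recommend; only this code authorizes."""
    if req.amount_cents > REFUND_LIMIT_CENTS:
        return False
    if req.recipient not in ALLOWED_RECIPIENTS:
        return False
    if req.approved_by is None or req.approved_by == req.requested_by:
        return False  # requester cannot self-approve
    return True
```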
Every run is replayable. Prompts, tool calls, arguments, policy decisions, retrieved documents, and human approvals are logged in a form an incident responder can reconstruct. Implementation specifics live in the linked agent capability control checklist.
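A minimal illustration of a replayable record, assuming a hypothetical `RunEvent` shape and a JSONL file as the append-only store; production systems would use whatever durable, tamper-evident log they already trust.

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class RunEvent:
    """One append-only entry in the run log; the full ordered sequence
    of these is what an incident responder replays."""
    run_id: str
    step: int
    kind: str          # "prompt" | "tool_call" | "policy_decision" | "approval"
    payload: dict
    ts: float

def log_event(event: RunEvent, path: str = "run_log.jsonl") -> None:
    with open(path, "a") as f:
        f.write(json.dumps(asdict(event)) + "\n")

log_event(RunEvent(
    run_id="r-2025-0142", step=3, kind="policy_decision",
    payload={"tool": "issue_refund", "args": {"amount_cents": 120_000},
             "decision": "denied: exceeds refund limit"},
    ts=time.time(),
))
```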
Where to dig deeper
- What is a compound AI system — orchestration patterns and the mental model this brief defers to
- What is agent capability control — the architectural framing for "models are not principals"
- What is indirect prompt injection — context for the AgentDojo 70% figure
- Tool-using agent hardening checklist — implementation work for narrowing tools and validating outputs
- Multi-agent hardening brief — sibling brief on multi-agent systems specifically
- Source paper — full benchmark data and orchestration-pattern survey
FAQ
How exposed are we if we already have agentic automation in production?
Exposure is high if any agent reads untrusted content (email, tickets, web pages, retrieved documents) and can call tools that mutate state without a deterministic policy gate in front of them. Map every workflow to a least-agency model and treat anything outside that as priority remediation. The architectural posture in this brief is the framing; the implementation work is in the linked checklists.
Are these benchmark numbers a reason to pause our AI program?
No. They are a reason to scope autonomy. Compound systems work well when bounded by deterministic policy, narrow tools, and human gates on irreversible actions. The pause should be on unconstrained agents — single agents holding broad credentials and acting on untrusted input — not on the program itself.
Is this regulated yet?
Direct AI-agent regulation is still emerging, but existing controls already apply: data residency, segregation of duties, audit trails, and sector-specific rules all bind agentic systems just as they bind any other software that takes action against your environment. OWASP's 2025 LLM Top 10 is currently the most concrete public reference for application-layer risks.
What's the single biggest mistake teams are making right now?
Building omnipotent assistants — one agent with read access to everything and write access to mutating tools, instructed by a prompt rather than gated by code. The fix is architectural: split the reader, planner, and executor into separate components and mediate every tool call through a broker that enforces deterministic policy.
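As an illustrative sketch of that split, with hypothetical `reader`, `planner`, and `executor` functions standing in for real components: only the executor touches tools, and anything derived from untrusted content is held for approval.

```python
def reader(raw: str) -> dict:
    """Reads untrusted content; has no tools, returns labeled data only."""
    return {"text": raw, "trust": "untrusted"}

def planner(task: str, evidence: dict) -> list[dict]:
    """Plans tool calls from labeled evidence; cannot execute anything."""
    return [{"tool": "draft_reply", "args": {"body": evidence["text"]},
             "derived_from": evidence["trust"]}]

def executor(plan: list[dict]) -> None:
    """The only component with tool access; every call passes policy."""
    for step in plan:
        if step["derived_from"] == "untrusted":
            print(f"queued for approval: {step['tool']}")
        else:
            print(f"executing: {step['tool']}")

executor(planner("answer ticket", reader("customer email body")))
```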