Executive Summary

An answer that clears the output guardrail is not proof that no personal data leaked while the agent was producing it. Beyond the final answer, an agent spills data through tool-call arguments, system logs, and messages passed between agents. This article looks at why an audit cannot stop at the final answer.

AgentLeak, a benchmark released in 2026, reported that an audit inspecting only the answer misses 41.7% of privacy violations. Inter-agent messages leak personal data at a rate of 68.8%, while the final output leaks only 27.2%. These figures come from controlled benchmark scenarios, so read them as values a study reported rather than as settled fact.

The dividing line isn't where data is stored, but which channel it can be observed through. The sections below walk those channels one at a time.

41.7%: violations an output audit misses (C2 68.8% − C1 27.2%)

68.8%: inter-agent message leak rate (2.5× the 27.2% of final output)

~85%: peak leak via tool input and logs, even when the output is sanitized

2.6×: internal channels vs. external (internal 74% vs. external 28.2%)

1. The Illusion of a Clean Answer

For many teams, the privacy audit hangs on a single point: the final answer the model hands back to the user. They attach an output filter, check whether personal data slipped into the answer, and call it safe once it passes. The approach is intuitive, and it seems to match the picture most regulatory documents draw.

The trouble is that an agent is not a thing that produces one answer and nothing else. While handling a single task, an agent calls tools, writes intermediate results to logs, and sends messages to other agents. Every one of those flows is a channel that data passes through. The final answer is only the last slot in the sequence.

AgentLeak, a benchmark released in 2026, put a number on this blind spot. It instrumented seven communication channels and measured how much personal data leaked at each one separately. The final output leaked 27.2%, but inter-agent messages leaked 68.8%. The gap between those two figures, which AgentLeak reports as 41.7 points, is the share of violations that an output-only audit never sees. The answer passed; the channel failed.

[Figure: Agent execution path and audit scope (AgentLeak 7-channel basis). The audit sees only the last step, the final output (C1, 27.2% leak). The internal channels (C2 inter-agent messages, C3 tool input, C5 shared memory, C6 system logs, C7) leak 68.8% to 85% and fall outside the audit, a blind spot of +41.7 points.]
▲ Agent execution path and audit scope: internal channels flow only inside the pipeline and are never caught by an output audit. Original Pebblous diagram | Source: El Yagoubi et al., arXiv:2602.11510

2. The Invisible Channels an Agent Leaks Through

AgentLeak split the places an agent can leak data into seven and instrumented each one: final output (C1), inter-agent messages (C2), tool input (C3), tool output (C4), shared memory (C5), system logs (C6), and artifacts (C7). Of these, the one an audit actually looks at is usually just C1. The other six flow only inside the pipeline, so they rarely draw attention.
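To make the difference in audit scope concrete, here is a minimal sketch of an audit that can look at all seven channels or at C1 alone. The channel names follow the list above; the `Event` record, the `audit` function, the sample trace, and the single SSN-shaped regex are illustrative assumptions, not AgentLeak's instrumentation or detector.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    """The seven channels named in the text (C1..C7)."""
    FINAL_OUTPUT = "C1"
    INTER_AGENT_MESSAGE = "C2"
    TOOL_INPUT = "C3"
    TOOL_OUTPUT = "C4"
    SHARED_MEMORY = "C5"
    SYSTEM_LOG = "C6"
    ARTIFACT = "C7"


@dataclass(frozen=True)
class Event:
    """One observable write during a run (hypothetical trace record)."""
    channel: Channel
    text: str


# Assumption: a toy detector for US-SSN-shaped strings only.
# A real audit would need a far broader PII detector.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def audit(events, channels=None):
    """Return the set of channels where the detector fires.

    channels=None audits every channel; pass {Channel.FINAL_OUTPUT}
    to mimic an output-only audit.
    """
    scope = set(Channel) if channels is None else set(channels)
    return {e.channel for e in events
            if e.channel in scope and SSN_PATTERN.search(e.text)}


# Made-up trace for illustration only.
trace = [
    Event(Channel.TOOL_INPUT, "lookup_patient(ssn='123-45-6789')"),
    Event(Channel.INTER_AGENT_MESSAGE, "Patient 123-45-6789 needs follow-up"),
    Event(Channel.SYSTEM_LOG, "DEBUG args={'ssn': '123-45-6789'}"),
    Event(Channel.FINAL_OUTPUT, "The patient's follow-up is scheduled."),
]

output_only = audit(trace, {Channel.FINAL_OUTPUT})
full_path = audit(trace)
print("output-only audit flags:", sorted(c.value for c in output_only))
print("full-path audit flags:", sorted(c.value for c in full_path))
```

In this toy trace the output-only audit flags nothing, while the full-path audit flags C2, C3, and C6: the same run passes or fails depending only on which channels the audit is allowed to see.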

This instrumentation was not a small experiment. AgentLeak ran 1,000 scenarios spanning healthcare, finance, legal, and enterprise domains across five commercial and open models, mixed in thirty-two attack types, and produced 4,979 verified execution traces. Every per-channel leak rate below is drawn from that body of traces.

The channel that leaks the most is not the final output. Inter-agent messages lead at 68.8%, and tool input and system logs leaked as much as 85% depending on the scenario. The cause looks like a design habit. An agent treats tool arguments as scratch space and pushes more raw data into functions than it needs to. When sensitive information passes through the reasoning trace, its residue stays in the logs. Sanitize only the final answer, and these channels remain outside the audit entirely.
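One way to counter the scratch-space habit is to trim tool arguments to a per-tool allowlist before the call is made. The sketch below is a hedged illustration of that idea only; the tool names, the `TOOL_ALLOWLIST` mapping, the helper functions, and the customer record are all hypothetical and do not come from AgentLeak.

```python
def minimize_args(record, allowed_fields):
    """Keep only the fields a tool is declared to need (allowlist)."""
    return {k: v for k, v in record.items() if k in allowed_fields}


# Hypothetical customer record held in the agent's working context.
customer = {
    "customer_id": "c-1042",
    "name": "Jane Doe",
    "email": "jane@example.com",
    "card_number": "4111 1111 1111 1111",
    "shipping_zip": "94107",
}

# Hypothetical per-tool allowlists: what each tool may observe.
TOOL_ALLOWLIST = {
    "estimate_shipping": {"shipping_zip"},
    "send_receipt": {"customer_id", "email"},
}


def build_tool_call(tool_name, record):
    """Build the arguments that will actually cross the C3 boundary."""
    allowed = TOOL_ALLOWLIST[tool_name]
    return {"tool": tool_name, "arguments": minimize_args(record, allowed)}


call = build_tool_call("estimate_shipping", customer)
print(call)
# {'tool': 'estimate_shipping', 'arguments': {'shipping_zip': '94107'}}
```

The design choice here is an allowlist rather than a blocklist: a field that nobody thought to classify stays out of the tool input by default.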

[Figure: Per-channel privacy leak rate (AgentLeak, controlled benchmark scenarios). C1 final output: 27.2% (the only channel the audit sees). C2 inter-agent messages: 68.8%. C3 tool input: ~85%. C6 system logs: ~85%. Audit blind spot: +41.7 points.]
▲ Audit only the final output (C1) and you miss 41.7 points of violations. The channels that actually leak more are inter-agent messages, tool input, and system logs. Original Pebblous diagram | Source: El Yagoubi et al., arXiv:2602.11510

Internal channels leak about 2.6 times as much as external output (74% internal vs. 28.2% external). However cleanly you scrub the answer, personal data still slips out through the channels it traveled if those channels stay open.
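The two multipliers quoted in this article follow directly from the reported percentages. The short check below only re-derives them; the variable names are ours, and the inputs are the values quoted in the text.

```python
# Figures as reported in the text (AgentLeak, controlled scenarios).
final_output = 27.2   # C1 leak rate, %
inter_agent = 68.8    # C2 leak rate, %
internal = 74.0       # internal channels, %
external = 28.2       # external output, %

c2_vs_c1 = inter_agent / final_output
internal_vs_external = internal / external

print(f"C2 / C1 = {c2_vs_c1:.2f}x")                        # C2 / C1 = 2.53x
print(f"internal / external = {internal_vs_external:.2f}x")  # internal / external = 2.62x
```

Rounded to one decimal place, these give the 2.5× and 2.6× figures used above.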

3. The Multi-Agent Paradox: Safer Output, Riskier System

The most counterintuitive part comes from multi-agent setups. When a single model answered alone, its final-output leak was 43.2%. Split the same work across several collaborating agents and the final-output leak actually dropped to 27.2%. Read the dashboard alone and multi-agent looks like the safer choice.

Add up all the internal channels, though, and total system exposure jumps to 68.9%. The more you split agents apart, the more channels open up between them — messages and shared memory — while the audited final output grows relatively quiet. The violations did not disappear; they moved inward, where the audit does not look.
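The reversal is easy to reproduce with toy numbers. The sketch below assumes one plausible reading of "total system exposure": a run counts as exposed if it leaked on any channel. That definition, the `leak_rate` helper, and the eight made-up runs are assumptions for illustration; they are not AgentLeak's metric or data.

```python
def leak_rate(runs, channels):
    """Share of runs (in %) that leaked on at least one of `channels`."""
    hits = sum(1 for leaked in runs if leaked & channels)
    return 100.0 * hits / len(runs)


ALL = {"C1", "C2", "C3", "C4", "C5", "C6", "C7"}

# Toy data (not AgentLeak's): each run is the set of channels that leaked.
single_agent_runs = [{"C1"}, {"C1", "C3"}, set(), set()]
multi_agent_runs = [{"C2"}, {"C2", "C5"}, {"C1", "C2"}, set()]

for name, runs in [("single", single_agent_runs), ("multi", multi_agent_runs)]:
    print(name,
          "output-only:", leak_rate(runs, {"C1"}),
          "whole system:", leak_rate(runs, ALL))
# single output-only: 50.0 whole system: 50.0
# multi output-only: 25.0 whole system: 75.0
```

In the toy multi-agent runs the output-only rate halves while the whole-system rate rises, which is the same shape as the reported 43.2%, 27.2%, and 68.9%.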

[Figure: The multi-agent paradox (AgentLeak reported values). Single-agent final output: 43.2%. Multi-agent final output: 27.2% (looks safer). Multi-agent total system: 68.9% (actually riskier). Output audits do not catch this rise.]
▲ Moving to multi-agent lowers the final-output leak from 43.2% to 27.2%, but adding the internal channels pushes total system exposure up to 68.9%. Original Pebblous diagram | Source: El Yagoubi et al., arXiv:2602.11510

The paradox sharpens when you break it down by model. One commercial model had the lowest final-output leak of the five at 8.2%, yet the highest internal-to-external leak ratio at 6.6×. Its tool input and system logs leaked as much as 85%. In other words, the model with the cleanest output was leaking the most on the inside. Rank models by output leak rate alone and you read their safety exactly backward.

From an attacker's point of view, these inner channels are the target. Among the attack families AgentLeak tested, the highest success rate belonged to multi-agent coordination attacks, which broke through 82.9% of the time. They exploited the premise that agents trust one another, turning the channels used for coordination into leak paths and using shared memory as a place to pool data. Splitting agents apart widens more than the audit blind spot; it widens the attack surface too.

4. The Channel, Not Storage, Is the Audit Unit

Another study, released independently around the same time, offers a frame for why this happens. The Observable Channels work argues that privacy risk should be seen as a property of observable channels rather than of the component where data is stored. What matters is not which database holds the data, but which channel it can be observed from on the way out. Where AgentLeak measured how much leaks, this study explains why it leaks.

[Figure: Storage location vs. observable channel (Observable Channels, arXiv:2603.22751). A storage-centric audit asks which storage holds the data, and processing channels stay invisible. A channel-centric audit asks which channel (C2 messages, C3 tool input, C6 system logs) can expose it, so the channel is both the risk unit and the intervention unit.]
▲ From storage-centric to channel-centric: if the channel is the unit of risk, it is also the unit of intervention. Original Pebblous diagram | Source: Huang et al., arXiv:2603.22751

Seen through the lens of channels, the character of the risk differs from one channel to the next. Data pulled in by retrieval leaks often but incompletely. A leak that passes through a tool depends heavily on what that tool is allowed to see. The wider a tool's observation surface, the more there is to leak. What each channel is allowed to observe is what sets the shape of the leak.

This view also carries a practical hope. The study reported that while some high-risk channels, such as memory systems, leak almost constantly, there are points where even a weak control such as simple sanitization sharply suppresses the leak. If the channel is the unit of risk, it is also the unit of intervention. Know what each channel is open to observe, and you can narrow that opening.
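As an illustration of intervening at a single channel, the sketch below attaches a sanitizing filter to the system-log channel (C6) using Python's standard `logging.Filter`. It is an assumption-laden example, not the control the study evaluated: the logger name `agent.tools`, the email-only regex, and the `[REDACTED]` marker are all hypothetical.

```python
import io
import logging
import re

# Assumption: a toy pattern for email addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class RedactEmails(logging.Filter):
    """Mask email addresses before a record reaches any handler."""

    def filter(self, record):
        # Render the message first so values passed as %-args are covered.
        record.msg = EMAIL.sub("[REDACTED]", record.getMessage())
        record.args = ()
        return True  # keep the record, now sanitized


stream = io.StringIO()  # stands in for the real log sink
handler = logging.StreamHandler(stream)
logger = logging.getLogger("agent.tools")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)
logger.addFilter(RedactEmails())

logger.info("calling send_receipt for %s", "jane@example.com")
print(stream.getvalue().strip())
# calling send_receipt for [REDACTED]
```

The filter sits on the channel itself, so it applies no matter which part of the agent wrote the log line. Note that a filter added to a logger applies only to records logged through that logger, so in a larger system it may be safer to attach it to the handler.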

Regulation points the same way. GDPR's data minimization and purpose limitation principles and the EU AI Act's logging obligations ask not what the system finally output, but where the data flowed across the whole processing chain. The 41.7% leaking through tool calls, system logs, and inter-agent messages is regulatory risk that never reaches the audit log. With the EU AI Act's high-risk obligations set to apply in earnest in August 2026, an audit regime that watches only the output stands exactly where it is easiest to mistake itself for compliant.

5. AI-Ready Means the Whole Path Data Touches Is Visible

Reaching it by different roads, the two studies converge on one conclusion: data governance cannot stop at the final answer. The answer is only the tail end of the many channels the data traveled through, and the real risk piles up in the channels before it. Widening the audit unit from the final output to the entire path is the only way to close the 41.7% blind spot.

The AI-Ready Data that Pebblous talks about goes beyond data being cleaned and labeled. AI-Ready also requires that the whole data path the model touches can be observed and audited. The 41.7% blind spot is exactly the symptom of missing that path visibility. If you cannot see what rode into the tool input, what sensitive data was left in the logs, or what passed between agents, then that data — however clean it looks — is not yet AI-Ready.

The diagnostic view is the same. It is common for data to pass at the surface artifact yet fail once you open the pipeline. This article's pattern, pass on the output and fail on the channel, is the agent-era version of an old principle: data quality has to be judged across the whole path, not at the final result. The problem came from a new technology, but the question is not unfamiliar. Can we see where our data flows?
