Cheap Agents, Expensive Bills

Executive Summary

Anthropic unveiled Claude Sonnet 5 on June 30. At its introductory rate, it runs $2 per million input tokens and $10 per million output. That is 40–60% cheaper than the flagship Opus 4.8, while closing much of the gap in agentic coding. The headlines converged on one line: "You can now run your agents for less." But that headline is only half right. The other half is what decides the bill.

Over the past year, the blended token price fell 67%, yet enterprise AI bills rose 320%. At the root of that paradox is the agent loop, and what makes the loop run longer is weak data. Fully 85% of failed agent tasks trace back to data quality problems.

So the axis of competition has moved. It is no longer the token price; it is the cost per completed task. A cheap model only lowers the starting line. When the data is weak, the agent spins its wheels, and the cost of reaching the finish line grows instead.

Key Numbers

Read separately, the four numbers below are just statistics. Read in sequence, they form a single causal chain. The price clearly fell (−67%), yet the bill swelled anyway (+320%); what filled the gap was the lengthening agent loop (~50×); and most of what stretched that loop was weak data (85%).

Sources: TechCrunch, EY, Gartner, KPMG (2025–2026)

−67%: Blended token price, $18.40 → $6.07 (YoY)

+320%: Enterprise bill, average budget rise over the same period

~50×: Cost of a 10-turn session vs. a single call, as context piles up

85%: Cause of agent failure, where the root cause is data quality

1

The Price Fell — So Why Is This News?

Sonnet 5's price tag is genuinely attractive. The introductory rate, in effect through August 31, is $2 per million input tokens and $10 per million output. Standard pricing rises to $3/$15 in September, but even then it stays below Opus 4.8 ($5/$25). It is also unusual that Anthropic led with an introductory rate at all. A company spokesperson said they "want customers to test at the lowest cost against their real workloads."

But the real news isn't the numbers. On this announcement, TechCrunch was blunt: agentic capability is now "table stakes," and the contest has shifted to "how cheaply, and how reliably without human intervention, a system can carry a task all the way through." It reads less like a price cut and more like a declaration — the era of showing off capability is over, and the economics of completion has begun.

So reading this announcement as merely "a price-cut story" misses half of it. Anthropic playing the introductory-rate card, and the industry starting to talk about "cost per completion" instead of benchmarks, are the same signal: the axis of the contest is moving from token price to task completion.

2

Price Down 67%, Bill Up 320%

Over the past year, the blended token price fell from $18.40 to $6.07 per million tokens, a 67% drop measured across 2.4 billion API calls. By common sense, the bill should have shrunk too. Yet over the same period, enterprise AI bills rose about 320%, and average enterprise AI budgets climbed from $1.2 million to roughly $7 million. The price is falling while spending rises. Where is the money leaking?

2.1 First, the price cut itself may be an illusion

Sonnet 5 uses a new tokenizer that splits the same text into roughly 30% more tokens than before. Even with the sticker price unchanged, effective cost can run 10–35% higher on certain workloads. Developer Simon Willison observed token counts inflating by up to 1.46× on one system prompt. The number on the price tag and the number on the bill are, in effect, two different languages.
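The tokenizer effect is plain multiplication. Here is a minimal sketch, assuming the bill scales linearly with token count and that a given inflation factor applies uniformly to the whole workload. The function name `effective_price` is illustrative and not part of any vendor API.

```python
def effective_price(sticker_price_per_mtok: float, token_inflation: float) -> float:
    """Price per million *old-tokenizer* tokens once the new tokenizer
    splits the same text into `token_inflation` times as many tokens.

    Assumption: cost is strictly proportional to token count.
    """
    if token_inflation <= 0:
        raise ValueError("token_inflation must be positive")
    return sticker_price_per_mtok * token_inflation


# $2.00 is the introductory input rate quoted in the article;
# 1.30 and 1.46 are the inflation figures quoted in the article.
for inflation in (1.00, 1.30, 1.46):
    price = effective_price(2.00, inflation)
    print(f"{inflation:.2f}x tokens -> ${price:.2f} per million old-tokenizer tokens")
```

Under these assumptions, a $2.00 sticker price behaves like $2.60 at 1.30× inflation and $2.92 at 1.46×, even though the price list never changed.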

2.2 The real bill comes from the loop

An agent isn't called once and done. Every turn, it resends the entire conversation so far. So a 10-turn session doesn't cost 10× a single call — it costs about 50×. Cost grows as a quadratic function of conversation length. Break down a busy coding agent's day and you find that 99% of the tokens processed are input tokens re-reading the accumulated trajectory; actual generation is just 1%.

▲ Agent loop costs grow quadratically — a 10-turn session costs ~50× a single call | Pebblous original diagram
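The quadratic growth can be checked with a short calculation. This is a simplified sketch, assuming every turn adds the same amount of context, the full history is resent on each turn, nothing is cached, and output tokens are ignored. The function name `session_cost_multiple` is illustrative.

```python
def session_cost_multiple(turns: int) -> int:
    """Total input cost of a session, in units of one single-call cost.

    Assumes turn k resends k equal-sized units of context (everything so far),
    so the total is 1 + 2 + ... + turns = turns * (turns + 1) / 2.
    """
    if turns < 1:
        raise ValueError("turns must be at least 1")
    return turns * (turns + 1) // 2


for n in (1, 5, 10, 20):
    print(f"{n:>2} turns -> {session_cost_multiple(n)}x a single call")
```

Under these assumptions a 10-turn session comes to 55× a single call, in line with the roughly 50× figure, and doubling the session to 20 turns gives 210×, nearly four times as much.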

And the model invoice is only 20–40% of total cost. The other 60–80% goes to orchestration, retrieval, retries, and observability. EY reported that a simple workflow interaction costing 4 cents in 2023 jumped to $1.20 (about 30×) in a 2026 orchestration system with tools, reasoning, and iterative loops.
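To see what the 20–40% share implies, the model invoice can be scaled up to an estimated total. This is a rough sketch: the $10,000 invoice is a hypothetical figure chosen for illustration, and the function name `total_cost_from_invoice` is an assumption, not an established formula from the sources.

```python
def total_cost_from_invoice(model_invoice: float, model_share: float) -> float:
    """Estimate total system cost when the model invoice is `model_share`
    (a fraction between 0 and 1) of the total."""
    if not 0 < model_share <= 1:
        raise ValueError("model_share must be in (0, 1]")
    return model_invoice / model_share


invoice = 10_000.0  # hypothetical monthly model invoice, in dollars
for share in (0.40, 0.20):
    total = total_cost_from_invoice(invoice, share)
    print(f"model share {share:.0%}: total ~ ${total:,.0f}, "
          f"non-model ~ ${total - invoice:,.0f}")
```

With the model at 40% of the total, a $10,000 invoice implies about $25,000 overall; at 20%, about $50,000.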

This isn't theory. Uber burned through its AI budget in four months as coding-assistant usage climbed from 33% to the low-80% range. Microsoft clawed back Claude Code licenses it had deployed, and Priceline's renewal quote jumped 4–5× year over year. A cheap unit price drives usage up, and the larger volume inflates the total bill.

3

The New Math of Cost per Task

Token price is now just one variable among many that explain cost. The metric to actually watch is cost per completed task: when a single job is carried all the way through without human intervention, what did it cost in total? That figure decomposes roughly like this.

Cost per completed task ≈ (token price × loop length) ÷ success rate

Of the three variables, the only one you can lower by swapping models is the first — token price. Loop length and success rate are governed not by the model catalog but by the data the agent stands on. And those two dominate the bill.

Take the success rate. Early-deployed agents complete autonomously about 50% of the time; even mature systems land at 70–80%. When completion is low, the denominator shrinks and cost per completion soars. Worse, failed attempts don't simply vanish. If you attempt 10,000 jobs and only 7,000 finish unattended, the resources burned by the 3,000 failures are loaded squarely onto the unit cost of the 7,000 that succeeded.
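The way failures load onto successes can be made concrete. This is a minimal sketch, assuming every attempt costs the same whether it succeeds or fails. The $1.00 cost per attempt is a hypothetical placeholder, and `cost_per_completed_task` is an illustrative name.

```python
def cost_per_completed_task(attempts: int, completed: int,
                            cost_per_attempt: float) -> float:
    """Spread the cost of all attempts, failed ones included,
    over only the tasks that finished unattended."""
    if not 0 < completed <= attempts:
        raise ValueError("completed must be between 1 and attempts")
    return attempts * cost_per_attempt / completed


cost = 1.00  # hypothetical cost of one attempt, in dollars
for done in (10_000, 7_000, 5_000):
    unit = cost_per_completed_task(10_000, done, cost)
    print(f"{done:>6} of 10,000 completed -> ${unit:.2f} per completed task")
```

At 7,000 completions out of 10,000 attempts, each finished task carries about $1.43 of cost for every $1.00 spent per attempt; at a 50% completion rate the figure doubles to $2.00.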

Loop length works the same way. Every time verification fails, the agent reloads the full context and retries. A cycle that corrects itself 10 times burns 50× what a first-pass run would. In the end, even if you switch to a cheaper model and cut the unit price by 40%, if the loop doubles in length and the success rate is halved, the cost per completion grows instead.
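The closing scenario can be worked through with the cost-per-task relation given earlier, (token price × loop length) ÷ success rate. The sketch below expresses each variable as a multiplier against the current baseline, so no absolute prices are assumed; `relative_cost_per_completion` is an illustrative name.

```python
def relative_cost_per_completion(price_factor: float, loop_factor: float,
                                 success_factor: float) -> float:
    """Change in cost per completed task, relative to a baseline of 1.0.

    Each argument is a multiplier versus the baseline, applied to
    (token price x loop length) / success rate.
    """
    if success_factor <= 0:
        raise ValueError("success_factor must be positive")
    return price_factor * loop_factor / success_factor


# Unit price cut by 40%, loop twice as long, success rate halved.
change = relative_cost_per_completion(price_factor=0.6,
                                      loop_factor=2.0,
                                      success_factor=0.5)
print(f"cost per completion: {change:.2f}x the baseline")
```

Under this relation, the 40% price cut is swamped: cost per completion ends up at 2.4× the baseline. The same cut with loop length and success rate held constant would give 0.6×.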

4

Data Readiness Decides the Bill

So what is the most upstream variable that sets loop length and success rate? Data quality. At every interaction, the agent reads its situation from data, chooses an action, and judges its next step. When that data is weak, judgment is contaminated, and hallucination, drift, and unpredictable behavior spawn retry loops. Longer loops mean a bigger bill.

▲ Weak data quality → retry loops → cost per task explodes | Pebblous original diagram

The numbers back this causal chain. Data quality is identified as the root cause in 85% of failed AI tasks. Meanwhile, only 12% of organizations have data quality good enough to support AI applications (Gartner, 2025). Over two quarters in which agent adoption nearly quadrupled, from 11% to 42%, concern over data quality shot up from 56% to 82% (KPMG). As agents multiplied, so did the bill for weak data.

Gartner pushed the point one step further into prediction: 60% of AI projects not backed by AI-Ready data will be scrapped by 2026, and more than 40% of agentic AI projects will be canceled by 2027. The warning is plain — no matter how cheap the model gets, a project whose data isn't ready won't reach the finish line.

This price war can be summarized in one sentence: a cheap model only lowers the starting line; the cost of reaching the finish line is decided by the data. While Western labs quietly nudge effective prices up through tokenizers and tier adjustments, the teams that actually stay within budget aren't the ones that picked a cheaper model. They're the ones that first firmed up the data their agents stand on.

Sonnet 5 is a good tool. But for a cheaper tool to actually save you money, the ground it stands on has to be solid. On the new axis of cost per completed task, the real lever sits below the choice of model — in data readiness.

R

References