Agent Token Burn Explained

Desk note

Agent spend is different because the model is not called once. It plans, reads, calls tools, retries, summarizes, and sometimes loops. The trace is the unit of accountability.

Agents multiply calls

A normal prompt may be one request and one response. An agent may plan, inspect files, call tools, revise, retry, and summarize. Each step adds token cost, and because each step can carry earlier context forward, later steps tend to cost more than earlier ones.

Count calls per task, not only tokens per call.
Preserve step order so the trace is reviewable.
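The two points above can be sketched as a small trace recorder. This is a minimal illustration, not a reference implementation: the Step and Trace classes, their field names, and the token figures are all hypothetical and are not taken from any particular agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    index: int           # position in the trace, assigned when recorded
    kind: str            # hypothetical labels: "plan", "tool", "summarize"
    input_tokens: int
    output_tokens: int


@dataclass
class Trace:
    task_id: str
    steps: list[Step] = field(default_factory=list)

    def record(self, kind: str, input_tokens: int, output_tokens: int) -> Step:
        # Appending in call order keeps the trace reviewable step by step.
        step = Step(len(self.steps), kind, input_tokens, output_tokens)
        self.steps.append(step)
        return step

    @property
    def calls(self) -> int:
        # Calls per task, the number the guide says to count.
        return len(self.steps)

    @property
    def total_tokens(self) -> int:
        return sum(s.input_tokens + s.output_tokens for s in self.steps)


# Illustrative numbers only: input tokens grow as context is carried forward.
trace = Trace("task-001")
trace.record("plan", 1200, 300)
trace.record("tool", 1600, 150)
trace.record("summarize", 1900, 400)

print(trace.calls, trace.total_tokens)
```

Reporting calls next to total tokens makes it visible when a task is expensive because of many small steps rather than one large one.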

Receipts: Ars Technica

On this site: Tokenmaxxing examples

Context grows over time

When agents carry too much history or irrelevant file context, every new step becomes more expensive before the model writes a useful answer. Context hygiene matters more as the trace gets longer.

Summarize or prune trace state deliberately.
Retrieve files by task instead of loading broad directories.
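One way to prune trace state deliberately is to keep the most recent steps that fit a token budget and collapse everything older into a single summary entry. The sketch below assumes a crude whitespace word count in place of a real tokenizer, and the prune_history and summarize names are hypothetical. In practice the summary would likely come from a cheaper model call.

```python
def estimate_tokens(text: str) -> int:
    # Assumption: whitespace word count as a rough proxy for tokens.
    # A real tokenizer will give different numbers.
    return len(text.split())


def prune_history(messages, budget, summarize):
    """Keep the newest messages that fit in `budget`; summarize the rest."""
    kept = []
    used = 0
    # Walk backwards so the most recent steps survive.
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()

    dropped = messages[: len(messages) - len(kept)]
    if dropped:
        # The summary is not counted against the budget in this sketch.
        kept.insert(0, summarize(dropped))
    return kept


history = [
    "read file a b c d",      # 6 words
    "tool output one two",    # 4 words
    "edit applied ok",        # 3 words
]
pruned = prune_history(
    history,
    budget=8,
    summarize=lambda old: f"[summary of {len(old)} earlier steps]",
)
print(pruned)
```

The design choice here is to drop whole steps from the oldest end rather than truncate individual messages, which keeps each surviving step intact for review.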

Retries are hidden spend

A failed edit, malformed tool call, ambiguous instruction, or flaky API can trigger repeated attempts. From the outside it looks like progress; inside the trace it is often a cost leak.

Alert on repeated tool errors.
Cap retries and require a new plan after failure.
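The retry controls above might look something like the following. This is a sketch under stated assumptions: ToolError, run_with_retry_cap, the replan callback, and the on_alert hook are hypothetical names, and the "same error twice in a row" rule is just one possible alert condition.

```python
class ToolError(Exception):
    """Hypothetical error type raised by a failing tool call."""


def run_with_retry_cap(tool, args, replan, max_retries=2, on_alert=print):
    errors = []
    for attempt in range(max_retries + 1):
        try:
            return tool(args)
        except ToolError as exc:
            errors.append(str(exc))
            # Alert when the same error repeats back to back.
            if len(errors) >= 2 and errors[-1] == errors[-2]:
                on_alert(f"repeated tool error: {errors[-1]}")
            # Require a new plan before any further attempt.
            if attempt < max_retries:
                args = replan(args, errors)
    # The cap is reached: stop instead of leaking more spend.
    raise ToolError(f"gave up after {len(errors)} attempts: {errors[-1]}")
```

A caller passes a replan function that receives the previous arguments and the error history and returns revised arguments. Retrying with unchanged arguments is exactly the pattern that looks like progress from the outside and leaks cost inside the trace.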

Controls that work

The useful controls are concrete: task budgets, step limits, model routing, trace review, evals, and human-in-the-loop checkpoints for high-risk work. A controlled agent should be able to explain why it continued, why it stopped, and what output was accepted.

Use cheaper models for low-risk subtasks only after evals.
Report accepted task rate alongside spend.
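A task budget with a step limit, plus a report that puts accepted task rate next to spend, can be sketched as follows. The Budget class, the report function, and the sample figures are hypothetical. The point is only that a stopped agent can state why it stopped and that spend is divided by accepted work, not by attempts.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    max_tokens: int
    max_steps: int
    tokens_used: int = 0
    steps_used: int = 0

    def charge(self, tokens: int) -> None:
        # Call once per model or tool step.
        self.tokens_used += tokens
        self.steps_used += 1

    def stop_reason(self):
        # Returns a human-readable reason, or None if the agent may continue.
        if self.steps_used >= self.max_steps:
            return "step limit reached"
        if self.tokens_used >= self.max_tokens:
            return "token budget exhausted"
        return None


def report(tasks):
    """tasks: list of (tokens_spent, accepted) pairs, one per task."""
    if not tasks:
        raise ValueError("no tasks to report on")
    total = sum(tokens for tokens, _ in tasks)
    accepted = sum(1 for _, ok in tasks if ok)
    return {
        "accepted_rate": accepted / len(tasks),
        "tokens_per_accepted_task": total / accepted if accepted else None,
    }


# Illustrative figures only.
summary = report([(4000, True), (9000, False), (5000, True), (6000, True)])
print(summary)
```

Tokens per accepted task rises when rejected work consumes budget, which is the signal that spend alone would hide.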


Current feed records connected to this guide

Introducing Claude Sonnet 5

Why Token Optimization Is a Gift to the Hyperscalers

‘What we’re seeing right now is just rapid escalation in AI token spend’: Accenture tells staff to stop using AI for unnecessary tasks amid surging costs

Tools that make the guide operational

LangGraph

LiteLLM

promptfoo
