Agent spend is different because the model is not called once. It plans, reads, calls tools, retries, summarizes, and sometimes loops. The trace is the unit of accountability.
Agents multiply calls
A normal prompt may be one request and one response. An agent may plan, inspect files, call tools, revise, retry, and summarize within a single task. Each step is another model call with its own token cost, and each can carry the previous context forward.
- Count calls per task, not only tokens per call.
- Preserve step order so the trace is reviewable.
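As a rough illustration of the two points above, the sketch below records each model call as an ordered step and reports calls per task alongside tokens. The names `Step`, `TaskTrace`, and `record`, and the token figures, are hypothetical and chosen only for this example; they do not come from any particular agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    index: int          # position in the trace, assigned when recorded
    kind: str           # e.g. "plan", "tool", "retry", "summary"
    input_tokens: int
    output_tokens: int


@dataclass
class TaskTrace:
    task_id: str
    steps: list[Step] = field(default_factory=list)

    def record(self, kind: str, input_tokens: int, output_tokens: int) -> Step:
        # Appending in call order keeps the trace reviewable step by step.
        step = Step(len(self.steps), kind, input_tokens, output_tokens)
        self.steps.append(step)
        return step

    @property
    def call_count(self) -> int:
        return len(self.steps)

    @property
    def total_tokens(self) -> int:
        return sum(s.input_tokens + s.output_tokens for s in self.steps)


# Illustrative numbers only.
trace = TaskTrace("task-001")
trace.record("plan", 1200, 300)
trace.record("tool", 1600, 150)
trace.record("summary", 1900, 400)

print(trace.call_count, trace.total_tokens)  # 3 5550
```

Keeping the step index on the record itself means the order survives even if steps are later exported or filtered.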
Context grows over time
When agents carry too much history or irrelevant file context, every new step pays for that context again as input before the model writes anything useful. Context hygiene matters more as the trace gets longer.
- Summarize or prune trace state deliberately.
- Retrieve files by task instead of loading broad directories.
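To make the growth concrete, here is a small arithmetic sketch. It assumes a simplified model in which every step re-sends the whole accumulated context as input and the context grows by a fixed amount per step. The optional `cap` stands in for pruning or summarizing down to a fixed window. The function name and all numbers are illustrative assumptions, not measurements.

```python
from typing import Optional


def cumulative_input_tokens(
    base: int, added_per_step: int, steps: int, cap: Optional[int] = None
) -> int:
    """Total input tokens sent across all steps of one task."""
    total = 0
    context = base
    for _ in range(steps):
        # With a cap, the agent sends at most `cap` tokens of context per step.
        sent = context if cap is None else min(context, cap)
        total += sent
        context += added_per_step  # history keeps growing either way
    return total


unpruned = cumulative_input_tokens(base=2000, added_per_step=1000, steps=10)
pruned = cumulative_input_tokens(base=2000, added_per_step=1000, steps=10, cap=4000)

print(unpruned, pruned)  # 65000 37000
```

Under these assumptions the unpruned total grows roughly with the square of the step count, while the capped total grows linearly once the cap is reached.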
Retries are hidden spend
A failed edit, malformed tool call, ambiguous instruction, or flaky API can trigger repeated attempts. From the outside it looks like progress; inside the trace it is often a cost leak.
- Alert on repeated tool errors.
- Cap retries and require a new plan after failure.
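One possible shape for a retry cap is sketched below. The helper `run_with_retry_cap`, the `RetryCapExceeded` exception, and the `on_error` hook are hypothetical names used for illustration. The hook is where an alert on repeated tool errors could be attached, and the exception is the signal that the caller should produce a new plan rather than try again.

```python
from typing import Callable, List, Tuple


class RetryCapExceeded(Exception):
    """Raised when an action keeps failing and the agent should re-plan."""


def run_with_retry_cap(
    action: Callable[[], str],
    max_retries: int,
    on_error: Callable[[int, Exception], None],
) -> str:
    # One initial attempt plus at most max_retries repeats.
    for attempt in range(max_retries + 1):
        try:
            return action()
        except Exception as exc:  # broad on purpose: any tool failure counts
            on_error(attempt, exc)  # hook for logging or alerting
    raise RetryCapExceeded(f"gave up after {max_retries + 1} attempts")


errors: List[Tuple[int, str]] = []


def flaky_tool() -> str:
    # Stand-in for a tool call that keeps failing.
    raise ValueError("malformed tool call")


try:
    result = run_with_retry_cap(
        flaky_tool,
        max_retries=2,
        on_error=lambda n, exc: errors.append((n, str(exc))),
    )
    outcome = "done"
except RetryCapExceeded:
    outcome = "replan"  # the caller must produce a new plan, not retry again

print(outcome, len(errors))  # replan 3
```

Because every failed attempt passes through `on_error`, the retries show up in the trace instead of staying hidden inside the tool wrapper.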
Controls that work
The useful controls are concrete: task budgets, step limits, model routing, trace review, evals, and human-in-the-loop checkpoints for high-risk work. A controlled agent should be able to explain why it continued, why it stopped, and what output was accepted.
- Use cheaper models for low-risk subtasks only after evals.
- Report accepted task rate alongside spend.
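Two of these controls can be sketched in a few lines: a limit check that returns an explicit stop reason, and a report that puts accepted task rate next to spend. All names here (`TaskLimits`, `stop_reason`, `TaskResult`, `spend_report`) and the dollar figures are assumptions made for the example. How a task comes to be marked accepted (trace review, an eval, a human checkpoint) is left outside the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskLimits:
    max_steps: int
    max_cost_usd: float


def stop_reason(steps_taken: int, cost_usd: float, limits: TaskLimits) -> Optional[str]:
    """Return why the agent must stop, or None if it may continue."""
    if steps_taken >= limits.max_steps:
        return "step limit reached"
    if cost_usd >= limits.max_cost_usd:
        return "task budget exhausted"
    return None


@dataclass
class TaskResult:
    task_id: str
    cost_usd: float
    accepted: bool  # decided by review, an eval, or a human checkpoint


def spend_report(results: list[TaskResult]) -> dict:
    total = sum(r.cost_usd for r in results)
    accepted = sum(1 for r in results if r.accepted)
    return {
        "total_cost_usd": total,
        "accepted_task_rate": accepted / len(results) if results else 0.0,
        # None when nothing was accepted, to avoid dividing by zero.
        "cost_per_accepted_task": total / accepted if accepted else None,
    }


limits = TaskLimits(max_steps=20, max_cost_usd=1.50)
print(stop_reason(5, 0.40, limits))   # None
print(stop_reason(20, 0.40, limits))  # step limit reached

results = [
    TaskResult("a", 0.25, True),
    TaskResult("b", 0.75, False),
    TaskResult("c", 1.00, True),
]
report = spend_report(results)
print(report["cost_per_accepted_task"])  # 1.0
```

Dividing total spend by accepted tasks, not by all tasks, charges the cost of rejected work to the output that was actually kept.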