Why it matters
This is a concrete example of token and latency optimizations degrading agent reliability. Teams should treat effort defaults, context-pruning logic, and prompt edits as production controls that can change both output quality and effective spend.
Tokenmaxxing read
Token efficiency is not just about using fewer tokens per request. If an optimization causes forgetfulness or weaker reasoning, the savings are paid back as retries and rework. Track tokens per successful task, cache misses, and regressions after harness or prompt changes.
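As a rough illustration of why per-request token counts can mislead, here is a minimal Python sketch that computes tokens per successful task and a cache miss rate from a log of attempts. The `Attempt` record, its field names, and the sample numbers are all hypothetical and invented for this example; they are not taken from Anthropic's postmortem or from any real API response format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One agent attempt at a task. All fields are hypothetical log columns."""
    task_id: str
    input_tokens: int          # total prompt tokens sent
    cached_input_tokens: int   # portion of input_tokens served from cache
    output_tokens: int
    succeeded: bool


def tokens_per_successful_task(attempts):
    """Total tokens spent (including failed attempts) per distinct task solved."""
    total = sum(a.input_tokens + a.output_tokens for a in attempts)
    solved = {a.task_id for a in attempts if a.succeeded}
    return total / len(solved) if solved else float("inf")


def cache_miss_rate(attempts):
    """Fraction of input tokens that were not served from cache."""
    total_input = sum(a.input_tokens for a in attempts)
    if total_input == 0:
        return 0.0
    cached = sum(a.cached_input_tokens for a in attempts)
    return 1 - cached / total_input


# Made-up baseline: 10,000 tokens per attempt, every task solved first try.
baseline = [Attempt(t, 8000, 6000, 2000, True) for t in ("a", "b", "c")]

# Made-up "optimized" run: 7,000 tokens per attempt, but more retries
# and fewer cache hits.
optimized = [
    Attempt("a", 5000, 2000, 2000, False),
    Attempt("a", 5000, 2000, 2000, True),
    Attempt("b", 5000, 2000, 2000, False),
    Attempt("b", 5000, 2000, 2000, False),
    Attempt("b", 5000, 2000, 2000, True),
    Attempt("c", 5000, 2000, 2000, True),
]

for name, run in (("baseline", baseline), ("optimized", optimized)):
    print(name, tokens_per_successful_task(run), round(cache_miss_rate(run), 2))
```

In this invented data the optimized run uses 30% fewer tokens per attempt yet costs 14,000 tokens per solved task against the baseline's 10,000, because failed attempts are still paid for. Comparing these two numbers before and after each harness or prompt change is one simple way to catch that kind of regression.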
Source takeaway
Anthropic’s April 23, 2026 postmortem says the API and inference layer were unaffected; the problems came from Claude Code product defaults and context handling, and Anthropic reset subscriber usage limits after shipping fixes.


