Why it matters
It is a named-company playbook for cutting AI cost ~50% without rationing engineers: 91% had never bumped into the prior usage caps, so swapping defaults beat restricting access.
Tokenmaxxing read
Caching was the biggest lever, lifting cache hits from 5% to 60%; defaulting to open-weight Chinese models also squeezes Western labs on price right as several eye public listings. Fix the gateway before you ration people.
Source takeaway
Armstrong’s own account, relayed by The Decoder; the levers are directional, not audited line items, and the headline “half” covers spend while token volume still grows — keep those two separate when modeling savings.
