news

Coinbase halves its AI bill with cheaper defaults, routing, and caching

Coinbase CEO Brian Armstrong says five levers — cheaper model defaults (GLM 5.2, Kimi 2.7), task routing, caching, lean context, and spend visibility — cut the company’s AI bill roughly in half despite rising token volume.

Published 2026-06-29 · Source: The Decoder

Why it matters

It is a named-company playbook for cutting AI cost by roughly 50% without rationing engineers: 91% of them had never hit the prior usage caps, so swapping defaults beat restricting access.

Tokenmaxxing read

Caching was the biggest lever, lifting the cache hit rate from 5% to 60%. Defaulting to open-weight Chinese models also squeezes Western labs on price just as several eye public listings. Fix the gateway before you ration people.
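
A rough sketch of why the cache-hit jump matters, assuming cached input tokens are billed at a fraction of the normal rate. The 10% cached-price ratio and the function name are illustrative assumptions, not figures from Coinbase or any provider.

```python
def blended_input_cost(hit_rate: float, cached_price_ratio: float = 0.10) -> float:
    """Relative input cost per token, where 1.0 means no caching at all.

    hit_rate: share of input tokens served from cache (0 to 1).
    cached_price_ratio: ASSUMED price of a cached token relative to an
    uncached one; real discounts vary by provider and model.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Uncached tokens pay full price, cached tokens pay the discounted price.
    return (1.0 - hit_rate) + hit_rate * cached_price_ratio


before = blended_input_cost(0.05)  # 5% cache hits, as reported
after = blended_input_cost(0.60)   # 60% cache hits, as reported
reduction = 1.0 - after / before

print(f"before={before:.3f} after={after:.3f} input-cost reduction={reduction:.1%}")
```

Under that assumed discount, the input side alone gets about 52% cheaper per token. Output tokens, model mix, and routing are not modeled here, so this is not a reconstruction of Coinbase's actual savings.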

Source takeaway

Armstrong’s own account, relayed by The Decoder. The levers are directional, not audited line items, and the headline “half” refers to spend, while token volume is still growing; keep the two separate when modeling savings.
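
One way to keep spend and volume separate is to treat spend as token volume times cost per token and solve for the unit cost. The sketch below does that; the growth rates are hypothetical placeholders, since the source gives no volume figure.

```python
def unit_cost_change(spend_ratio: float, volume_ratio: float) -> float:
    """Fractional change in cost per token.

    spend_ratio: spend after / spend before (0.5 for a halved bill).
    volume_ratio: tokens after / tokens before (1.3 for 30% growth).
    """
    if volume_ratio <= 0:
        raise ValueError("volume_ratio must be positive")
    # spend = volume * unit_cost, so unit_cost scales by spend_ratio / volume_ratio.
    return spend_ratio / volume_ratio - 1.0


# HYPOTHETICAL growth rates; the article only says token volume is rising.
for growth in (0.0, 0.3, 1.0):
    change = unit_cost_change(0.5, 1.0 + growth)
    print(f"volume +{growth:.0%}: cost per token {change:+.1%}")
```

The faster volume grows, the larger the implied drop in cost per token behind the same halved bill, which is why the two numbers should not be folded into one "savings" figure.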

Related projects

Tools that match this angle

#1 Direct
Routing

LiteLLM

BerriAI/litellm

An OpenAI-compatible gateway and SDK for calling many model providers with budgets, logging, load balancing, guardrails, and cost tracking.

52K · 9.3K · Source-available
gateway, cost-tracking, routing
#2 Direct
Observability

Langfuse

langfuse/langfuse

Open-source LLM engineering platform for observability, traces, metrics, evals, prompt management, datasets, and playground workflows.

30K · 3.1K · Source-available
traces, evals, costs
#10 Direct
Routing

Portkey Gateway

Portkey-AI/gateway

An AI gateway for routing across LLMs with guardrails, provider abstraction, and an OpenAI-compatible API surface.

12.2K · 1.2K · MIT
gateway, guardrails, routing
Related feed

More source-linked context

news · medium review

“Tokenmaxxing is real, expensive & it’s spreading”: AI budgets are exploding - The New Stack

AI accountability startup Lanai debuted Token Tuner, a beta that scores each employee's efficiency by matching token usage and model choice to task complexity — peers burned 10x the tokens for half the efficiency in one beta.

ai-spend, cost-governance, explainer
news

Introducing Augment Prism: model routing to reduce cost and maintain quality

Augment Code introduces Prism, a cache-aware model router for coding-agent sessions that chooses an underlying model per user turn to reduce token spend without materially degrading output quality (per Augment’s benchmarks).

tokenmaxxing, cost-governance, model-routing
news

Bunq adopts Orq.ai router amid Europe AI sovereignty push - IT Brief UK

IT Brief UK reports bunq replaced in-house LLM routing with Orq.ai’s router, citing rising maintenance costs and gaps in observability, governance, and performance.

tokenmaxxing, cost-governance, ai-spend