news

Coinbase halves its AI bill with cheaper defaults, routing, and caching

Coinbase CEO Brian Armstrong says five levers — cheaper model defaults (GLM 5.2, Kimi 2.7), task routing, caching, lean context, and spend visibility — cut the company’s AI bill roughly in half despite rising token volume.

Published 2026-06-29 · Source: The Decoder

Why it matters

It is a named-company playbook for cutting AI cost ~50% without rationing engineers: 91% had never bumped into the prior usage caps, so swapping defaults beat restricting access.

Tokenmaxxing read

Caching was the biggest lever, lifting the cache hit rate from 5% to 60%. Defaulting to open-weight Chinese models also squeezes Western labs on price right as several eye public listings. Fix the gateway before you ration people.
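To see why the cache hit rate moves the bill so much, the blended input price can be sketched as a weighted average of cached and uncached token prices. This is a minimal illustration, not Coinbase's accounting: the function name `effective_input_price` and the 50% cached-token discount are assumptions chosen for the example. Real discounts vary by provider and model and are not stated in the source.

```python
def effective_input_price(base_price: float, cache_hit_rate: float,
                          cached_discount: float) -> float:
    """Blended price per input token for a given cache hit rate.

    base_price      -- price of an uncached input token (any unit)
    cache_hit_rate  -- fraction of input tokens served from cache, 0..1
    cached_discount -- fractional discount on cached tokens, 0..1 (ASSUMED)
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    if not 0.0 <= cached_discount <= 1.0:
        raise ValueError("cached_discount must be between 0 and 1")
    cached_price = base_price * (1.0 - cached_discount)
    return cache_hit_rate * cached_price + (1.0 - cache_hit_rate) * base_price


# Hit rates from the article; the 0.5 discount is a placeholder, not a sourced figure.
before = effective_input_price(1.0, 0.05, cached_discount=0.5)
after = effective_input_price(1.0, 0.60, cached_discount=0.5)
print(f"input price reduction: {1 - after / before:.1%}")  # about 28% under these assumptions
```

The sketch covers input tokens only. Output tokens are not cached, so the effect on a total bill depends on the input/output mix.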

Source takeaway

Armstrong’s own account, relayed by The Decoder; the levers are directional, not audited line items, and the headline “half” covers spend while token volume still grows — keep those two separate when modeling savings.
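One way to keep spend and token volume separate is to treat spend as volume times a blended unit price and solve for the unit price. The sketch below does only that arithmetic. The helper name `implied_unit_price_ratio` and the 1.3x volume growth are hypothetical: the source says volume is rising but gives no figure.

```python
def implied_unit_price_ratio(spend_ratio: float, volume_ratio: float) -> float:
    """Ratio of new to old blended price per token.

    spend = volume * unit_price, so
    unit_price_ratio = spend_ratio / volume_ratio.
    """
    if spend_ratio < 0 or volume_ratio <= 0:
        raise ValueError("spend_ratio must be >= 0 and volume_ratio must be > 0")
    return spend_ratio / volume_ratio


spend_ratio = 0.5    # "roughly in half", per the article
volume_ratio = 1.3   # HYPOTHETICAL growth; the source gives no number
ratio = implied_unit_price_ratio(spend_ratio, volume_ratio)
print(f"implied unit price vs. before: {ratio:.2f}x")
```

If volume grew at all, the per-token price fell by more than half, which is why a savings model should track the two series independently.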

Topic links

tokenmaxxing, cost-governance, model-routing, ai-spend

Related projects

Tools that match this angle

#1 Direct

Routing

LiteLLM

BerriAI/litellm

An OpenAI-compatible gateway and SDK for calling many model providers with budgets, logging, load balancing, guardrails, and cost tracking.

52K · 9.3K · Source-available

gateway, cost-tracking, routing

#2 Direct

Observability

Langfuse

langfuse/langfuse

Open-source LLM engineering platform for observability, traces, metrics, evals, prompt management, datasets, and playground workflows.

30K · 3.1K · Source-available

traces, evals, costs

#10 Direct

Routing

Portkey Gateway

Portkey-AI/gateway

An AI gateway for routing across LLMs with guardrails, provider abstraction, and an OpenAI-compatible API surface.

12.2K · 1.2K · MIT

gateway, guardrails, routing

Related feed

More source-linked context

news · 2026-05-27 · medium review

“Tokenmaxxing is real, expensive & it’s spreading”: AI budgets are exploding - The New Stack

AI accountability startup Lanai debuted Token Tuner, a beta that scores each employee's efficiency by matching token usage and model choice to task complexity — peers burned 10x the tokens for half the efficiency in one beta.

ai-spend, cost-governance, explainer

news · 2026-05-02

Introducing Augment Prism: model routing to reduce cost and maintain quality

Augment Code introduces Prism, a cache-aware model router for coding-agent sessions that chooses an underlying model per user turn to reduce token spend without materially degrading output quality (per Augment’s benchmarks).

tokenmaxxing, cost-governance, model-routing

news · 2026-02-19

Bunq adopts Orq.ai router amid Europe AI sovereignty push - IT Brief UK

IT Brief UK reports bunq replaced in-house LLM routing with Orq.ai’s router, citing rising maintenance costs and gaps in observability, governance, and performance.

tokenmaxxing, cost-governance, ai-spend
