Routing

LiteLLM for tokenmaxxing

The most direct tokenmaxxing fit: route calls, track spend, enforce budgets, and stop pretending every prompt deserves the priciest model.

52.8K starsBerriAI/litellm

9.5K forksGitHub metadata checked 2026-07-07

Source-availableDirect tokenmaxxing fit

What it does

An OpenAI-compatible gateway and SDK for calling many model providers with budgets, logging, load balancing, guardrails, and cost tracking.

Why it belongs here

The most direct tokenmaxxing fit: route calls, track spend, enforce budgets, and stop pretending every prompt deserves the priciest model.

Best use case

Teams that want one gateway for provider abstraction, model routing, usage logging, budgets, fallbacks, and cost-aware defaults.

How to use it

Put it between the app and model providers, tag requests by workflow, set spend limits, and route low-risk tasks to cheaper models after evals pass.

Limits

A gateway will not fix vague prompts or poor review loops by itself. Budget rules need ownership and ongoing tuning.

Source notes connected to this use case

newsW

news2026-06-30

Meituan open-sources LongCat-2.0 — the 1.6T model that topped OpenRouter as Owl Alpha

WinBuzzer: Meituan opened LongCat-2.0, a 1.6-trillion-parameter MoE coding model (~48B active per token, 1M-token context) that surfaced atop OpenRouter as the unbranded alias Owl Alpha — MIT-licensed, with weights not yet posted.

tokenmaxxingmodel-routermodel-routing

Read note

newsU

news2026-06-29medium review

Why Token Optimization Is a Gift to the Hyperscalers

UncoverAlpha's Rihard Jarc argues the pivot from tokenmaxxing to token optimization — routing cheap work to cheaper models — won't shrink AI bills. It multiplies token volume, and the hyperscalers renting the compute collect either way.

tokenmaxxingmodel-routerai-spend

Read note

newsTD

news2026-06-29

Coinbase halves its AI bill with cheaper defaults, routing, and caching

Coinbase CEO Brian Armstrong says five levers — cheaper model defaults (GLM 5.2, Kimi 2.7), task routing, caching, lean context, and spend visibility — cut the company’s AI bill roughly in half despite rising token volume.

tokenmaxxingcost-governancemodel-routing

Read note

newsA

news2026-06-09

Claude Fable 5 and Claude Mythos 5 - Anthropic

Anthropic shipped Claude Fable 5 (GA, with classifier safeguards) and Claude Mythos 5 (safeguards lifted, vetted partners only) on June 9 — $10 per million input tokens, $50 per million output, under half the Mythos Preview price.

agentscoding-agentspricing

Read note

Alternatives

More routing projects

#10Direct

Routing

Portkey Gateway

Portkey-AI/gateway

An AI gateway for routing across LLMs with guardrails, provider abstraction, and an OpenAI-compatible API surface.

12.3K1.2KMIT

gatewayguardrailsrouting

Project profile GitHub

#2Direct

Observability

Langfuse

langfuse/langfuse

Open-source LLM engineering platform for observability, traces, metrics, evals, prompt management, datasets, and playground workflows.

30.6K3.2KSource-available

tracesevalscosts

Project profile GitHub

#11Direct

Observability

Helicone

Helicone/helicone

Open-source LLM observability for monitoring, evaluation, experimentation, latency, requests, and usage behavior.

5.9K622Apache-2.0

observabilityexperimentsusage

Project profile GitHub

LiteLLM for tokenmaxxing

What it does

Why it belongs here

Best use case

How to use it

Limits

Tags

Source notes connected to this use case

Meituan open-sources LongCat-2.0 — the 1.6T model that topped OpenRouter as Owl Alpha

Why Token Optimization Is a Gift to the Hyperscalers

Coinbase halves its AI bill with cheaper defaults, routing, and caching

Claude Fable 5 and Claude Mythos 5 - Anthropic

More routing projects

Portkey Gateway

Langfuse

Helicone