Routing

Portkey Gateway for tokenmaxxing

Model routing plus guardrails is the grown-up version of tokenmaxxing: pick the right route, then keep the call inside policy.

12.3K starsPortkey-AI/gateway

1.2K forksGitHub metadata checked 2026-07-07

MITDirect tokenmaxxing fit

What it does

An AI gateway for routing across LLMs with guardrails, provider abstraction, and an OpenAI-compatible API surface.

Why it belongs here

Model routing plus guardrails is the grown-up version of tokenmaxxing: pick the right route, then keep the call inside policy.

Best use case

Teams that want an OpenAI-compatible gateway with routing, provider abstraction, guardrails, and operational policy controls.

How to use it

Centralize model calls, define guardrails and fallbacks, and compare provider cost and latency across the same workflows.

Limits

Guardrails and routes need real policies. A gateway cannot decide quality targets or risk tolerance for the team.

Source notes connected to this use case

newsW

news2026-06-30

Meituan open-sources LongCat-2.0 — the 1.6T model that topped OpenRouter as Owl Alpha

WinBuzzer: Meituan opened LongCat-2.0, a 1.6-trillion-parameter MoE coding model (~48B active per token, 1M-token context) that surfaced atop OpenRouter as the unbranded alias Owl Alpha — MIT-licensed, with weights not yet posted.

tokenmaxxingmodel-routermodel-routing

Read note

newsU

news2026-06-29medium review

Why Token Optimization Is a Gift to the Hyperscalers

UncoverAlpha's Rihard Jarc argues the pivot from tokenmaxxing to token optimization — routing cheap work to cheaper models — won't shrink AI bills. It multiplies token volume, and the hyperscalers renting the compute collect either way.

tokenmaxxingmodel-routerai-spend

Read note

newsTD

news2026-06-29

Coinbase halves its AI bill with cheaper defaults, routing, and caching

Coinbase CEO Brian Armstrong says five levers — cheaper model defaults (GLM 5.2, Kimi 2.7), task routing, caching, lean context, and spend visibility — cut the company’s AI bill roughly in half despite rising token volume.

tokenmaxxingcost-governancemodel-routing

Read note

newsA

news2026-06-09

Claude Fable 5 and Claude Mythos 5 - Anthropic

Anthropic shipped Claude Fable 5 (GA, with classifier safeguards) and Claude Mythos 5 (safeguards lifted, vetted partners only) on June 9 — $10 per million input tokens, $50 per million output, under half the Mythos Preview price.

agentscoding-agentspricing

Read note

Alternatives

More routing projects

#1Direct

Routing

LiteLLM

BerriAI/litellm

An OpenAI-compatible gateway and SDK for calling many model providers with budgets, logging, load balancing, guardrails, and cost tracking.

52.8K9.5KSource-available

gatewaycost-trackingrouting

Project profile GitHub

#2Direct

Observability

Langfuse

langfuse/langfuse

Open-source LLM engineering platform for observability, traces, metrics, evals, prompt management, datasets, and playground workflows.

30.6K3.2KSource-available

tracesevalscosts

Project profile GitHub

#13In spirit

Structured output

Outlines

dottxt-ai/outlines

A structured-output toolkit for constraining generation with formats like JSON, regex, and grammars.

14.4K758Apache-2.0

jsonconstrained-generationretries

Project profile GitHub

Portkey Gateway for tokenmaxxing

What it does

Why it belongs here

Best use case

How to use it

Limits

Tags

Source notes connected to this use case

Meituan open-sources LongCat-2.0 — the 1.6T model that topped OpenRouter as Owl Alpha

Why Token Optimization Is a Gift to the Hyperscalers

Coinbase halves its AI bill with cheaper defaults, routing, and caching

Claude Fable 5 and Claude Mythos 5 - Anthropic

More routing projects

LiteLLM

Langfuse

Outlines