Read your token receipts right: volume and cost are different leaderboards
Attribute your model usage by both tokens and dollars so you can see the flip the OpenRouter rankings show: cheap open models dominate volume while premium models dominate spend. Never mistake a high token ranking for value.
CI-verified, 2/2 fixtures passing.
Intended Use
Anyone running agents at volume who wants to read their spend honestly. CI validates a receipts report in which each model has a tier (cheap_open or premium), a token count, and a unit price. It then computes each tier's share of volume and share of cost and asserts the flip: premium's cost share exceeds its volume share, and cheap open dominates volume. No key, no model call. Your actual routing decisions are fenced, meaning they are not checked by CI.
Not for
- Reading tokens as quality or spend: a high token ranking says "cheap model in a busy loop," not "best model." This recipe exists to keep that straight.
- Trusting a single week: weekly snapshots are volatile, and a free preview model can spike and then vanish. Attribute over a real period.
- Assuming the receipts are complete: usage on direct provider keys or private deployments is invisible to a gateway's rankings.
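The second caution above, attributing over a real period rather than a single week, can be sketched as a small merge over several snapshots. This is a minimal sketch, assuming each snapshot uses the same `models` shape as the `receipts.json` in this recipe; the `mergePeriods` helper, the model names, and the weekly numbers are hypothetical and only illustrate the idea.

```javascript
// Hypothetical helper: sum tokens per model across several snapshots.
// Assumes each snapshot looks like { period, models: [{ name, tier, tokens, usd_per_mtok }] }.
function mergePeriods(snapshots) {
  const byName = new Map();
  for (const snap of snapshots) {
    for (const m of snap.models || []) {
      const seen = byName.get(m.name);
      if (seen) {
        seen.tokens += m.tokens;
        // Assumption: keep the most recent unit price; real data may need per-period pricing.
        seen.usd_per_mtok = m.usd_per_mtok;
      } else {
        // Copy the entry so the input snapshots are not mutated.
        byName.set(m.name, { ...m });
      }
    }
  }
  return { models: [...byName.values()] };
}

// Illustrative numbers only: a free preview model spikes in week 1, then vanishes.
const weeks = [
  { period: "w1", models: [
    { name: "preview-free", tier: "cheap_open", tokens: 900, usd_per_mtok: 0 },
    { name: "premium-a", tier: "premium", tokens: 100, usd_per_mtok: 25 },
  ] },
  { period: "w2", models: [
    { name: "premium-a", tier: "premium", tokens: 120, usd_per_mtok: 25 },
  ] },
];
const merged = mergePeriods(weeks);
console.log(JSON.stringify(merged.models));
```

Merging before computing shares keeps a one-week spike from a vanishing model from defining the whole picture.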
The Stack
Tested Against
- OpenRouter public rankings (June 2026, shape)
- node@20
Side effects & data flow
- Network: none, local only
- Writes: ./receipts.json
- Credentials: none required
Prerequisites
- Your own usage export (tokens per model) and each model's unit price
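The two prerequisites above can be combined into the `receipts.json` shape that step 1 uses. This is a minimal sketch, assuming your export reduces to a map of model name to token count and that you maintain your own tier and price table. `buildReceipts`, `usage`, and `catalog` are hypothetical names, not a real export format.

```javascript
// Hypothetical inputs: these field names are assumptions, not a real export format.
function buildReceipts(period, usage, catalog) {
  const models = [];
  for (const [name, tokens] of Object.entries(usage)) {
    const info = catalog[name];
    // Refuse to guess: a model with no known tier or price cannot be attributed.
    if (!info) throw new Error("no tier/price for model: " + name);
    models.push({ name, tier: info.tier, tokens, usd_per_mtok: info.usd_per_mtok });
  }
  return { period, models };
}

// Token counts and prices reused from the fixture in step 1.
const usage = { "DeepSeek V4 Flash": 167000000000, "Claude Opus 4.8": 14800000000 };
const catalog = {
  "DeepSeek V4 Flash": { tier: "cheap_open", usd_per_mtok: 0.28 },
  "Claude Opus 4.8": { tier: "premium", usd_per_mtok: 25.0 },
};
const receipts = buildReceipts("2026-06", usage, catalog);
console.log(JSON.stringify(receipts, null, 2));
```

Throwing on an unknown model is a deliberate choice here: a silently skipped model would understate both volume and cost.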
Steps
1. Attribute usage by tokens and cost, and check the flip
List each model with its tier, its token count, and its output price per million tokens. CI computes each tier's share of volume and share of cost and confirms the flip: cheap open dominates volume, premium dominates cost. Deciding what to route where is fenced.
cat > receipts.json <<'JSON'
{
  "period": "2026-06",
  "models": [
    { "name": "DeepSeek V4 Flash", "tier": "cheap_open", "tokens": 167000000000, "usd_per_mtok": 0.28 },
    { "name": "GLM 5.2", "tier": "cheap_open", "tokens": 50200000000, "usd_per_mtok": 3.00 },
    { "name": "Claude Opus 4.8", "tier": "premium", "tokens": 14800000000, "usd_per_mtok": 25.00 }
  ]
}
JSON
node -e '
const fs = require("fs");
const c = JSON.parse(fs.readFileSync("receipts.json", "utf8"));
// Print a reason and exit non-zero so CI fails loudly.
function bad(m) { console.error("BAD: " + m); process.exit(1); }
const models = c.models || [];
if (models.length < 2) bad("need at least two models to compare");
// Accumulate tokens and cost, both overall and per tier.
let totTok = 0, totCost = 0;
const tierTok = {}, tierCost = {};
for (const m of models) {
  if (!m.tier || m.tokens == null || m.usd_per_mtok == null) bad("each model needs tier, tokens, usd_per_mtok");
  // Cost is tokens in millions times the unit price per million tokens.
  const cost = (m.tokens / 1e6) * m.usd_per_mtok;
  totTok += m.tokens;
  totCost += cost;
  tierTok[m.tier] = (tierTok[m.tier] || 0) + m.tokens;
  tierCost[m.tier] = (tierCost[m.tier] || 0) + cost;
}
function pct(x, t) { return (x / t * 100).toFixed(1); }
const coVol = tierTok.cheap_open || 0, coCost = tierCost.cheap_open || 0;
const prVol = tierTok.premium || 0, prCost = tierCost.premium || 0;
// The flip: cheap_open holds most of the volume, premium holds a larger share of cost than of volume.
if (!(coVol > totTok / 2)) bad("cheap_open should dominate volume for this pattern");
if (!(prCost / totCost > prVol / totTok)) bad("premium cost-share should exceed its volume-share (the flip)");
console.log("receipts OK: cheap_open is " + pct(coVol, totTok) + "% of volume but " + pct(coCost, totCost) + "% of cost; premium is " + pct(prVol, totTok) + "% of volume but " + pct(prCost, totCost) + "% of cost (volume != value)");
'
2. Act on the split (the routing step, not checked by CI)
Once you can see that cheap open models carry your volume and a premium model carries your spend, route deliberately: send bulk loops to the cheap open model and the smaller number of hard calls to premium. The actual routing and quality tradeoffs are fenced.
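Before routing anything, it can help to estimate how the premium share moves your average price. This is a minimal sketch, assuming only two models and the unit prices from the fixture in step 1. `blendedUsdPerMtok` is a hypothetical helper; it estimates cost only and says nothing about quality.

```javascript
// Hypothetical helper: average USD per million tokens when a fraction
// premiumShare (between 0 and 1) of all tokens goes to the premium model.
function blendedUsdPerMtok(premiumShare, premiumPrice, cheapPrice) {
  if (premiumShare < 0 || premiumShare > 1) {
    throw new Error("premiumShare must be between 0 and 1");
  }
  return premiumShare * premiumPrice + (1 - premiumShare) * cheapPrice;
}

// Prices from the fixture: 25.00 (premium) and 0.28 (cheap open) USD per million tokens.
for (const percent of [5, 10, 25]) {
  const price = blendedUsdPerMtok(percent / 100, 25.0, 0.28);
  console.log(percent + "% premium -> $" + price.toFixed(2) + " per Mtok");
}
```

Even a small premium share moves the blended price a long way, which is the cost side of the flip this recipe checks.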
Eval, 2 fixtures
Last passed: verified today
- clean-exit (exit_code), timeout 30s · max $0. Expected: 0
- receipts-ok (contains), timeout 30s · max $0. Expected: receipts OK: cheap_open is 93.6% of volume but 34.8% of cost; premium is 6.4% of volume but 65.2% of cost (volume != value)
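The expected percentages can be reproduced by hand from the fixture in step 1. Cost is tokens in millions times the unit price, and token counts in the share fractions are in billions:

```latex
\begin{align*}
\text{cost}_{\text{DeepSeek V4 Flash}} &= 167{,}000 \times 0.28 = 46{,}760 \\
\text{cost}_{\text{GLM 5.2}} &= 50{,}200 \times 3.00 = 150{,}600 \\
\text{cost}_{\text{Claude Opus 4.8}} &= 14{,}800 \times 25.00 = 370{,}000 \\
\text{total cost} &= 46{,}760 + 150{,}600 + 370{,}000 = 567{,}360 \\
\text{cheap\_open volume share} &= \frac{167 + 50.2}{167 + 50.2 + 14.8} = \frac{217.2}{232} \approx 93.6\% \\
\text{cheap\_open cost share} &= \frac{46{,}760 + 150{,}600}{567{,}360} = \frac{197{,}360}{567{,}360} \approx 34.8\% \\
\text{premium volume share} &= \frac{14.8}{232} \approx 6.4\% \\
\text{premium cost share} &= \frac{370{,}000}{567{,}360} \approx 65.2\%
\end{align*}
```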
Results
OpenRouter's public rankings show the number one app by tokens is an agent, not a chatbot, and cheap open models (MiMo, DeepSeek, MiniMax, GLM) do most of the volume. But the number that flips the story: Anthropic is about 12% of tokens and roughly 46% of revenue. Token volume and money are different leaderboards. This recipe computes that split on your own receipts.
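The two shares quoted above imply a price gap that a short calculation makes concrete. This is a minimal sketch, assuming the approximate figures of 12% of tokens and 46% of revenue. `impliedPriceRatio` is a hypothetical helper, and the result is only as precise as those rounded inputs.

```javascript
// Hypothetical helper: given one group's share of tokens and share of revenue,
// return how many times more it earns per token than everything else combined.
function impliedPriceRatio(tokenShare, revenueShare) {
  const groupPerToken = revenueShare / tokenShare;
  const restPerToken = (1 - revenueShare) / (1 - tokenShare);
  return groupPerToken / restPerToken;
}

// Shares quoted above: about 12% of tokens and roughly 46% of revenue.
const ratio = impliedPriceRatio(0.12, 0.46);
console.log(ratio.toFixed(1) + "x revenue per token versus the rest");
```

On those figures the ratio comes out at roughly 6.2, meaning each token in that group brings in about six times the revenue of a token elsewhere.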
Related workflows
- Wire GLM-5.2 into Hermes: valid route, 64k-context check, no key in config
- Route through a gateway with a tested open-weights fallback
- Pick a model with evidence: a GitHub Models bake-off that fits the free cap
- Run GLM-5.2 for the bulk, escalate the hard turns to Opus 4.8
- Let a free model triage your reading: one-line summary + reply flag
- Track a tool's hype curve across any Substack (no API key)