Coding · Hybrid · Free · Active · Machine-verified · beginner · ~10 min setup

Run GLM-5.2 for the bulk, escalate the hard turns to Opus 4.8

Wire a cost-routing config that sends most work to cheap hosted GLM-5.2 and only the hardest turns to Opus 4.8, instead of paying Opus prices for everything.

by Shilpa Mitra · verified today · v1.0.0

Run this workflow

CI-verified, 2/2 fixtures passing.

Intended Use

Anyone who wants GLM-5.2's price on the bulk without losing Opus on the hard turns. CI validates the routing config: a GLM-5.2 default on an OpenAI-compatible host (OpenRouter or Z.ai), an Opus 4.8 escalation, and a gate so escalation is conditional, not match-all. No keys, no model call. The per-task cost and quality are fenced.

Not for

  • Taking leaderboard wins as a promise about your task: the numbers are provisional third-party and vendor figures, and on short, sharp, general coding the benchmarks still favor Opus
  • Assuming cheaper per token means cheaper per task: if the cheap model needs extra reasoning loops, the advantage shrinks, so measure cost per finished task on your own work
  • Data-residency constraints: GLM-5.2 is a Chinese-lab model served by various hosts; the MIT weights are unrestricted, but check where your prompts actually land
  • Quantized cheap hosts: the very cheapest routes often serve a quantized variant, which is very good but not identical to full precision

The Stack

Tested Against

openrouter.ai/z-ai/glm-5.2 (2026-06) · node@20

Side effects & data flow

Network: none, local only
Writes: ./routing.json
Credentials: none required

Prerequisites

  • An OpenRouter or Z.ai key (only to actually run it)
  • A router or client that supports a default + escalation model

Steps

  1. Write the cost-routing config and validate it

    Set GLM-5.2 as the default on an OpenAI-compatible host (OpenRouter id z-ai/glm-5.2, or Z.ai direct) and escalate to Opus 4.8 only on hard turns. CI checks the config shape; pointing it at real traffic needs a key and is fenced.

    cat > routing.json <<'JSON'
    {
      "base_url": "https://openrouter.ai/api/v1",
      "default_model": "z-ai/glm-5.2",
      "escalate_model": "anthropic/claude-opus-4.8",
      "escalate_when": "task_difficulty == hard || category == general_coding"
    }
    JSON
    node -e '
    const fs = require("fs");
    const c = JSON.parse(fs.readFileSync("routing.json", "utf8"));
    function bad(m) { console.error("BAD: " + m); process.exit(1); }
    const host = String(c.base_url || "");
    if (!host.includes("openrouter.ai") && !host.includes("z.ai")) bad("base_url is not an OpenAI-compatible GLM host (openrouter.ai or z.ai)");
    const def = c.default_model || "";
    if (!def.includes("glm-5.2")) bad("default_model is not GLM-5.2");
    const esc = c.escalate_model || "";
    if (!esc.includes("opus-4.8")) bad("escalate_model is not Opus 4.8");
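    const when = String(c.escalate_when || "").trim();
    // Reject empty or trivially always-true gates so escalation stays conditional.
    if (when.length < 3 || /^(true|always|\*)$/i.test(when)) bad("escalate_when must gate escalation, not match everything");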
    const where = host.includes("openrouter.ai") ? "openrouter.ai" : "z.ai";
    console.log("config OK: bulk -> " + def + " via " + where + ", escalate to " + esc + " on hard turns");
    '
  2. Point it at your work and measure cost per task (the model step, not checked by CI)

    Add your key and run real prompts. Watch cost per finished task, not per token: if GLM-5.2 lands your bulk work in the same number of turns, the savings are real; if it needs extra loops, the gap shrinks. Keep Opus for the short, hard, general-coding turns. The runs are fenced.
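
How a router reads the gate is not specified here, so the following is only a minimal sketch of one way to apply the config from step 1 to a single turn. It assumes a hypothetical gate grammar (clauses of `field == value` joined by `||`) and hypothetical turn labels (`task_difficulty`, `category`) supplied by your own client; `gateMatches` and `pickModel` are illustrative names, not part of OpenRouter, Z.ai, or any router.

```javascript
// Assumption: the gate grammar below is hypothetical. The recipe only stores
// "escalate_when" as a string and does not say how a router evaluates it.
// Supported form: clauses of `field == value` joined by `||`.
function gateMatches(expr, turn) {
  return expr.split("||").some((clause) => {
    const parts = clause.split("==").map((s) => s.trim());
    if (parts.length !== 2 || !parts[0] || !parts[1]) {
      throw new Error("unsupported gate clause: " + clause.trim());
    }
    const [field, expected] = parts;
    // A missing field never matches, so unlabeled turns stay on the default.
    return turn[field] !== undefined && String(turn[field]) === expected;
  });
}

// Return the escalation model when the gate matches, else the cheap default.
function pickModel(config, turn) {
  return gateMatches(config.escalate_when, turn)
    ? config.escalate_model
    : config.default_model;
}

// Same values as routing.json in step 1, inlined so the sketch is self-contained.
const config = {
  base_url: "https://openrouter.ai/api/v1",
  default_model: "z-ai/glm-5.2",
  escalate_model: "anthropic/claude-opus-4.8",
  escalate_when: "task_difficulty == hard || category == general_coding",
};

// Hypothetical turns; how a turn gets labeled is up to your client.
console.log(pickModel(config, { task_difficulty: "easy", category: "refactor" }));
console.log(pickModel(config, { task_difficulty: "hard", category: "refactor" }));
console.log(pickModel(config, { task_difficulty: "easy", category: "general_coding" }));
```

Only the first turn stays on GLM-5.2; the other two match a gate clause and go to Opus 4.8. Throwing on a malformed clause is a deliberate choice in this sketch: a gate that silently fails to parse would either never escalate or always escalate, and both defeat the point of the config.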

Eval, 2 fixtures

Last passed: verified today
  • routing-ok · contains · timeout 30s · max $0

    Expected: config OK: bulk -> z-ai/glm-5.2 via openrouter.ai, escalate to anthropic/claude-opus-4.8 on hard turns

  • clean-exit · exit_code · timeout 30s · max $0

    Expected: 0

Results

GLM-5.2 (Z.ai, MIT, 1M context) is roughly eight times cheaper than Opus 4.8 on output ($3 vs $25 per 1M tokens via OpenRouter). On a provisional third-party leaderboard it ties Opus on long-horizon coding and edges it on ultra-long and agentic work, while Opus still wins general coding. So route the bulk to GLM-5.2 and keep Opus for the short, hard, general-coding turns.
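
To make "cost per finished task, not per token" concrete, here is a rough sketch of the arithmetic using the output prices quoted above. The turn counts and token counts are made-up illustrative numbers, input-token cost is ignored for simplicity, and `costPerTask` is a hypothetical helper, so treat the result as a shape to fill with your own measurements and not as a benchmark.

```javascript
// Output prices per 1M tokens, as quoted in Results (via OpenRouter).
const PRICE_PER_M_OUTPUT = { glm: 3, opus: 25 };

// Cost of one finished task = turns * output tokens per turn * price per token.
// Assumption: input tokens are ignored to keep the sketch short.
function costPerTask(pricePerM, turns, outputTokensPerTurn) {
  return (turns * outputTokensPerTurn * pricePerM) / 1_000_000;
}

// Hypothetical workloads, not measurements.
const opusCost = costPerTask(PRICE_PER_M_OUTPUT.opus, 4, 2000);     // Opus finishes in 4 turns
const glmSameCost = costPerTask(PRICE_PER_M_OUTPUT.glm, 4, 2000);   // GLM-5.2 matches that
const glmLoopyCost = costPerTask(PRICE_PER_M_OUTPUT.glm, 12, 3000); // GLM-5.2 needs extra loops

// GLM-5.2 stays cheaper per task only while its total output tokens are
// below this multiple of what Opus would have used.
const breakEvenTokenRatio = PRICE_PER_M_OUTPUT.opus / PRICE_PER_M_OUTPUT.glm;

console.log("Opus per task:            $" + opusCost.toFixed(3));
console.log("GLM-5.2, same turns:      $" + glmSameCost.toFixed(3));
console.log("GLM-5.2, extra loops:     $" + glmLoopyCost.toFixed(3));
console.log("break-even token ratio:   " + breakEvenTokenRatio.toFixed(1) + "x");
```

In this made-up case the looping run uses 4.5 times the output tokens and still costs less than Opus, but the gap narrows from about 8x to under 2x. Past roughly 8.3 times the tokens the cheap route would cost more per task, which is why the per-task figure is the one to watch.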
