Swap to a cheap image model, but guard the cases it loses
Default image generation to a cheap model for general scenes, while checking that the cases the premium model dominates (text in the frame, charts, precise layout) are still routed to it. The swap then saves money without quietly degrading the images that have words in them.
Run this workflow
CI-verified, 2/2 fixtures passing.
Intended Use
Teams generating images at volume who want the cheap model's price without shipping garbled text on slides and infographics. CI validates the swap-guard table: every case you declare the premium model dominates must be routed to it, and at least one case must route to cheap. CI needs no API key and makes no image call. The actual generation and any quality judgement are fenced: they sit outside what CI checks.
Not for
- Treating the gap as a flat percentage: on text-in-image and precise layout, GPT-Image-2 is not a few percent better but in a different class, so route those cases to premium
- Assuming the cheap model is self-hostable: earlier Wan releases (2.1/2.2) shipped open weights, but Wan 2.5 is API-only, so check the version before planning to run it yourself
- Stakes where one wrong frame is expensive: keep a human check on the cheap output for anything customer-facing
The Stack
Tested Against
- artificialanalysis.ai image leaderboard (2026-06)
- node@20
Side effects & data flow
- Network: none, local only
- Writes: ./decision.json
- Credentials: none required
Prerequisites
- API access to a cheap image model (e.g. Wan) and a premium one (e.g. GPT-Image-2), needed only for the actual generation step
Steps
- 1
Write the swap-guard table and validate it
Declare the cheap default, the premium model, the cases the premium model dominates, and a route for each case. CI checks that every dominated case routes to premium and that at least one case still routes to cheap (otherwise there is no saving). Generating images needs your API keys and is fenced, outside what CI checks.
```shell
cat > decision.json <<'JSON'
{
  "modality": "image",
  "cheap": "wan-2.5",
  "premium": "gpt-image-2",
  "dominated_cases": ["text_in_frame", "charts_or_layout"],
  "route": [
    { "when": "text_in_frame", "use": "premium" },
    { "when": "charts_or_layout", "use": "premium" },
    { "when": "general_scene", "use": "cheap" }
  ]
}
JSON

node -e '
const fs = require("fs");
const c = JSON.parse(fs.readFileSync("decision.json", "utf8"));

// Print the reason and fail the CI step.
function bad(m) { console.error("BAD: " + m); process.exit(1); }

if (!c.cheap || !c.premium) bad("need both a cheap and a premium model");

// Map each case ("when") to the model tier it routes to ("use").
const route = c.route || [];
const ruleFor = {};
for (const r of route) {
  if (r.when && r.use) ruleFor[r.when] = r.use;
}

// Every case the premium model dominates must route to premium.
const dom = c.dominated_cases || [];
if (dom.length < 1) bad("declare the cases the premium model dominates");
for (const d of dom) {
  if (ruleFor[d] !== "premium") bad("dominated case " + d + " is not routed to the premium model");
}

// At least one case must route to cheap, or the swap saves nothing.
const cheapCount = route.filter(function (r) { return r.use === "cheap"; }).length;
if (cheapCount < 1) bad("nothing routes to the cheap model, so there is no saving");

console.log("config OK: " + c.modality + " swap " + c.cheap + " -> " + c.premium + "; " +
  dom.length + " dominated case(s) all routed to premium, " + cheapCount + " case(s) to cheap");
'
```
- 2
Generate, routing each request by its content (the model step, not checked by CI)
At generation time, tag each prompt by whether it contains text or precise layout, and send it to the model the table assigns: cheap for general scenes, premium when words are in the frame. The same pattern applies to video (cheap for iterations, premium for the physics-heavy hero shot). Generation and output quality are fenced; CI does not check them.
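The routing step can be sketched as a small lookup against the step-1 table. This is a minimal sketch under stated assumptions. The keyword-based `tagPrompt` heuristic is hypothetical and purely illustrative; in practice you might tag prompts upstream from the request type or with a classifier. The `pickModel` helper is also hypothetical. The table is inlined so the example runs on its own, and no image API is called.

```javascript
// Routing table from step 1, inlined here (in practice, JSON.parse the decision.json file).
const config = {
  cheap: "wan-2.5",
  premium: "gpt-image-2",
  route: [
    { when: "text_in_frame", use: "premium" },
    { when: "charts_or_layout", use: "premium" },
    { when: "general_scene", use: "cheap" },
  ],
};

// Hypothetical tagger: maps a prompt to one of the cases in the table.
// The keyword lists are illustrative assumptions, not part of the workflow.
function tagPrompt(prompt) {
  const p = prompt.toLowerCase();
  // A quoted string usually means literal text must appear in the image.
  if (/"[^"]+"/.test(prompt) || /\b(caption|headline|labels?|slides?|poster|sign|logo)\b/.test(p)) {
    return "text_in_frame";
  }
  if (/\b(charts?|graphs?|infographics?|diagrams?|layout)\b/.test(p)) {
    return "charts_or_layout";
  }
  return "general_scene";
}

// Hypothetical helper: resolves a prompt to a concrete model name via the table.
function pickModel(cfg, prompt) {
  const tag = tagPrompt(prompt);
  const rule = (cfg.route || []).find((r) => r.when === tag);
  // A case with no rule falls back to premium: overpaying beats garbled text.
  const tier = rule ? rule.use : "premium";
  return { tag, tier, model: tier === "cheap" ? cfg.cheap : cfg.premium };
}

const prompts = [
  "A misty pine forest at dawn, wide shot",
  'Conference slide with the title "Q3 Results"',
  "Bar chart of monthly signups",
];
for (const prompt of prompts) {
  const choice = pickModel(config, prompt);
  console.log(choice.tag + " -> " + choice.model);
}
```

The fallback to premium is a deliberate choice. A tagging miss then costs money rather than quality, which matches the guard the CI check enforces.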
Eval, 2 fixtures
Last passed: verified today
- guard-ok · contains · timeout 30s · max $0
  Expected: config OK: image swap wan-2.5 -> gpt-image-2; 2 dominated case(s) all routed to premium, 1 case(s) to cheap
- clean-exit · exit_code · timeout 30s · max $0
  Expected: 0
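One reading of the two fixtures is as follows. `guard-ok` passes if the script's stdout contains the expected line, and `clean-exit` passes if the exit code is 0, each within a 30-second timeout. The minimal sketch below implements that reading with Node's `child_process.spawnSync`. The fixture fields `kind`, `expected`, and `timeoutMs` and the `runFixture` helper are hypothetical; the actual CI harness format is not shown, and the cost cap is not modeled. To stay self-contained, the sketch runs a stand-in `node -e` command instead of the real step-1 script.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical fixture runner: "contains" checks stdout for the expected text,
// "exit_code" checks the process exit status.
function runFixture(cmd, args, fixture) {
  const res = spawnSync(cmd, args, { encoding: "utf8", timeout: fixture.timeoutMs });
  // spawnSync sets res.error when the process fails to start or times out.
  if (res.error) return { name: fixture.name, pass: false, detail: String(res.error) };
  if (fixture.kind === "contains") {
    const pass = res.stdout.includes(fixture.expected);
    return { name: fixture.name, pass, detail: pass ? "found" : "missing expected text" };
  }
  if (fixture.kind === "exit_code") {
    const pass = res.status === fixture.expected;
    return { name: fixture.name, pass, detail: "exit " + res.status };
  }
  return { name: fixture.name, pass: false, detail: "unknown fixture kind" };
}

const expectedLine =
  "config OK: image swap wan-2.5 -> gpt-image-2; 2 dominated case(s) all routed to premium, 1 case(s) to cheap";

// Stand-in for the step-1 script: a node process that prints the success line.
const cmd = process.execPath;
const args = ["-e", "console.log(" + JSON.stringify(expectedLine) + ")"];

const fixtures = [
  { name: "guard-ok", kind: "contains", expected: expectedLine, timeoutMs: 30000 },
  { name: "clean-exit", kind: "exit_code", expected: 0, timeoutMs: 30000 },
];

const results = fixtures.map((f) => runFixture(cmd, args, f));
for (const r of results) {
  console.log(r.name + ": " + (r.pass ? "PASS" : "FAIL") + " (" + r.detail + ")");
}
```

Against the real workflow, `cmd` and `args` would invoke a shell on a file holding the step-1 commands. Any such path is your own choice; the workflow does not name one.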
Results
For general scenes, a cheap model like Wan is roughly seven to eight times cheaper per image than top-tier GPT-Image-2, and most viewers barely notice the difference. But GPT-Image-2 leads the Artificial Analysis image leaderboard by the largest margin the leaderboard has recorded, specifically on text inside images, dense layouts, infographics, and multilingual typography. So the honest swap is conditional: cheap for pictures, premium when there are words in the frame.
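The value of a conditional swap depends on how many prompts carry text. A minimal sketch in relative units makes this concrete. It assumes a premium image costs 1 unit and a cheap one 1/7.5 of that, the midpoint of the "seven to eight times" figure above. That ratio is an assumption; real prices vary by provider and resolution. The text shares are hypothetical.

```javascript
// Relative cost per image: premium = 1, cheap = 1 / ratio.
// The 7.5x default is the midpoint of the rough range above, not a quoted price.
function blendedCost(textShare, ratio = 7.5) {
  const cheap = 1 / ratio;
  // Text/layout prompts go to premium; everything else goes to cheap.
  return textShare * 1 + (1 - textShare) * cheap;
}

for (const share of [0, 0.2, 0.5, 1]) {
  const cost = blendedCost(share);
  const pct = Math.round(share * 100);
  console.log(
    "text share " + pct + "%: " + cost.toFixed(3) +
    " of all-premium cost, saving " + ((1 - cost) * 100).toFixed(1) + "%"
  );
}
```

The saving shrinks as the share of text-heavy prompts grows. Routing by content keeps most of the saving on picture-only traffic while still sending every text case to premium.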
Liked this workflow?
Get new verified workflows in WebAfterAI, three issues a week (Tue, Thu, Sat).