Sakana Fugu: A/B it on your own task before you migrate
Point an OpenAI-compatible client at the Fugu endpoint and compare it to a single strong model on one hard task you already know the answer to, instead of trusting a benchmark chart.
Run this workflow
CI-verified, 2/2 fixtures passing.
Intended Use
Anyone evaluating Fugu against their own pipeline. CI validates the client config: base_url points at the Fugu endpoint, model is fugu or fugu-ultra, and (only for the fugu variant) any provider opt-out list is well-formed. That check needs no key and makes no model call. The orchestrated run and its quality are fenced: CI does not check them.
Not for
- Quick single-shot prompts: an orchestrator helps most on long multi-step work and least on short tasks, where coordination overhead and latency are pure cost. Test it on long multi-step work.
- Trusting the vendor chart: Sakana's numbers are self-reported and the baselines are the providers' own. Judge it on a task you have a known-good answer for.
- The EU/EEA: the API is not available there yet (GDPR).
The Stack
Tested Against
- sakana.ai/fugu + console.sakana.ai (2026-06)
- node@20
Side effects & data flow
- Network: none, local only
- Writes: ./client.json
- Credentials: none required
Prerequisites
- A Sakana API key (only to actually run the comparison)
- An OpenAI-compatible client or coding harness
Steps
- 1
Write the client config and validate it
Point your client at https://api.sakana.ai/v1 with model fugu or fugu-ultra. For regulated data use the fugu variant and list providers to opt out of the pool (Fugu Ultra's pool is fixed, so opt-out is rejected there). CI checks the config shape; the run needs your key and is fenced.
cat > client.json <<'JSON'
{
  "base_url": "https://api.sakana.ai/v1",
  "model": "fugu",
  "exclude_providers": ["provider-x", "provider-y"]
}
JSON
node -e '
const fs = require("fs");
const c = JSON.parse(fs.readFileSync("client.json", "utf8"));
function bad(m) { console.error("BAD: " + m); process.exit(1); }
if (!c.base_url || !String(c.base_url).includes("api.sakana.ai")) bad("base_url does not point at the Fugu endpoint");
const m = c.model || "";
const isFugu = m === "fugu";
const isUltra = m.indexOf("fugu-ultra") === 0;
if (!isFugu && !isUltra) bad("model must be fugu or fugu-ultra");
if (Object.prototype.hasOwnProperty.call(c, "exclude_providers")) {
  if (isUltra) bad("exclude_providers is only valid for the fugu variant (Fugu Ultra pool is fixed)");
  if (!Array.isArray(c.exclude_providers) || !c.exclude_providers.every((x) => typeof x === "string")) bad("exclude_providers must be a list of strings");
}
const optout = isFugu && Array.isArray(c.exclude_providers) ? c.exclude_providers.length : 0;
console.log("config OK: client points at the Fugu endpoint, model " + m + ", opt-out list of " + optout + " provider(s)");
'
- 2
Run the A/B (the model step, not checked by CI)
Set your Sakana key, run one hard task you have a known-good answer for on Fugu Ultra and on a single strong model, and compare quality, latency, and per-call cost. Migrate only if it wins on your work. The runs are fenced.
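The comparison in this step can be sketched as a small harness. This is a minimal sketch, not the recipe's own code: `timeCall`, `runAB`, and `exactMatch` are hypothetical names, and each candidate's `call` function stands in for whatever you already use to send a prompt to a model. Cost is read from whatever `call` returns, because the shape of Fugu's per-call cost report is not specified here.

```javascript
// Minimal A/B harness sketch. All names are hypothetical, not part of any Fugu SDK.
// Each candidate supplies an async call(prompt) that resolves to { text, cost }.

// Wrap one model call and record its wall-clock latency in milliseconds.
async function timeCall(call, prompt) {
  const start = performance.now();
  const result = await call(prompt);
  const latencyMs = performance.now() - start;
  return { ...result, latencyMs };
}

// Run every candidate on the same prompt, one after another, so that
// concurrent requests do not distort the latency numbers.
async function runAB({ prompt, expected, candidates, score }) {
  const rows = [];
  for (const [name, call] of Object.entries(candidates)) {
    const { text, cost, latencyMs } = await timeCall(call, prompt);
    rows.push({
      name,
      correct: score(text, expected), // compared against your known-good answer
      latencyMs,
      cost: cost ?? null, // null when the call reports no cost
    });
  }
  return rows;
}

// Example scorer: exact match after trimming whitespace.
// Real tasks usually need a task-specific check instead.
const exactMatch = (text, expected) =>
  String(text).trim() === String(expected).trim();
```

Pass one `call` for Fugu Ultra and one for your single strong model, then print the rows with `console.table(rows)` to see correctness, latency, and cost side by side.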
Eval, 2 fixtures
Last passed: verified today
- client-ok (contains, timeout 30s · max $0). Expected: config OK: client points at the Fugu endpoint, model fugu, opt-out list of 2 provider(s)
- clean-exit (exit_code, timeout 30s · max $0). Expected: 0
Results
Fugu is one OpenAI-compatible API that is really a trained coordinator running a pool of frontier models. Because it speaks the OpenAI protocol, you aim an existing harness at it with a base_url and key and change nothing else, then judge it on your work: quality, latency, and the per-call cost Fugu reports.
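Aiming an existing harness at Fugu could look roughly like the sketch below. It assumes the endpoint follows the usual OpenAI chat-completions shape (a POST to `/chat/completions` under base_url with a Bearer key), which "OpenAI-compatible" implies but this recipe does not spell out. `buildChatRequest` is a hypothetical helper. How `exclude_providers` is sent to the API is not specified here, so the sketch leaves it out.

```javascript
// Sketch: turn a client.json-style config into arguments for fetch().
// Assumption: OpenAI-style POST {base_url}/chat/completions with a Bearer token.
function buildChatRequest(config, apiKey, prompt) {
  // Drop trailing slashes so the joined URL has exactly one separator.
  const base = String(config.base_url).replace(/\/+$/, "");
  return {
    url: base + "/chat/completions",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: "Bearer " + apiKey,
      },
      body: JSON.stringify({
        model: config.model, // "fugu" or "fugu-ultra"
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}
```

With a key in an environment variable of your choosing, `const { url, options } = buildChatRequest(config, key, task)` followed by `fetch(url, options)` sends the request on Node 20, which ships a global `fetch`. Only base_url, key, and model differ from a call to any other OpenAI-compatible provider.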