Agents · Commercial · Free · Active · Machine-verified · beginner · ~10 min setup

Sakana Fugu: A/B it on your own task before you migrate

Point an OpenAI-compatible client at the Fugu endpoint and compare it to a single strong model on one hard task you already know the answer to, instead of trusting a benchmark chart.

by Shilpa Mitra · verified today · v1.0.0

Run this workflow

CI-verified, 2/2 fixtures passing.

Intended Use

Anyone evaluating Fugu against their own pipeline. CI validates only the client config: base_url points at the Fugu endpoint, model is fugu or fugu-ultra, and any provider opt-out list is well-formed and appears only with the fugu variant. CI uses no key and makes no model call. The orchestrated run and its quality are fenced off from CI, so you judge them yourself.

Not for

  • Quick single-shot prompts: an orchestrator helps most on long multi-step work and least on short prompts, where coordination overhead and latency are pure cost. Test on the former.
  • Trusting the vendor chart: Sakana's numbers are self-reported and the baselines are the providers' own. Judge it on a task you have a known-good answer for.
  • The EU/EEA: the API is not available there yet (GDPR).

The Stack

Tested Against

sakana.ai/fugu + console.sakana.ai (2026-06) · node@20

Side effects & data flow

  • Network: none, local only
  • Writes: ./client.json
  • Credentials: none required

Prerequisites

  • A Sakana API key (only to actually run the comparison)
  • An OpenAI-compatible client or coding harness

Steps

  1. Write the client config and validate it

    Point your client at https://api.sakana.ai/v1 with model fugu or fugu-ultra. For regulated data, use the fugu variant and list the providers you want excluded from the pool. Fugu Ultra's pool is fixed, so an opt-out list is rejected there. CI checks only the config shape; the actual run needs your key and is fenced off from CI.

    cat > client.json <<'JSON'
    {
      "base_url": "https://api.sakana.ai/v1",
      "model": "fugu",
      "exclude_providers": ["provider-x", "provider-y"]
    }
    JSON
    node -e '
    const fs = require("fs");
    const c = JSON.parse(fs.readFileSync("client.json", "utf8"));
    function bad(m) { console.error("BAD: " + m); process.exit(1); }
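    // Parse the URL and compare the hostname exactly; a substring test would also accept look-alike hosts.
    let host = "";
    try { host = new URL(String(c.base_url)).hostname; } catch (e) { bad("base_url is not a valid URL"); }
    if (host !== "api.sakana.ai") bad("base_url does not point at the Fugu endpoint");
    // Treat a missing or non-string model as empty so the checks below fail cleanly instead of throwing.
    const m = typeof c.model === "string" ? c.model : "";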
    const isFugu = m === "fugu";
    const isUltra = m.indexOf("fugu-ultra") === 0;
    if (!isFugu && !isUltra) bad("model must be fugu or fugu-ultra");
    if (Object.prototype.hasOwnProperty.call(c, "exclude_providers")) {
      if (isUltra) bad("exclude_providers is only valid for the fugu variant (Fugu Ultra pool is fixed)");
      if (!Array.isArray(c.exclude_providers) || !c.exclude_providers.every((x) => typeof x === "string")) bad("exclude_providers must be a list of strings");
    }
    const optout = isFugu && Array.isArray(c.exclude_providers) ? c.exclude_providers.length : 0;
    console.log("config OK: client points at the Fugu endpoint, model " + m + ", opt-out list of " + optout + " provider(s)");
    '
  2. Run the A/B (the model step, not checked by CI)

    Set your Sakana key, then run one hard task with a known-good answer on both Fugu Ultra and a single strong model. Compare quality, latency, and per-call cost, and migrate only if Fugu wins on your work. These runs are fenced off from CI.
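The comparison in step 2 could be sketched roughly as follows. This is a minimal illustration, not part of the verified recipe. It assumes the standard OpenAI-compatible /chat/completions request and response shape, and every name beyond client.json is hypothetical: the environment variables (SAKANA_API_KEY, BASELINE_BASE_URL, BASELINE_MODEL, BASELINE_API_KEY, RUN_AB) and the files task.txt and expected.txt. It does not send exclude_providers, because the recipe does not say how that list is passed to the API, and it prints the usage object only if the response includes one.

```javascript
// A/B sketch: one prompt, two OpenAI-compatible endpoints, graded against a
// known-good answer and timed. Env var and file names are assumptions.
const fs = require("node:fs");

// Build the HTTP request for one side of the comparison.
function buildRequest(target, prompt) {
  return {
    url: target.base_url.replace(/\/+$/, "") + "/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: "Bearer " + target.api_key,
      },
      body: JSON.stringify({
        model: target.model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Crude grader: case- and whitespace-insensitive substring match.
// Swap in a stricter check if your task needs one.
function isCorrect(answer, expected) {
  const norm = (s) => String(s).toLowerCase().replace(/\s+/g, " ").trim();
  return norm(answer).includes(norm(expected));
}

// Send one request and record correctness, wall-clock latency, and usage.
async function runOnce(name, target, prompt, expected) {
  const { url, init } = buildRequest(target, prompt);
  const started = performance.now();
  const res = await fetch(url, init);
  const latencyMs = Math.round(performance.now() - started);
  if (!res.ok) throw new Error(name + ": HTTP " + res.status);
  const data = await res.json();
  const answer = data.choices?.[0]?.message?.content ?? "";
  return { name, correct: isCorrect(answer, expected), latencyMs, usage: data.usage ?? null };
}

async function main() {
  const cfg = JSON.parse(fs.readFileSync("client.json", "utf8"));
  const targets = {
    fugu: { base_url: cfg.base_url, model: cfg.model, api_key: process.env.SAKANA_API_KEY },
    baseline: {
      base_url: process.env.BASELINE_BASE_URL,
      model: process.env.BASELINE_MODEL,
      api_key: process.env.BASELINE_API_KEY,
    },
  };
  const prompt = fs.readFileSync("task.txt", "utf8");
  const expected = fs.readFileSync("expected.txt", "utf8");
  for (const [name, target] of Object.entries(targets)) {
    console.log(await runOnce(name, target, prompt, expected));
  }
}

// Only touch the network when explicitly asked to.
if (process.env.RUN_AB === "1") {
  main().catch((err) => { console.error(err.message); process.exitCode = 1; });
}
```

To compare against Fugu Ultra as the step suggests, set model to fugu-ultra in client.json and drop exclude_providers first. A single run per side is noisy, so repeating each call a few times gives a fairer latency picture.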

Eval, 2 fixtures

Last passed: verified today
  • client-ok · contains · timeout 30s · max $0

    Expected: config OK: client points at the Fugu endpoint, model fugu, opt-out list of 2 provider(s)

  • clean-exit · exit_code · timeout 30s · max $0

    Expected: 0
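
To reproduce the two fixture checks locally, something like the sketch below might do. It is an illustration of what a "contains" check and an "exit_code" check amount to, not the actual CI harness. The fixture object shape, the RUN_FIXTURES variable, and the validate.js filename (the step 1 validator saved to a file) are all assumptions.

```javascript
// Local re-creation of the two fixture checks. The fixture shape and the
// validate.js filename are hypothetical, not the real CI format.
const { spawnSync } = require("node:child_process");

const fixtures = [
  {
    name: "client-ok",
    kind: "contains",
    expected: "config OK: client points at the Fugu endpoint, model fugu, opt-out list of 2 provider(s)",
  },
  { name: "clean-exit", kind: "exit_code", expected: 0 },
];

// Decide whether one fixture passes for a finished command.
function checkFixture(fixture, result) {
  if (fixture.kind === "contains") return result.stdout.includes(fixture.expected);
  if (fixture.kind === "exit_code") return result.exitCode === fixture.expected;
  throw new Error("unknown fixture kind: " + fixture.kind);
}

// Run a command with the 30s timeout listed on the fixtures.
function runStep(command, args, timeoutMs = 30000) {
  const out = spawnSync(command, args, { encoding: "utf8", timeout: timeoutMs });
  return { stdout: out.stdout ?? "", exitCode: out.status };
}

if (process.env.RUN_FIXTURES === "1") {
  const result = runStep(process.execPath, ["validate.js"]);
  for (const f of fixtures) {
    console.log(f.name + ": " + (checkFixture(f, result) ? "PASS" : "FAIL"));
  }
}
```

Both checks read the same single run, so the validator executes once and each fixture inspects a different part of the result.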

Results

Fugu presents a single OpenAI-compatible API, but behind it is a trained coordinator running a pool of frontier models. Because it speaks the OpenAI protocol, you can point an existing harness at it by changing only the base_url, key, and model name, then judge it on your own work: quality, latency, and the per-call cost Fugu reports.
