Agents · Hybrid · Free · Active · Machine-verified · intermediate · ~10 min setup

OrcaRouter: only fan out when it is worth it

Gate the expensive fan-out behind a difficulty condition so easy chat stays cheap and only hard requests pay for a panel.

by Shilpa Mitra · verified today · v1.0.0

CI-verified, 2/2 fixtures passing.

Intended Use

For anyone pointing real traffic at a multi-model router. CI runs OrcaRouter's DSL lint and checks that routing.yaml parses, that the cheap_chat rule delegates to a cheap strategy, that the fan-out rule is gated behind a difficulty condition rather than matching everything, and that a default is present. The lint needs no keys and makes no calls. The routing decisions themselves are fenced: CI does not check them.

Not for

Trusting difficulty as ground truth: it is a classifier's guess, so watch routing in shadow mode for a week and tune the thresholds.
Skipping the gate: ungated fan-out is the failure mode that produces a quiet bill.

The Stack

OrcaRouter: AI gateway / router

Tested Against

docs.orcarouter.ai/routing/routing-dsl (2026-06)
ruby@3.x (YAML stdlib)

Side effects & data flow

Network: none, local only
Writes: ./routing.yaml
Credentials: none required

Prerequisites

An OrcaRouter account (hosted DSL, BYOK)
Provider API keys to actually run it (the lint step needs none)

Steps

1
Author the gated routing rules and lint them

Write routing.yaml with three rules: cheap chat goes to the cheapest model, fan-out is gated behind difficulty > 0.6, and a repair rule escalates after a failed test and at least two consecutive errors. CI parses the DSL and asserts that cheap_chat delegates to the cheapest strategy and that the fan-out rule carries a difficulty gate (not a match-all), with a default present.

cat > routing.yaml <<'YAML'
version: 1
rules:
  - id: cheap_chat
    when: task_class == "chat" && difficulty < 0.3
    use: { delegate: cheapest }
  - id: hard_only_fanout
    when: difficulty > 0.6
    use:
      parallel:
        - { model: "anthropic/claude-opus-4.8" }
        - { model: "openai/gpt-4o", samples: 2 }
      arbiter:
        strategy: best_of_n
        model: "anthropic/claude-opus-4.8"
  - id: repair_after_failed_test
    when: agent_state.last_test_failed && agent_state.consecutive_errors >= 2
    use:
      model: "anthropic/claude-opus-4.8"
      reason_tag: repair
default:
  delegate: balanced
YAML
ruby -ryaml -e '
# Parse the DSL file; safe_load refuses arbitrary Ruby objects.
c = YAML.safe_load(File.read("routing.yaml"))
abort "BAD: version must be 1" unless c["version"] == 1
abort "BAD: no default" unless c["default"]
rules = c["rules"] || []
# cheap_chat must hand off to the cheapest strategy.
chat = rules.find { |r| r["id"] == "cheap_chat" }
abort "BAD: cheap_chat does not delegate cheapest" unless chat && (chat["use"] || {})["delegate"] == "cheapest"
# The fan-out rule must exist and mention difficulty in its condition.
# This is a substring check on the condition text, not an evaluation of it.
fan = rules.find { |r| r["id"] == "hard_only_fanout" }
abort "BAD: no hard_only_fanout rule" unless fan
gate = fan["when"].to_s
abort "BAD: fan-out is not gated behind difficulty (would match everything)" unless gate.include?("difficulty")
puts "config OK: cheap_chat -> cheapest, fan-out gated behind a difficulty condition, default present"
'
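
The lint above only confirms that the fan-out condition mentions difficulty, so a condition such as difficulty >= 0 would pass while still matching every request. A stricter check could extract the threshold and require a floor. The sketch below is one way to do that; the helper names gate_threshold and gated?, the regex, and the 0.5 floor are illustrative assumptions and not part of OrcaRouter's DSL lint.

```ruby
# Hypothetical helper: pull the numeric threshold out of a condition such as
# "difficulty > 0.6". Returns nil when no "difficulty > N" or "difficulty >= N"
# comparison is present.
def gate_threshold(condition)
  m = condition.to_s.match(/\bdifficulty\s*>=?\s*(\d+(?:\.\d+)?)/)
  m && m[1].to_f
end

# Assumed policy for illustration: the fan-out gate must sit at or above
# min_threshold to count as a real gate.
def gated?(condition, min_threshold: 0.5)
  t = gate_threshold(condition)
  !t.nil? && t >= min_threshold
end

puts gated?("difficulty > 0.6")  # true
puts gated?("difficulty >= 0")   # false: mentions difficulty but matches everything
puts gated?("true")              # false: no difficulty comparison at all
```

A regex is a blunt tool for a DSL; it is used here only because the source lint also works on the raw condition string.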

2
Watch it in shadow mode, then go live (not checked by CI)
Run OrcaRouter's shadow mode for a week to see what the rules would have done before they touch live traffic, then tune the difficulty thresholds. The routing decisions are fenced, so CI does not verify this step.
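
One way to use a week of shadow-mode data is to count how many requests each rule would have caught at a few candidate gates. The sketch below assumes you can get a task class and a difficulty score per request; the record shape and field names are hypothetical, since the format of OrcaRouter's shadow-mode output is not described here. It also assumes rules are tried top-down and the first match wins.

```ruby
# Hypothetical shadow-mode records: one hash per request. The field names are
# assumptions for illustration, not OrcaRouter's export format.
SAMPLE = [
  { task_class: "chat", difficulty: 0.10 },
  { task_class: "chat", difficulty: 0.25 },
  { task_class: "chat", difficulty: 0.45 },
  { task_class: "code", difficulty: 0.20 },
  { task_class: "code", difficulty: 0.65 },
  { task_class: "code", difficulty: 0.90 },
].freeze

# Mirrors the cheap_chat and hard_only_fanout conditions from routing.yaml,
# assuming first-match-wins ordering. The repair rule is left out because it
# depends on agent state rather than difficulty.
def rule_for(req, fanout_gate: 0.6)
  return :cheap_chat if req[:task_class] == "chat" && req[:difficulty] < 0.3
  return :hard_only_fanout if req[:difficulty] > fanout_gate
  :default
end

# Count how many requests each rule would have caught at a given gate.
def tally(requests, fanout_gate: 0.6)
  requests.map { |r| rule_for(r, fanout_gate: fanout_gate) }.tally
end

[0.4, 0.6, 0.8].each do |gate|
  puts "gate #{gate}: #{tally(SAMPLE, fanout_gate: gate)}"
end
```

With the recipe's thresholds, a chat request at difficulty 0.45 matches neither cheap_chat nor the fan-out rule and lands on the default, so the share going to the default is worth watching alongside the fan-out share.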

Eval, 2 fixtures

Last passed: verified today

gated-ok · contains · timeout 30s · max $0
Expected: config OK: cheap_chat -> cheapest, fan-out gated behind a difficulty condition, default present
clean-exit · exit_code · timeout 30s · max $0
Expected: 0

Results

Every parallel leg bills separately, so fanning out every request is how a clever setup becomes a surprise invoice. Send easy chat to the cheapest model, fan out only the hard ones, and escalate after a failed test.
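
The arithmetic behind that warning can be sketched by counting billed calls. The figures below are read off the routing.yaml in this recipe: one Opus leg, two gpt-4o samples, and one arbiter call per fanned-out request. Treating each sample and the arbiter as a separately billed call is an assumption, and the request volume and 10% hard share are made-up numbers for illustration.

```ruby
# Billed calls per fanned-out request: one Opus leg + two gpt-4o samples +
# one arbiter call. Counting samples and the arbiter as separate billed calls
# is an assumption.
FANOUT_CALLS = 1 + 2 + 1
SINGLE_CALLS = 1

# Total billed calls for `requests` requests when `hard_share` of them
# (a fraction between 0 and 1) clear the difficulty gate.
def billed_calls(requests, hard_share)
  hard = (requests * hard_share).round
  hard * FANOUT_CALLS + (requests - hard) * SINGLE_CALLS
end

requests = 10_000                        # hypothetical daily volume
gated    = billed_calls(requests, 0.10)  # assume 10% of traffic clears the gate
ungated  = billed_calls(requests, 1.0)   # every request fans out

puts "gated:   #{gated} calls"    # 13000
puts "ungated: #{ungated} calls"  # 40000
```

Call counts are only a proxy: per-call prices differ by model and are not modeled here.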
