OrcaRouter: only fan out when it is worth it
Gate the expensive fan-out behind a difficulty condition so easy chat stays cheap and only hard requests pay for a panel.
CI-verified, 2/2 fixtures passing.
Intended Use
Anyone pointing real traffic at a multi-model router. CI lints the OrcaRouter DSL locally. It checks that routing.yaml parses and that version is 1. It checks that the cheap_chat rule delegates to the cheapest strategy and that the fan-out rule's condition references difficulty rather than matching everything. It also checks that a default is present. No keys, no calls. The routing decisions themselves are not exercised.
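The CI gate check is a substring test, so a condition like difficulty >= 0 would pass while still matching every request. Below is a stricter variant, sketched in Ruby. It assumes the gate is written as "difficulty > N" or "difficulty >= N". The helper names gate_threshold and gate_ok? are hypothetical, and the regex is not taken from the OrcaRouter DSL spec.

```ruby
require "yaml"

# Hypothetical helper: pull the numeric lower bound out of a gate written as
# "difficulty > N" or "difficulty >= N". Returns nil if no lower bound is found.
# The accepted syntax is an assumption, not OrcaRouter's documented grammar.
def gate_threshold(when_expr)
  m = when_expr.to_s.match(/difficulty\s*>=?\s*(\d+(?:\.\d+)?)/)
  m && m[1].to_f
end

# Hypothetical helper: a gate counts as selective only if it has a lower bound
# strictly above min (0.0 by default), so "difficulty >= 0" is rejected.
def gate_ok?(when_expr, min: 0.0)
  t = gate_threshold(when_expr)
  !t.nil? && t > min
end

config = YAML.safe_load(<<~CFG)
  rules:
    - id: hard_only_fanout
      when: difficulty > 0.6
CFG

fan = config["rules"].find { |r| r["id"] == "hard_only_fanout" }
puts gate_ok?(fan["when"])        # => true
puts gate_ok?("difficulty >= 0")  # => false (matches everything)
puts gate_ok?("difficulty < 0.3") # => false (upper bound, not a gate)
```

This still trusts the text of the condition. Something like difficulty > 0.6 || true would pass, so treat it as a tighter lint, not a proof that the gate is selective.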
Not for
- Trusting difficulty as ground truth: it is a classifier's guess. Watch routing in shadow mode for a week and tune the thresholds.
- Skipping the gate: ungated fan-out is the failure mode that quietly inflates the bill.
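Tuning the thresholds amounts to asking how much traffic each candidate gate would send to the panel. Here is a minimal Ruby sketch. It assumes you can export the classifier's difficulty scores from shadow mode. The SCORES values and the fanout_count helper are made up for illustration.

```ruby
# Made-up difficulty scores standing in for a week of shadow-mode logs.
SCORES = [0.05, 0.12, 0.2, 0.25, 0.35, 0.41, 0.55, 0.62, 0.7, 0.91].freeze

# Hypothetical helper: how many requests a "difficulty > threshold" gate fans out.
def fanout_count(scores, threshold)
  scores.count { |s| s > threshold }
end

# Sweep a few candidate thresholds and report the fan-out share for each.
[0.5, 0.6, 0.7].each do |t|
  fanned = fanout_count(SCORES, t)
  share = fanned.fdiv(SCORES.size) * 100
  puts format("threshold %.1f: %d/%d fan out (%.0f%%)", t, fanned, SCORES.size, share)
end
```

The sweep only measures spend exposure. It cannot tell you whether the requests above the line were actually hard. That is why the scores need checking against outcomes rather than being trusted as ground truth.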
The Stack
Tested Against
- docs.orcarouter.ai/routing/routing-dsl (2026-06)
- ruby@3.x (YAML stdlib)
Side effects & data flow
- Network: none, local only
- Writes: ./routing.yaml
- Credentials: none required
Prerequisites
- An OrcaRouter account (hosted DSL, BYOK)
- Provider API keys to actually run it
Steps
- 1
Author the gated routing rules and lint them
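Write routing.yaml with these rules:
- Easy chat delegates to the cheapest model.
- A fan-out panel is gated behind difficulty > 0.6.
- A repair rule escalates to anthropic/claude-opus-4.8 after a failed test plus at least two consecutive errors.
- A default delegates to balanced.

CI parses the DSL and asserts three things: cheap_chat delegates to cheapest, the fan-out rule carries a difficulty gate (not a match-all), and a default is present.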
cat > routing.yaml <<'YAML'
version: 1
rules:
  - id: cheap_chat
    when: task_class == "chat" && difficulty < 0.3
    use: { delegate: cheapest }
  - id: hard_only_fanout
    when: difficulty > 0.6
    use:
      parallel:
        - { model: "anthropic/claude-opus-4.8" }
        - { model: "openai/gpt-4o", samples: 2 }
      arbiter:
        strategy: best_of_n
        model: "anthropic/claude-opus-4.8"
  - id: repair_after_failed_test
    when: agent_state.last_test_failed && agent_state.consecutive_errors >= 2
    use:
      model: "anthropic/claude-opus-4.8"
      reason_tag: repair
default:
  delegate: balanced
YAML

ruby -ryaml -e '
c = YAML.safe_load(File.read("routing.yaml"))
# guard against an empty or non-mapping file before indexing into it
abort "BAD: routing.yaml is not a mapping" unless c.is_a?(Hash)
abort "BAD: version must be 1" unless c["version"] == 1
abort "BAD: no default" unless c["default"]
rules = c["rules"] || []
# cheap_chat must hand off to the cheapest strategy
chat = rules.find { |r| r["id"] == "cheap_chat" }
abort "BAD: cheap_chat does not delegate cheapest" unless chat && (chat["use"] || {})["delegate"] == "cheapest"
# the fan-out rule must exist and its condition must mention difficulty
fan = rules.find { |r| r["id"] == "hard_only_fanout" }
abort "BAD: no hard_only_fanout rule" unless fan
gate = fan["when"].to_s
abort "BAD: fan-out is not gated behind difficulty (would match everything)" unless gate.include?("difficulty")
puts "config OK: cheap_chat -> cheapest, fan-out gated behind a difficulty condition, default present"
'
- 2
Watch it in shadow mode, then go live (not checked by CI)
Run OrcaRouter's shadow mode for a week to see what the rules would have done before they touch live traffic, then tune the difficulty thresholds.
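A shadow run is easier to read with a model of how the rules resolve. The Ruby sketch below hand-translates the three conditions from routing.yaml into lambdas and tallies which rule each request would hit. It assumes rules are tried in file order, with the first match winning and the default as fallback. That ordering, the route helper, and the sample requests are all assumptions, not documented OrcaRouter behavior.

```ruby
# Conditions hand-translated from routing.yaml (not parsed from the DSL).
# Assumption: rules are tried in this order and the first match wins.
RULES = [
  ["cheap_chat", ->(r) { r[:task_class] == "chat" && r[:difficulty] < 0.3 }],
  ["hard_only_fanout", ->(r) { r[:difficulty] > 0.6 }],
  ["repair_after_failed_test", ->(r) { r[:last_test_failed] && r[:consecutive_errors] >= 2 }]
].freeze

# Hypothetical helper: return the id of the first matching rule, else "default".
def route(request)
  hit = RULES.find { |_id, cond| cond.call(request) }
  hit ? hit.first : "default"
end

# Made-up requests standing in for a week of shadow traffic.
requests = [
  { task_class: "chat", difficulty: 0.1, last_test_failed: false, consecutive_errors: 0 },
  { task_class: "chat", difficulty: 0.45, last_test_failed: false, consecutive_errors: 0 },
  { task_class: "code", difficulty: 0.8, last_test_failed: false, consecutive_errors: 0 },
  { task_class: "code", difficulty: 0.4, last_test_failed: true, consecutive_errors: 3 }
]

# Count how often each rule would have fired.
requests.map { |r| route(r) }.tally.each { |id, n| puts "#{id}: #{n}" }
```

Two things show up under that assumption. First, a chat request scored between 0.3 and 0.6 skips both cheap_chat and the fan-out and lands on the balanced default. Second, a failing agent whose request also scores above 0.6 would hit the fan-out before the repair rule ever fires. Both are worth confirming in shadow mode before going live.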
Eval, 2 fixtures
Last passed: verified today
- gated-ok: output contains "config OK: cheap_chat -> cheapest, fan-out gated behind a difficulty condition, default present" (timeout 30s · max $0)
- clean-exit: exit code 0 (timeout 30s · max $0)
Results
Every parallel leg bills separately, so fanning out every request is how a clever setup becomes a surprise invoice. Send easy chat to the cheapest model, fan out only the hard ones, and escalate after repeated failed tests.
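To put the fan-out cost in numbers, the Ruby sketch below counts model calls per batch of traffic. It assumes the panel in routing.yaml bills as four calls: one Opus leg, two GPT-4o samples, and one arbiter. It also assumes every non-panel request is a single call. The 10% hard share is hypothetical, and prices are left out because they vary by provider.

```ruby
# Assumed panel cost: 1 Opus leg + 2 GPT-4o samples + 1 arbiter call.
PANEL_CALLS = 1 + 2 + 1
# Assumed cost of any non-panel request (cheapest, balanced, or repair).
SINGLE_CALLS = 1

# Hypothetical helper: total model calls for a batch of requests.
# Ungated, every request pays for the panel; gated, only the hard share does.
def calls(total, hard_share, gated:)
  return total * PANEL_CALLS unless gated

  hard = (total * hard_share).round
  hard * PANEL_CALLS + (total - hard) * SINGLE_CALLS
end

total = 1_000
hard_share = 0.10 # hypothetical: 10% of traffic scores difficulty > 0.6
puts "ungated: #{calls(total, hard_share, gated: false)} calls" # prints "ungated: 4000 calls"
puts "gated:   #{calls(total, hard_share, gated: true)} calls"  # prints "gated:   1300 calls"
```

Under those assumptions, gating cuts 4,000 calls to 1,300 for the same 1,000 requests. The single calls mostly go to the cheapest or balanced model, so the dollar gap is likely wider than the call count suggests.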