OrcaRouter for coding: judge by passing tests, not by vibes
Fan a hard coding task out to a panel and keep the candidate whose patch actually passes your tests, using the tests_pass arbiter.
Run this workflow
CI-verified, 2/2 fixtures passing.
Build this with your agent
One copy-paste hands Claude Code, Codex, or Cursor the full recipe, steps included, nothing to fetch.
Intended Use
Anyone routing hard coding tasks across a panel. CI runs OrcaRouter's DSL lint: routing.yaml parses and the hard_code rule's arbiter strategy is tests_pass with a panel of at least two models. No keys, no model calls. The model runs and the test execution are fenced.
Not for
- Thin test suites, tests_pass is only as good as your tests; it raises the floor, it does not remove review
- Easy chat or lookups, fanning those out just multiplies cost
The Stack
Tested Against
docs.orcarouter.ai/routing/routing-dsl (2026-06)ruby@3.x (YAML stdlib)Side effects & data flow
- Network
- none, local only
- Writes
- ./routing.yaml
- Credentials
- none required
Prerequisites
- An OrcaRouter account (hosted DSL, BYOK)
- Provider API keys + a real test suite to actually run it
Steps
- 1
Author the tests_pass routing rule and lint it
Write routing.yaml with a hard_code rule that fans out to a panel and resolves it with arbiter.strategy: tests_pass, so the patch that passes your tests wins. CI parses the DSL and asserts the strategy is tests_pass with a panel of at least two models.
cat > routing.yaml <<'YAML' version: 1 rules: - id: hard_code when: task_class == "code" && difficulty > 0.6 use: parallel: - { model: "anthropic/claude-opus-4.8" } - { model: "openai/gpt-4o" } arbiter: strategy: tests_pass model: "anthropic/claude-opus-4.8" max_latency_ms: 120000 default: delegate: cheapest YAML ruby -ryaml -e ' c = YAML.safe_load(File.read("routing.yaml")) abort "BAD: version must be 1" unless c["version"] == 1 rule = (c["rules"] || []).find { |r| r["id"] == "hard_code" } abort "BAD: no hard_code rule" unless rule use = rule["use"] || {} panel = use["parallel"] abort "BAD: hard_code has no panel" unless panel.is_a?(Array) && panel.length >= 2 arb = use["arbiter"] || {} abort "BAD: arbiter strategy is not tests_pass" unless arb["strategy"] == "tests_pass" puts "config OK: hard_code fans out to a " + panel.length.to_s + "-model panel judged by tests_pass (the patch that passes your tests wins)" ' - 2
Run it on your keys (the model step, not checked by CI)
Point your client at OrcaRouter; the hard_code rule fans out and tests_pass runs each candidate's patch against your suite, keeping the one that passes. The model runs and test execution are fenced.
Eval, 2 fixtures
Last passed: verified todaytests-pass-okcontainstimeout 30s · max $0Expected:
config OK: hard_code fans out to a 2-model panel judged by tests_passclean-exitexit_codetimeout 30s · max $0Expected:
0
Results
The honest fix for coding fan-out: merging two plausible patches usually yields a third broken one, so make the arbiter objective. tests_pass runs the candidates and keeps the one that passes, the one place a fan-out clearly beats both a single model and a synthesizer.
Did this work for you?
Our CI checks the setup runs. You tell us if the whole thing worked. Tell us straight.
Liked this workflow?
Get new verified workflows in WebAfterAI, three issues a week (Tue, Thu, Sat).