Coding · Hybrid · Free · Active · Machine-verified · intermediate · ~15 min setup

OrcaRouter for coding: judge by passing tests, not by vibes

Fan a hard coding task out to a panel and keep the candidate whose patch actually passes your tests, using the tests_pass arbiter.

by Shilpa Mitra · verified today · v1.0.0


CI-verified, 2/2 fixtures passing.


Intended Use

For anyone routing hard coding tasks across a panel of models. CI runs only OrcaRouter's DSL lint: it checks that routing.yaml parses and that the hard_code rule's arbiter strategy is tests_pass with a panel of at least two models. It needs no keys and makes no model calls; the model runs and the test execution are fenced off from CI.

Not for

  • Thin test suites: tests_pass is only as good as your tests. It raises the floor; it does not remove the need for review.
  • Easy chat or lookups: fanning those out just multiplies cost.
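To see why a thin suite undermines tests_pass, here is a minimal in-process sketch. It assumes each candidate patch can be modeled as a Ruby lambda implementing a hypothetical add(a, b), and that the arbiter keeps the first passing candidate in panel order; the model names are placeholders, not OrcaRouter output.

```ruby
# Hypothetical illustration (not OrcaRouter code): the same tests_pass-style pick under two suites.
# Each candidate "patch" is modeled as a lambda implementing add(a, b).
candidates = {
  "model-a" => ->(a, b) { a - b },  # plausible-looking but wrong
  "model-b" => ->(a, b) { a + b }   # correct
}

# A suite is a list of [args, expected] pairs; a candidate passes only if every case matches.
def passes?(impl, suite)
  suite.all? { |args, expected| impl.call(*args) == expected }
end

# Keep the first candidate (in panel order) that passes the whole suite, or nil.
def pick(candidates, suite)
  candidates.find { |_model, impl| passes?(impl, suite) }&.first
end

thin_suite = [[[0, 0], 0]]                              # only checks add(0, 0) == 0
real_suite = [[[0, 0], 0], [[2, 3], 5], [[-1, 1], 0]]

puts "thin suite picks: #{pick(candidates, thin_suite)}"  # prints "thin suite picks: model-a"
puts "real suite picks: #{pick(candidates, real_suite)}"  # prints "real suite picks: model-b"
```

With the one-case suite, the wrong patch wins simply because it is listed first; the arbiter was objective, but the tests could not tell the candidates apart.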

The Stack

Tested Against

  • docs.orcarouter.ai/routing/routing-dsl (2026-06)
  • ruby@3.x (YAML stdlib)

Side effects & data flow

  • Network: none, local only
  • Writes: ./routing.yaml
  • Credentials: none required

Prerequisites

  • An OrcaRouter account (hosted DSL, BYOK)
  • Provider API keys and a real test suite, needed only to actually run the workflow (step 2), not for the CI lint

Steps

  1. Author the tests_pass routing rule and lint it

    Write routing.yaml with a hard_code rule that fans out to a panel and resolves it with arbiter.strategy: tests_pass, so the patch that passes your tests wins. CI parses the DSL and asserts the strategy is tests_pass with a panel of at least two models.

    cat > routing.yaml <<'YAML'
    version: 1
    rules:
      - id: hard_code
        when: task_class == "code" && difficulty > 0.6
        use:
          parallel:
            - { model: "anthropic/claude-opus-4.8" }
            - { model: "openai/gpt-4o" }
          arbiter:
            strategy: tests_pass
            model: "anthropic/claude-opus-4.8"
          max_latency_ms: 120000
    default:
      delegate: cheapest
    YAML
    ruby -ryaml -e '
    # Parse with safe_load: plain data only, no arbitrary Ruby objects.
    c = YAML.safe_load(File.read("routing.yaml"))
    abort "BAD: routing.yaml is not a mapping" unless c.is_a?(Hash)
    abort "BAD: version must be 1" unless c["version"] == 1
    # Find the hard_code rule by id.
    rule = (c["rules"] || []).find { |r| r["id"] == "hard_code" }
    abort "BAD: no hard_code rule" unless rule
    use = rule["use"] || {}
    # The panel must fan out to at least two models.
    panel = use["parallel"]
    abort "BAD: hard_code needs a panel of at least two models" unless panel.is_a?(Array) && panel.length >= 2
    # The arbiter must be objective: tests_pass keeps the patch that passes.
    arb = use["arbiter"] || {}
    abort "BAD: arbiter strategy is not tests_pass" unless arb["strategy"] == "tests_pass"
    puts "config OK: hard_code fans out to a " + panel.length.to_s + "-model panel judged by tests_pass (the patch that passes your tests wins)"
    '
  2. Run it on your keys (the model step, not checked by CI)

    Point your client at OrcaRouter. The hard_code rule fans out to the panel, and tests_pass runs each candidate's patch against your suite and keeps the one that passes. CI exercises neither the model runs nor the test execution.
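The selection step can be sketched locally. This is a hypothetical illustration of the idea, not OrcaRouter's implementation: the Candidate struct, the one-file solution.rb "patch", the stand-in test script, and first-passer-wins selection are all assumptions (the recipe does not say how ties between several passing patches are broken). Each candidate runs in its own temp directory and counts as passing only if the test process exits 0.

```ruby
require "open3"
require "rbconfig"
require "tmpdir"

# Hypothetical candidate shape: the panel model and the file it proposes.
Candidate = Struct.new(:model, :source)

# A tiny stand-in test suite; it exits non-zero if solution.rb is wrong.
TEST_SCRIPT = <<~RUBY
  require_relative "solution"
  abort "add(2, 3) should be 5" unless add(2, 3) == 5
  abort "add(-1, 1) should be 0" unless add(-1, 1) == 0
RUBY

# Runs the suite against one candidate in its own temp directory, so
# candidates cannot see or clobber each other's files.
def passes_tests?(candidate)
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, "solution.rb"), candidate.source)
    File.write(File.join(dir, "test.rb"), TEST_SCRIPT)
    _output, status = Open3.capture2e(RbConfig.ruby, "test.rb", chdir: dir)
    status.success?
  end
end

# tests_pass-style selection (assumed): keep the first candidate whose patch passes, else nil.
def pick_by_tests(candidates)
  candidates.find { |c| passes_tests?(c) }
end

candidates = [
  Candidate.new("model-a", "def add(a, b)\n  a - b\nend\n"),  # plausible but wrong
  Candidate.new("model-b", "def add(a, b)\n  a + b\nend\n")   # correct
]

winner = pick_by_tests(candidates)
puts(winner ? "winner: #{winner.model}" : "no candidate passed")  # prints "winner: model-b"
```

A temp directory isolates files, not processes; running untrusted model patches for real calls for a proper sandbox.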

Eval, 2 fixtures

Last passed: verified today
  • tests-pass-ok · contains · timeout 30s · max $0

    Expected: config OK: hard_code fans out to a 2-model panel judged by tests_pass

  • clean-exit · exit_code · timeout 30s · max $0

    Expected: 0
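The two fixtures check different things: one that the lint's output contains the success line, one that it exits 0. A minimal local runner for that pair might look like the sketch below. The Fixture struct and run_fixtures helper are hypothetical, not the CI's actual harness, the two commands are stand-ins that mimic the step 1 lint's success and failure paths, and the 30 s timeout and $0 budget are not enforced here.

```ruby
require "open3"
require "rbconfig"

# Hypothetical fixture shape mirroring the two eval checks above.
Fixture = Struct.new(:name, :kind, :expected)

FIXTURES = [
  Fixture.new("tests-pass-ok", :contains,
              "config OK: hard_code fans out to a 2-model panel judged by tests_pass"),
  Fixture.new("clean-exit", :exit_code, 0)
].freeze

# Runs a command once, then scores every fixture against its combined
# stdout+stderr and exit status. Timeouts and budgets are not enforced.
def run_fixtures(cmd, fixtures = FIXTURES)
  output, status = Open3.capture2e(*cmd)
  fixtures.to_h do |f|
    ok = case f.kind
         when :contains  then output.include?(f.expected)
         when :exit_code then status.exitstatus == f.expected
         else raise ArgumentError, "unknown fixture kind: #{f.kind}"
         end
    [f.name, ok]
  end
end

# Stand-ins for the step 1 lint: one mimics its success line, one mimics an abort.
good = [RbConfig.ruby, "-e", 'puts "config OK: hard_code fans out to a 2-model panel judged by tests_pass (the patch that passes your tests wins)"']
bad  = [RbConfig.ruby, "-e", 'abort "BAD: arbiter strategy is not tests_pass"']

p run_fixtures(good)  # both fixtures pass
p run_fixtures(bad)   # both fixtures fail: no success line, exit status 1
```

Because the contains check matches a substring, the trailing "(the patch that passes your tests wins)" in the real lint output does not break it.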

Results

The honest fix for coding fan-out: merging two plausible patches usually yields a third broken one, so make the arbiter objective. tests_pass runs the candidates and keeps the one whose patch passes your tests; that objective check is where a fan-out clearly beats both a single model and a synthesizer.


Liked this workflow?

Get new verified workflows in WebAfterAI, three issues a week (Tue, Thu, Sat).