Agents · Open Source · Free · Active · Machine-verified · intermediate · ~10 min setup

Hermes MoA: stack frontier models into one virtual model for hard turns

Configure a Mixture-of-Agents preset in Hermes so several models answer in parallel and an aggregator writes the final response, and validate the preset before you spend double the tokens on it.

by Shilpa Mitra · verified today · v1.0.0

Run this workflow

CI-verified, 2/2 fixtures passing.

Intended Use

Anyone who wants a multi-model mixture for genuinely hard turns. CI validates the moa config: the block parses, default_preset names a real preset, each reference and the aggregator carry a provider/model, and the aggregator is not itself an moa preset (recursion blocked). CI needs no keys and makes no model call. The benchmark score and any quality gain are fenced, meaning CI does not verify them.

Not for

  • Routine work: MoA is at least three model calls per iteration (references plus aggregator), so budget roughly double the tokens and added latency on every turn. Keep it for hard turns and set enabled: false otherwise.
  • Taking 0.8202 as settled: HermesBench is the vendor's own unreleased single-harness eval, so treat the score as a curiosity, not a leaderboard claim.
  • Correctness that must hold: a panel of similar models can share a blind spot and amplify it with confidence. For that you want an external verifier, not a vote.
  • Bypassing access: MoA orchestrates models you already pay for (the default calls GPT-5.5 and Opus 4.8 through the providers named in the preset). It does not unlock a gated capability.
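The call count and the enabled switch from the first bullet can be sketched in Ruby. This is a minimal illustration, not part of Hermes: calls_per_iteration and disable_preset are hypothetical helper names, and the sketch assumes enabled sits on the preset, as it does in this recipe's config.

```ruby
require "yaml"

# Sample moa block mirroring the default preset used in this recipe.
CONFIG = <<~CFG
  moa:
    default_preset: default
    presets:
      default:
        reference_models:
          - {provider: openai-codex, model: gpt-5.5}
          - {provider: openrouter, model: deepseek/deepseek-v4-pro}
        aggregator: {provider: openrouter, model: anthropic/claude-opus-4.8}
        enabled: true
CFG

# Hypothetical helper: model calls for one MoA iteration
# (every reference model plus one aggregator).
def calls_per_iteration(preset)
  preset.fetch("reference_models").length + 1
end

# Hypothetical helper: return a copy of the config with the named preset
# switched off. Assumes `enabled` lives on the preset, as in this recipe.
def disable_preset(config, name)
  copy = Marshal.load(Marshal.dump(config)) # deep copy; the input stays untouched
  copy["moa"]["presets"][name]["enabled"] = false
  copy
end

config = YAML.safe_load(CONFIG)
preset = config["moa"]["presets"]["default"]
puts "calls per iteration: #{calls_per_iteration(preset)}"

off = disable_preset(config, "default")
puts "enabled after disable: #{off["moa"]["presets"]["default"]["enabled"]}"
```

With the default preset's two references this reports three calls per iteration, which is where the extra cost on every turn comes from.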

The Stack

Tested Against

hermes-agent.nousresearch.com/docs (Mixture of Agents, 2026-06)
ruby@3.x (YAML stdlib)

Side effects & data flow

Network: none, local only
Writes: ./config.yaml
Credentials: none required

Prerequisites

  • Hermes Agent installed
  • Provider access/keys for the reference and aggregator models (only to actually run MoA)

Steps

  1. Scaffold the MoA preset and validate it

    Write the moa block in config.yaml (the default preset is a fine start: a GPT-5.5 + DeepSeek reference pair with an Opus-4.8 aggregator). CI parses it and checks the preset shape and that the aggregator is a real model, not another moa preset. Running the mixture needs provider keys and is fenced.

    cat > config.yaml <<'YAML'
    moa:
      default_preset: default
      presets:
        default:
          reference_models:
            - provider: openai-codex
              model: gpt-5.5
            - provider: openrouter
              model: deepseek/deepseek-v4-pro
          aggregator:
            provider: openrouter
            model: anthropic/claude-opus-4.8
          reference_temperature: 0.6
          aggregator_temperature: 0.4
          max_tokens: 4096
          enabled: true
    YAML
    ruby -ryaml -e '
    # Load the config; an empty file parses to a falsy value, so fall back to {}.
    c = YAML.load_file("config.yaml") || {}
    moa = c["moa"] || {}
    abort "BAD: no moa block" if moa.empty?
    # default_preset must name an entry under presets.
    dp = moa["default_preset"]
    presets = moa["presets"] || {}
    preset = presets[dp]
    abort "BAD: default_preset names no existing preset" unless preset
    # Every reference model needs both a provider and a model.
    refs = preset["reference_models"]
    abort "BAD: need at least one reference model" unless refs.is_a?(Array) && refs.length >= 1
    refs.each { |r| abort "BAD: a reference is missing provider or model" unless r["provider"] && r["model"] }
    # The aggregator must be a concrete model, not another moa preset.
    agg = preset["aggregator"] || {}
    abort "BAD: aggregator missing provider or model" unless agg["provider"] && agg["model"]
    abort "BAD: aggregator is itself an moa preset (recursion blocked)" if agg["provider"] == "moa"
    puts "config OK: MoA preset #{dp} fans out to #{refs.length} reference model(s); acting model = #{agg["provider"]}/#{agg["model"]}"
    '
  2. Point it only at your hardest turns (the model step, not checked by CI)

    Select the preset with /model default --provider moa, or run one turn with /moa <prompt> (it runs the mixture for that turn, then restores your normal model). Watch your token bill: if the lift is worth roughly double the cost on your tasks, keep it for hard work. The mixture run and its quality are fenced.
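Before pointing the preset at real work, it can help to see the step 1 checks reject a bad config. The sketch below repackages those checks as a function that collects every problem instead of aborting on the first. moa_problems is a hypothetical helper name, and the broken config is made up for illustration.

```ruby
require "yaml"

# Hypothetical helper mirroring the step 1 checks. It returns a list of
# problems rather than aborting on the first one.
def moa_problems(config)
  moa = (config || {})["moa"]
  return ["no moa block"] unless moa.is_a?(Hash) && !moa.empty?

  preset = (moa["presets"] || {})[moa["default_preset"]]
  return ["default_preset names no existing preset"] unless preset.is_a?(Hash)

  problems = []
  refs = preset["reference_models"]
  if refs.is_a?(Array) && !refs.empty?
    refs.each_with_index do |r, i|
      ok = r.is_a?(Hash) && r["provider"] && r["model"]
      problems << "reference #{i} is missing provider or model" unless ok
    end
  else
    problems << "need at least one reference model"
  end

  agg = preset["aggregator"]
  agg = {} unless agg.is_a?(Hash)
  problems << "aggregator missing provider or model" unless agg["provider"] && agg["model"]
  # Same recursion guard as step 1: the aggregator may not be an moa preset.
  problems << "aggregator is itself an moa preset" if agg["provider"] == "moa"
  problems
end

# Deliberately broken config: one reference lacks a model, and the
# aggregator points back at the moa provider.
BAD = YAML.safe_load(<<~CFG)
  moa:
    default_preset: default
    presets:
      default:
        reference_models:
          - provider: openai-codex
        aggregator:
          provider: moa
          model: default
CFG

puts moa_problems(BAD)
```

Collecting problems in a list is a design choice for local debugging; the CI fixture only needs the single OK line from step 1.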

Eval, 2 fixtures

Last passed: verified today
  • moa-ok · contains · timeout 30s · max $0

    Expected: config OK: MoA preset default fans out to 2 reference model(s); acting model = openrouter/anthropic/claude-opus-4.8

  • clean-exit · exit_code · timeout 30s · max $0

    Expected: 0
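The two fixture kinds above, contains and exit_code, can be approximated locally. The real CI harness is not shown in this recipe, so treat this as a sketch under assumptions: run_fixture is a hypothetical function, and a stand-in command prints the expected line instead of running the step 1 script.

```ruby
require "open3"
require "rbconfig"
require "timeout"

# Hypothetical fixture runner; the real CI harness is not shown in this recipe.
# Note: Timeout only abandons the wait, it does not kill the child process.
def run_fixture(cmd, kind:, expected:, timeout_s: 30)
  out, status = Timeout.timeout(timeout_s) { Open3.capture2e(*cmd) }
  case kind
  when :contains  then out.include?(expected)        # output must contain the text
  when :exit_code then status.exitstatus == expected # exit status must match
  else raise ArgumentError, "unknown fixture kind: #{kind}"
  end
end

# Stand-in command that prints the line the moa-ok fixture expects.
line = "config OK: MoA preset default fans out to 2 reference model(s); " \
       "acting model = openrouter/anthropic/claude-opus-4.8"
cmd = [RbConfig.ruby, "-e", "puts #{line.inspect}"]

puts run_fixture(cmd, kind: :contains, expected: line)
puts run_fixture(cmd, kind: :exit_code, expected: 0)
```

To check the real script, replace cmd with the command that runs the step 1 validation in a directory containing config.yaml.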

Results

Mixture of Agents (Together AI, 2024; ICLR 2025) runs a prompt through several models, then one aggregator fuses their answers. In Hermes it shows up as a virtual model under the moa provider: the aggregator is the acting model (it writes the response and emits tool calls), while the references run first and their answers are appended as private context. The vendor's own HermesBench put the default mix above Opus 4.8 and GPT-5.5, but that eval is unreleased.
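The fan-out and aggregate flow described above can be sketched with stub models. This is not Hermes's implementation: the lambdas stand in for provider calls, mixture is a hypothetical function name, and no network access or keys are involved.

```ruby
# Stub "models": lambdas standing in for provider calls (no network, no keys).
reference_models = {
  "ref-a" => ->(prompt) { "A says: #{prompt.upcase}" },
  "ref-b" => ->(prompt) { "B says: #{prompt.reverse}" },
}

# Stub aggregator: receives the prompt plus the reference answers as context.
aggregator = lambda do |prompt, context|
  "final answer to '#{prompt}' using #{context.length} reference answer(s)"
end

# Hypothetical sketch of one MoA turn.
def mixture(prompt, reference_models, aggregator)
  # Fan out: one thread per reference model, so they run in parallel.
  threads = reference_models.map do |name, model|
    Thread.new { [name, model.call(prompt)] }
  end
  answers = threads.map(&:value) # wait for every reference to finish
  context = answers.map { |name, text| "[#{name}] #{text}" }
  aggregator.call(prompt, context) # the aggregator writes the response
end

puts mixture("hard question", reference_models, aggregator)
```

Only the aggregator's return value reaches the caller, which mirrors the description of the references as private context.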
