Agents · Open Source · Free · Active · Machine-verified · intermediate · ~10 min setup

Hermes MoA: stack frontier models into one virtual model for hard turns

Configure a Mixture-of-Agents preset in Hermes so several models answer in parallel and an aggregator writes the final response, and validate the preset before you spend double the tokens on it.

by Shilpa Mitra · verified today · v1.0.0

CI-verified, 2/2 fixtures passing.

Intended Use

For anyone who wants a multi-model mixture for genuinely hard turns. CI validates only the moa config: the block parses, default_preset names a real preset, each reference and the aggregator carry a provider and a model, and the aggregator is not itself an moa preset (recursion blocked). CI uses no keys and makes no model call, so the benchmark score and any quality gain are fenced: they fall outside what CI verifies.

Not for

Routine work: MoA makes at least three model calls per iteration (the references plus the aggregator), so budget roughly double the tokens and added latency on every turn. Keep it for hard turns and set enabled: false otherwise.
Taking 0.8202 as settled: HermesBench is the vendor's own unreleased single-harness eval, so treat the score as a curiosity, not a leaderboard claim.
Correctness that must hold: a panel of similar models can share a blind spot and amplify it with confidence. For that you want an external verifier, not a vote.
Bypassing access: MoA orchestrates models you already pay for (the default preset calls GPT-5.5, DeepSeek, and Opus 4.8 through the providers set in the config); it does not unlock a gated capability.
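
The call count behind the "Routine work" budget can be made concrete. This is a minimal sketch, assuming each reference and the aggregator is called exactly once per iteration, as described above. The helper name `moa_calls` and the `iterations` parameter are hypothetical, and the sketch counts calls, not tokens.

```ruby
# Hypothetical helper: number of model calls a preset makes.
# Assumes one call per reference plus one aggregator call per iteration.
def moa_calls(preset, iterations: 1)
  refs = preset.fetch("reference_models", [])
  (refs.length + 1) * iterations
end

# The default preset from this workflow: two references, one aggregator.
default_preset = {
  "reference_models" => [
    { "provider" => "openai-codex", "model" => "gpt-5.5" },
    { "provider" => "openrouter", "model" => "deepseek/deepseek-v4-pro" }
  ],
  "aggregator" => { "provider" => "openrouter", "model" => "anthropic/claude-opus-4.8" }
}

puts moa_calls(default_preset)                # prints 3
puts moa_calls(default_preset, iterations: 4) # prints 12
```

A single-model turn would make one call per iteration, so this count is a rough guide to where the added cost and latency come from.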

The Stack

Hermes Agent: agent runtime (MoA provider)

Tested Against

hermes-agent.nousresearch.com/docs (Mixture of Agents, 2026-06)
ruby@3.x (YAML stdlib)

Side effects & data flow

Network: none, local only
Writes: ./config.yaml
Credentials: none required

Prerequisites

Hermes Agent installed
Provider access/keys for the reference and aggregator models (only to actually run MoA)

Steps

1. Scaffold the MoA preset and validate it

Write the moa block in config.yaml (the default preset is a fine start: a GPT-5.5 + DeepSeek reference pair with an Opus-4.8 aggregator). CI parses it and checks the preset shape and that the aggregator is a real model, not another moa preset. Running the mixture needs provider keys and is fenced.

cat > config.yaml <<'YAML'
moa:
  default_preset: default
  presets:
    default:
      reference_models:
        - provider: openai-codex
          model: gpt-5.5
        - provider: openrouter
          model: deepseek/deepseek-v4-pro
      aggregator:
        provider: openrouter
        model: anthropic/claude-opus-4.8
      reference_temperature: 0.6
      aggregator_temperature: 0.4
      max_tokens: 4096
      enabled: true
YAML
ruby -ryaml -e '
# Load the config; an empty file parses to a falsy value, so fall back to {}.
c = YAML.load_file("config.yaml") || {}
moa = c["moa"] || {}
abort "BAD: no moa block" if moa.empty?
# default_preset must name a preset that exists.
dp = moa["default_preset"]
presets = moa["presets"] || {}
preset = presets[dp]
abort "BAD: default_preset names no existing preset" unless preset
# Every reference model needs both a provider and a model.
refs = preset["reference_models"]
abort "BAD: need at least one reference model" unless refs.is_a?(Array) && refs.length >= 1
refs.each { |r| abort "BAD: a reference is missing provider or model" unless r.is_a?(Hash) && r["provider"] && r["model"] }
# The aggregator must be a concrete model, not another moa preset.
agg = preset["aggregator"] || {}
abort "BAD: aggregator missing provider or model" unless agg["provider"] && agg["model"]
abort "BAD: aggregator is itself an moa preset (recursion blocked)" if agg["provider"] == "moa"
puts "config OK: MoA preset " + dp + " fans out to " + refs.length.to_s + " reference model(s); acting model = " + agg["provider"] + "/" + agg["model"]
'
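
To see the recursion guard reject a bad config, the same checks can be wrapped in a function that returns a list of problems instead of aborting. This is a sketch, not the CI harness: the name `moa_problems` is hypothetical, and the recursive config below is made up for illustration.

```ruby
require "yaml"

# Hypothetical helper mirroring the one-liner checks; returns an array of problems.
def moa_problems(config)
  moa = config.is_a?(Hash) ? config["moa"] : nil
  return ["no moa block"] unless moa.is_a?(Hash) && !moa.empty?

  presets = moa["presets"].is_a?(Hash) ? moa["presets"] : {}
  preset = presets[moa["default_preset"]]
  return ["default_preset names no existing preset"] unless preset.is_a?(Hash)

  problems = []
  refs = preset["reference_models"]
  if refs.is_a?(Array) && !refs.empty?
    refs.each_with_index do |ref, i|
      ok = ref.is_a?(Hash) && ref["provider"] && ref["model"]
      problems << "reference #{i} is missing provider or model" unless ok
    end
  else
    problems << "need at least one reference model"
  end

  agg = preset["aggregator"].is_a?(Hash) ? preset["aggregator"] : {}
  problems << "aggregator missing provider or model" unless agg["provider"] && agg["model"]
  problems << "aggregator is itself an moa preset (recursion blocked)" if agg["provider"] == "moa"
  problems
end

# Illustrative bad config: the aggregator points back at the moa provider.
recursive = YAML.safe_load(<<~YAML)
  moa:
    default_preset: default
    presets:
      default:
        reference_models:
          - provider: openai-codex
            model: gpt-5.5
        aggregator:
          provider: moa
          model: default
YAML

puts moa_problems(recursive)
```

Returning every problem at once, rather than stopping at the first, makes it easier to fix a hand-edited preset in one pass.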

2. Point it only at your hardest turns (the model step, not checked by CI)

Select the preset with /model default --provider moa, or run one turn with /moa <prompt> (it runs the mixture for that turn, then restores your normal model). Watch your token bill: if the lift is worth roughly double the cost on your tasks, keep it for hard work. The mixture run and its quality are fenced.
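
For switching the preset off between hard turns, one option is to flip its enabled flag in the config text. This is a minimal sketch assuming the per-preset `enabled` key shown in the config above. The helper name `with_moa_enabled` is hypothetical, and it works on a YAML string so you decide when to write the result back to config.yaml.

```ruby
require "yaml"

# Hypothetical helper: return the config YAML with one preset's enabled flag set.
def with_moa_enabled(yaml_text, preset_name, enabled)
  config = YAML.safe_load(yaml_text) || {}
  preset = config.dig("moa", "presets", preset_name)
  raise ArgumentError, "no moa preset named #{preset_name}" unless preset.is_a?(Hash)

  preset["enabled"] = enabled
  YAML.dump(config) # round-tripping through YAML.dump drops any comments in the file
end

sample = <<~YAML
  moa:
    default_preset: default
    presets:
      default:
        enabled: true
YAML

puts with_moa_enabled(sample, "default", false)
```

Because the round trip discards comments and may reorder formatting, editing the single `enabled` line by hand is just as reasonable for a small config.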

Eval, 2 fixtures

Last passed: verified today

moa-ok · contains · timeout 30s · max $0
Expected: config OK: MoA preset default fans out to 2 reference model(s); acting model = openrouter/anthropic/claude-opus-4.8
clean-exit · exit_code · timeout 30s · max $0
Expected: 0

Results

Mixture of Agents (Together AI, 2024; ICLR 2025) runs a prompt through several models, then one aggregator fuses their answers. In Hermes it shows up as a virtual model under the moa provider. The aggregator is the acting model (it writes the response and emits tool calls); the references run first, and their answers are appended as private context. The vendor's own HermesBench put the default mix above Opus 4.8 and GPT-5.5, but that eval is unreleased.
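
The fan-out-then-aggregate flow described above can be sketched in a few lines. This is an illustration under stated assumptions, not Hermes's implementation: the lambdas are hypothetical stand-ins for real model calls, and `run_moa` and the "Reference N:" context format are invented for the example.

```ruby
# Hypothetical sketch: each model is a lambda that takes a prompt and returns text.
def run_moa(prompt, references, aggregator)
  # Fan out: one thread per reference model, all given the same prompt.
  drafts = references.map { |ref| Thread.new { ref.call(prompt) } }.map(&:value)

  # Append the reference answers as context for the aggregator only.
  context = drafts.each_with_index.map { |d, i| "Reference #{i + 1}: #{d}" }.join("\n")

  # The aggregator is the acting model: it alone writes the final response.
  aggregator.call("#{prompt}\n\n#{context}")
end

# Stub models for illustration; real ones would call a provider API.
references = [
  ->(p) { "draft A for: #{p}" },
  ->(p) { "draft B for: #{p}" }
]
aggregator = ->(p) { "final answer built from #{p.scan(/^Reference \d+:/).length} references" }

puts run_moa("hard question", references, aggregator)
# prints: final answer built from 2 references
```

The references run in parallel, so the added latency is roughly the slowest reference plus the aggregator, while the token cost is the sum of all calls.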
