Hermes MoA: stack frontier models into one virtual model for hard turns
Configure a Mixture-of-Agents preset in Hermes so several models answer in parallel and an aggregator writes the final response, and validate the preset before you spend double the tokens on it.
CI-verified, 2/2 fixtures passing.
Intended Use
Anyone who wants a multi-model mixture for genuinely hard turns. CI validates the moa config: the block parses, default_preset names an existing preset, each reference model and the aggregator carry a provider and a model, and the aggregator is not itself an moa preset (recursion blocked). No keys are needed and no model is called. The benchmark score and any quality gain are fenced: CI does not verify them.
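The CI checks above only ever see a valid preset. As a rough sketch, the same rules can be written as a Ruby function that collects every problem instead of aborting on the first, which makes it easy to feed it deliberately broken presets. `moa_errors` is a hypothetical helper, not part of Hermes, and the sample config is trimmed to the fields the checks read.

```ruby
require "yaml"

# Same rules as the CI check, but returns a list of problems instead of aborting.
def moa_errors(config)
  errors = []
  moa = (config || {})["moa"] || {}
  return ["no moa block"] if moa.empty?
  preset = (moa["presets"] || {})[moa["default_preset"]]
  return ["default_preset names no existing preset"] unless preset
  refs = preset["reference_models"]
  if refs.is_a?(Array) && !refs.empty?
    refs.each_with_index do |r, i|
      errors << "reference #{i} is missing provider or model" unless r.is_a?(Hash) && r["provider"] && r["model"]
    end
  else
    errors << "need at least one reference model"
  end
  agg = preset["aggregator"] || {}
  errors << "aggregator missing provider or model" unless agg["provider"] && agg["model"]
  # Recursion guard: the aggregator must be a concrete model, not another moa preset.
  errors << "aggregator is itself an moa preset (recursion blocked)" if agg["provider"] == "moa"
  errors
end

good = YAML.safe_load(<<~CONFIG)
  moa:
    default_preset: default
    presets:
      default:
        reference_models:
          - {provider: openai-codex, model: gpt-5.5}
          - {provider: openrouter, model: deepseek/deepseek-v4-pro}
        aggregator: {provider: openrouter, model: anthropic/claude-opus-4.8}
CONFIG

# Hypothetical broken variant: the aggregator points back at the moa provider.
recursive = Marshal.load(Marshal.dump(good))
recursive["moa"]["presets"]["default"]["aggregator"] = { "provider" => "moa", "model" => "default" }

p moa_errors(good)      # => []
p moa_errors(recursive) # => ["aggregator is itself an moa preset (recursion blocked)"]
```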
Not for
- Routine work: MoA makes one call per reference model plus one for the aggregator on every iteration (three with the default preset), so budget roughly double the tokens and extra latency on every turn; keep it for hard turns and set enabled: false otherwise
- Taking the 0.8202 HermesBench score as settled: HermesBench is the vendor's own unreleased, single-harness eval; treat the number as a curiosity, not a leaderboard claim
- Correctness that must hold: a panel of similar models can share a blind spot and amplify it with confidence; for that you want an external verifier, not a vote
- Bypassing access: MoA only orchestrates models you already pay for (the default preset calls GPT-5.5 through openai-codex, and DeepSeek and Opus 4.8 through OpenRouter); it does not unlock a gated capability
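The gap between a vote and a verifier fits in a few lines. This toy Ruby sketch uses made-up answers (not model output) and a plain majority vote as the simplest stand-in for panel agreement; MoA's aggregator fuses answers rather than counting them, but it can inherit the same shared mistake. The majority picks the wrong product, while a check that recomputes the result rejects it.

```ruby
# Majority vote: the most common answer and its share of the votes.
def majority_vote(answers)
  winner, count = answers.tally.max_by { |_answer, n| n }
  [winner, count.fdiv(answers.length)]
end

# External verifier: recompute the product instead of trusting the panel.
def verify_product(a, b, claimed)
  a * b == claimed
end

# Hypothetical panel answers to "17 * 24" where two models share one mistake.
panel = [418, 418, 408]
answer, agreement = majority_vote(panel)
puts "vote: #{answer} (agreement #{(agreement * 100).round}%)" # vote: 418 (agreement 67%)
puts "verifier accepts vote? #{verify_product(17, 24, answer)}" # false
```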
The Stack
Tested Against
- hermes-agent.nousresearch.com/docs (Mixture of Agents, 2026-06)
- ruby@3.x (YAML stdlib)
Side effects & data flow
- Network: none, local only
- Writes: ./config.yaml
- Credentials: none required
Prerequisites
- Hermes Agent installed
- Provider access/keys for the reference and aggregator models (only to actually run MoA)
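Which keys you need follows from the preset itself. A minimal Ruby sketch, assuming the config layout from Step 1: it lists the distinct providers a preset calls and the model calls per iteration. `preset_summary` is a hypothetical helper; it does not guess environment-variable names or check that any key actually works.

```ruby
require "yaml"

# Summarize what a preset needs before it can run: the distinct providers
# (presumably one set of credentials each) and the model calls per iteration.
def preset_summary(config, preset_name = nil)
  moa = config.fetch("moa")
  preset = moa.fetch("presets").fetch(preset_name || moa.fetch("default_preset"))
  refs = preset.fetch("reference_models")
  agg = preset.fetch("aggregator")
  {
    providers: (refs + [agg]).map { |m| m.fetch("provider") }.uniq,
    calls_per_iteration: refs.length + 1 # every reference once, then the aggregator
  }
end

# Reads ./config.yaml, if present, and prints its summary.
if File.exist?("config.yaml")
  summary = preset_summary(YAML.load_file("config.yaml"))
  puts "providers needing access: #{summary[:providers].join(", ")}"
  puts "model calls per iteration: #{summary[:calls_per_iteration]}"
end
```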
Steps
- 1
Scaffold the MoA preset and validate it
Write the moa block to config.yaml (the default preset is a fine start: a GPT-5.5 + DeepSeek reference pair with an Opus 4.8 aggregator). CI parses it, checks the preset shape, and confirms the aggregator names a provider and model rather than another moa preset. Running the mixture needs provider keys and is fenced (not checked by CI).
```shell
# Write the MoA block to ./config.yaml (overwrites an existing file).
cat > config.yaml <<'YAML'
moa:
  default_preset: default
  presets:
    default:
      reference_models:
        - provider: openai-codex
          model: gpt-5.5
        - provider: openrouter
          model: deepseek/deepseek-v4-pro
      aggregator:
        provider: openrouter
        model: anthropic/claude-opus-4.8
      reference_temperature: 0.6
      aggregator_temperature: 0.4
      max_tokens: 4096
      enabled: true
YAML

# Validate the preset shape: no keys, no model calls.
ruby -ryaml -e '
c = YAML.load_file("config.yaml") || {}
moa = c["moa"] || {}
abort "BAD: no moa block" if moa.empty?
dp = moa["default_preset"]
presets = moa["presets"] || {}
preset = presets[dp]
abort "BAD: default_preset names no existing preset" unless preset
refs = preset["reference_models"]
abort "BAD: need at least one reference model" unless refs.is_a?(Array) && refs.length >= 1
refs.each { |r| abort "BAD: a reference is missing provider or model" unless r["provider"] && r["model"] }
agg = preset["aggregator"] || {}
abort "BAD: aggregator missing provider or model" unless agg["provider"] && agg["model"]
# Recursion guard: the aggregator must be a concrete model, not another moa preset.
abort "BAD: aggregator is itself an moa preset (recursion blocked)" if agg["provider"] == "moa"
puts "config OK: MoA preset #{dp} fans out to #{refs.length} reference model(s); acting model = #{agg["provider"]}/#{agg["model"]}"
'
```
- 2
Point it only at your hardest turns (the model step, not checked by CI)
Select the preset with /model default --provider moa, or run a single turn with /moa <prompt> (it runs the mixture for that turn, then restores your normal model). Watch your token bill: if the quality lift on your tasks is worth roughly double the cost, keep MoA for hard work. CI checks neither the mixture run nor its output quality.
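"Worth roughly double the cost" can be made concrete as cost per solved task. A small Ruby sketch with invented numbers (not measurements), assuming failed attempts are retried independently so that attempts per success is 1 / success rate; swap in figures from your own runs.

```ruby
# Expected cost per solved task when failed attempts are retried independently.
def cost_per_success(cost_per_attempt, success_rate)
  cost_per_attempt / success_rate
end

# Hypothetical numbers: normalized single-model cost 1.0, MoA at roughly 2x.
base_cost, base_rate = 1.0, 0.40
moa_cost, moa_rate = 2.0, 0.70

single = cost_per_success(base_cost, base_rate)
mixture = cost_per_success(moa_cost, moa_rate)
# MoA wins only if moa_rate > base_rate * (moa_cost / base_cost).
break_even_rate = base_rate * (moa_cost / base_cost)

puts format("single model: %.2f per solved task", single)  # 2.50
puts format("MoA:          %.2f per solved task", mixture) # 2.86
puts format("MoA breaks even at a %.0f%% success rate", break_even_rate * 100) # 80%
puts(mixture < single ? "keep MoA for these tasks" : "stay on the single model")
```

With these numbers a jump from 40% to 70% success still loses: at double the cost, MoA has to clear 80% to pay for itself.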
Eval, 2 fixtures
Last passed: verified today
- moa-ok · contains · timeout 30s · max $0
  Expected: config OK: MoA preset default fans out to 2 reference model(s); acting model = openrouter/anthropic/claude-opus-4.8
- clean-exit · exit_code · timeout 30s · max $0
  Expected: 0
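The two fixtures amount to a "contains" check on the output and an exit-code check, each under a 30 s timeout. A minimal Ruby sketch of such a runner, assuming nothing about the real CI harness: `run_with_timeout`, `fixture_passes?`, and the fixture hash keys are hypothetical, the stand-in command just prints the expected line, and the "max $0" budget is not modeled.

```ruby
require "open3"
require "rbconfig"
require "timeout"

# Run a command; return [combined stdout+stderr, exit status or nil on timeout].
def run_with_timeout(cmd, seconds)
  Open3.popen2e(*cmd) do |stdin, out, wait_thr|
    stdin.close
    output = +""
    begin
      Timeout.timeout(seconds) { output << out.read }
      [output, wait_thr.value.exitstatus]
    rescue Timeout::Error
      begin
        Process.kill("KILL", wait_thr.pid) # do not leave the child running
      rescue Errno::ESRCH
        # the child already exited
      end
      wait_thr.value
      [output, nil]
    end
  end
end

# A fixture passes when its check holds against the captured run.
def fixture_passes?(fixture, output, status)
  case fixture[:check]
  when :contains then output.include?(fixture[:expected])
  when :exit_code then status == fixture[:expected]
  else raise ArgumentError, "unknown check: #{fixture[:check]}"
  end
end

expected = "config OK: MoA preset default fans out to 2 reference model(s); " \
           "acting model = openrouter/anthropic/claude-opus-4.8"
fixtures = [
  { name: "moa-ok", check: :contains, expected: expected },
  { name: "clean-exit", check: :exit_code, expected: 0 }
]

# Stand-in command that prints the expected line; the real target is the Step 1 script.
cmd = [RbConfig.ruby, "-e", "puts ARGV.first", expected]
output, status = run_with_timeout(cmd, 30)
results = fixtures.map { |f| [f[:name], fixture_passes?(f, output, status)] }
results.each { |name, ok| puts "#{name}: #{ok ? "pass" : "fail"}" }
```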
Results
Mixture of Agents (Together AI, 2024; ICLR 2025) runs a prompt through several models, then one aggregator fuses their answers. In Hermes it appears as a virtual model under the moa provider: the aggregator is the acting model (it writes the response and emits tool calls), while the reference models run first and their answers are appended as private context for the aggregator. The vendor's own HermesBench put the default mix above Opus 4.8 and GPT-5.5, but that eval is unreleased.
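To make that data flow concrete, here is a toy Ruby sketch of one MoA turn with stub lambdas standing in for models. The prompt layout for the private context and every name in it are assumptions, not Hermes internals, and tool calls are left out.

```ruby
# One MoA turn with stubbed models: the references answer in parallel,
# their drafts become private context, and the aggregator (the acting
# model) writes the only reply the user sees.
def moa_turn(prompt, references, aggregator)
  threads = references.map do |ref|
    Thread.new { [ref[:name], ref[:call].call(prompt)] }
  end
  drafts = threads.map(&:value) # wait for every reference before aggregating
  context = drafts.map { |name, text| "[#{name}] #{text}" }.join("\n")
  aggregator.call("Reference answers (private):\n#{context}\n\nUser prompt:\n#{prompt}")
end

# Hypothetical stand-ins for the two reference models.
references = [
  { name: "ref-a", call: ->(p) { "draft A for #{p}" } },
  { name: "ref-b", call: ->(p) { "draft B for #{p}" } }
]
# Stub aggregator that only counts the drafts it received.
aggregator = ->(full_prompt) { "final answer using #{full_prompt.scan(/^\[ref-/).length} drafts" }

puts moa_turn("hard question", references, aggregator) # final answer using 2 drafts
```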
Related workflows
- promptfoo: make agent evals fail the build, not the user
- E2B: run model-written code in a sandbox, not on your box
- DSPy: program the pipeline, compile the prompts (stop hand-tuning)
- Write an agent loop in code with smolagents (sandboxed)
- Hermes /learn: author a reusable skill from a source, not by hand
- Text your own AI assistant on WhatsApp: Hermes wired to FreeLLMAPI
Liked this workflow?
Get new verified workflows in WebAfterAI, three issues a week (Tue, Thu, Sat).