Hermes: offload background jobs to MiMo-V2-Flash and cut your main bill
Route Hermes' cheap, high-volume auxiliary work (compression, vision, web-extract) to MiMo-V2-Flash so your expensive main model only handles real reasoning.
CI-verified, 2/2 fixtures passing.
Intended Use
Anyone running Hermes who wants to stop paying their main model for background traffic. CI validates that the auxiliary.compression, auxiliary.vision, and auxiliary.web_extract blocks parse and that all three route to openrouter + xiaomi/mimo-v2-flash. These are the real auxiliary keys in Hermes' config. The actual compression, vision, and extract calls are model steps, so they are fenced off from CI.
Not for
- Routing your primary reasoning here; this is for cheap, high-volume background jobs only
- Expecting CI to run the compression or vision models; those are the fenced steps
The Stack
Tested Against
- hermes-agent cli-config.yaml.example (2026-06)
- openrouter xiaomi/mimo-v2-flash
- ruby@3.x (YAML stdlib)
Side effects & data flow
- Network: none, local only
- Writes: ./config.yaml
- Credentials: none required
Prerequisites
- Hermes Agent installed
- An OpenRouter API key (only to actually run the jobs)
Steps
- 1
Point the auxiliary jobs at MiMo-V2-Flash and validate
Hermes runs several auxiliary jobs behind your conversation; by default they use your main model. Point compression (context summarization), vision (image analysis), and web_extract (page summarization) at the cheap xiaomi/mimo-v2-flash instead. CI parses the YAML and asserts that each of the three blocks sets provider openrouter and model xiaomi/mimo-v2-flash.
```shell
# Write the auxiliary routing block (overwrites ./config.yaml).
cat > config.yaml <<'YAML'
auxiliary:
  compression:
    provider: openrouter
    model: xiaomi/mimo-v2-flash
  vision:
    provider: openrouter
    model: xiaomi/mimo-v2-flash
  web_extract:
    provider: openrouter
    model: xiaomi/mimo-v2-flash
YAML

# Parse it back and fail unless all three jobs route to openrouter + xiaomi/mimo-v2-flash.
ruby -ryaml -e '
  c = YAML.load_file("config.yaml")
  aux = c["auxiliary"] || {}
  %w[compression vision web_extract].each do |k|
    b = aux[k] || {}
    # Abort (exit 1) on the first job that is not routed to MiMo-V2-Flash.
    abort "BAD: auxiliary.#{k} provider not openrouter" unless b["provider"] == "openrouter"
    abort "BAD: auxiliary.#{k} model not xiaomi/mimo-v2-flash" unless b["model"] == "xiaomi/mimo-v2-flash"
  end
  puts "config OK: compression+vision+web_extract all route to openrouter/xiaomi/mimo-v2-flash"
'
```
- 2
Let the background jobs run (the model step, not checked by CI)
With an OpenRouter API key set, Hermes sends compression, vision, and web-extract calls to MiMo-V2-Flash while your main model handles reasoning. Those calls invoke the model, so CI does not verify them.
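To make the routing concrete, here is a minimal Ruby sketch of how a task could be resolved to a provider/model pair: an auxiliary job uses its own block when both keys are set and otherwise falls back to the main model, which matches the default described in step 1. This is not Hermes' code. The `resolve_route` helper, the `MAIN_ROUTE` constant, and the `example/main-model` slug are hypothetical stand-ins.

```ruby
require "yaml"

# Auxiliary blocks as written to config.yaml in step 1.
AUX_YAML = <<~YAML
  auxiliary:
    compression:
      provider: openrouter
      model: xiaomi/mimo-v2-flash
    vision:
      provider: openrouter
      model: xiaomi/mimo-v2-flash
    web_extract:
      provider: openrouter
      model: xiaomi/mimo-v2-flash
YAML

# Hypothetical stand-in for whatever main model you have configured.
MAIN_ROUTE = { "provider" => "openrouter", "model" => "example/main-model" }.freeze

# Hypothetical resolver (not Hermes' implementation):
# - conversation turns ("chat") always go to the main model;
# - an auxiliary task uses its own block when provider and model are both set;
# - otherwise it falls back to the main model.
def resolve_route(config, task)
  return MAIN_ROUTE if task == "chat"

  block = (config["auxiliary"] || {})[task] || {}
  if block["provider"] && block["model"]
    { "provider" => block["provider"], "model" => block["model"] }
  else
    MAIN_ROUTE
  end
end

config = YAML.safe_load(AUX_YAML)
%w[chat compression vision web_extract].each do |task|
  route = resolve_route(config, task)
  puts "#{task}: #{route['provider']}/#{route['model']}"
end
```

With the step 1 config, only `chat` resolves to the main model; deleting any auxiliary block sends that job back to the main model, which is exactly the traffic this workflow moves off your main bill.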
Eval, 2 fixtures
Last passed: verified today
- config-ok (contains), timeout 30s · max $0. Expected: config OK: compression+vision+web_extract all route to openrouter/xiaomi/mimo-v2-flash
- clean-exit (exit_code), timeout 30s · max $0. Expected: 0
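The two fixtures check different things: config-ok looks for the expected line in the output, and clean-exit checks the process exit code. The Ruby sketch below re-creates both checks locally by running the step 1 validator against a temporary config.yaml. The harness itself (`run_validator`, the misrouted `some/other-model` slug) is hypothetical, not the CI's actual runner, and it does not enforce the 30s timeout or the $0 cost cap.

```ruby
require "open3"
require "rbconfig"
require "tmpdir"

EXPECTED = "config OK: compression+vision+web_extract all route to openrouter/xiaomi/mimo-v2-flash"

CONFIG_YAML = <<~YAML
  auxiliary:
    compression:
      provider: openrouter
      model: xiaomi/mimo-v2-flash
    vision:
      provider: openrouter
      model: xiaomi/mimo-v2-flash
    web_extract:
      provider: openrouter
      model: xiaomi/mimo-v2-flash
YAML

# The step 1 validator body; the quoted heredoc keeps #{k} uninterpolated.
VALIDATOR = <<~'RUBY'
  c = YAML.load_file("config.yaml")
  aux = c["auxiliary"] || {}
  %w[compression vision web_extract].each do |k|
    b = aux[k] || {}
    abort "BAD: auxiliary.#{k} provider not openrouter" unless b["provider"] == "openrouter"
    abort "BAD: auxiliary.#{k} model not xiaomi/mimo-v2-flash" unless b["model"] == "xiaomi/mimo-v2-flash"
  end
  puts "config OK: compression+vision+web_extract all route to openrouter/xiaomi/mimo-v2-flash"
RUBY

# Hypothetical harness: write the config into a temp dir, run the validator there,
# and return combined stdout/stderr plus the exit code.
def run_validator(yaml_text)
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, "config.yaml"), yaml_text)
    out, status = Open3.capture2e(RbConfig.ruby, "-ryaml", "-e", VALIDATOR, chdir: dir)
    [out, status.exitstatus]
  end
end

out, code = run_validator(CONFIG_YAML)
puts "config-ok:  #{out.include?(EXPECTED) ? 'pass' : 'fail'}"
puts "clean-exit: #{code.zero? ? 'pass' : 'fail'}"

# Misroute the first job (compression); the validator should abort with exit code 1.
bad_yaml = CONFIG_YAML.sub("model: xiaomi/mimo-v2-flash", "model: some/other-model")
bad_out, bad_code = run_validator(bad_yaml)
puts "misrouted: exit #{bad_code}, #{bad_out.strip}"
```

The negative case is worth keeping: it confirms the validator fails loudly on a single misrouted job rather than silently passing.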
Results
MiMo-V2-Flash is the cheapest of this group at $0.10 in / $0.30 out per 1M tokens (256K context). Pointing Hermes' auxiliary jobs at it keeps the main model free for the work that actually needs it.
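To turn the per-token prices into a bill, a short Ruby sketch of the arithmetic: cost = (input tokens × $0.10 + output tokens × $0.30) / 1M. The `aux_cost_usd` helper and the traffic volume (50M tokens in, 5M tokens out) are hypothetical; substitute your own usage numbers.

```ruby
# MiMo-V2-Flash prices quoted above, in USD per 1M tokens.
INPUT_PER_M  = 0.10
OUTPUT_PER_M = 0.30

# Hypothetical helper: dollar cost of a given volume of auxiliary traffic.
def aux_cost_usd(input_tokens, output_tokens)
  (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000.0
end

# Hypothetical month of auxiliary traffic: 50M tokens in, 5M tokens out.
cost = aux_cost_usd(50_000_000, 5_000_000)
puts format("auxiliary jobs on MiMo-V2-Flash: $%.2f", cost) # prints "auxiliary jobs on MiMo-V2-Flash: $6.50"
```

Running the same volume through `aux_cost_usd` with your main model's prices swapped in gives the saving this workflow is aiming for.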
Liked this workflow?
Get new verified workflows in WebAfterAI, three issues a week (Tue, Thu, Sat).