Agents · Open Source · Free · Active · Machine-verified · intermediate · ~5 min setup

Hermes: offload background jobs to MiMo-V2-Flash and cut your main bill

Route Hermes' cheap, high-volume auxiliary work (compression, vision, web-extract) to MiMo-V2-Flash so your expensive main model only handles real reasoning.

by Shilpa Mitra · verified today · v1.0.0

CI-verified, 2/2 fixtures passing.

Intended Use

Anyone running Hermes who wants to stop paying their main model for background traffic. CI validates that the auxiliary.compression, auxiliary.vision, and auxiliary.web_extract blocks parse and that all three route to provider openrouter with model xiaomi/mimo-v2-flash. These are the real auxiliary keys in Hermes' config; the compression, vision, and extraction calls themselves are model steps and are fenced off from CI.

Not for

  • Routing your primary reasoning here: this is for cheap, high-volume background jobs only
  • Expecting CI to run the compression or vision models: those are the fenced steps and CI does not execute them

The Stack

Tested Against

  • hermes-agent cli-config.yaml.example (2026-06)
  • openrouter xiaomi/mimo-v2-flash
  • ruby@3.x (YAML stdlib)

Side effects & data flow

Network: none, local only
Writes: ./config.yaml
Credentials: none required for the config step
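Step 1 writes ./config.yaml with cat >, which replaces the whole file. If that file already holds other Hermes settings, a minimal sketch like the one below merges only the auxiliary routing instead. It assumes the file is a plain YAML mapping; merge_auxiliary and the merge_aux.rb file name are hypothetical, and re-dumping with Ruby's YAML library drops any comments the file contained.

```ruby
require "yaml"

# Routing applied to each auxiliary job (values taken from step 1).
AUX_JOBS = %w[compression vision web_extract].freeze
ROUTE = { "provider" => "openrouter", "model" => "xiaomi/mimo-v2-flash" }.freeze

# Hypothetical helper: returns a NEW config hash with the three auxiliary jobs
# routed to MiMo-V2-Flash. Other top-level keys and extra per-job keys are kept.
def merge_auxiliary(config)
  config = {} unless config.is_a?(Hash)
  aux = config["auxiliary"].is_a?(Hash) ? config["auxiliary"].dup : {}
  AUX_JOBS.each do |job|
    existing = aux[job].is_a?(Hash) ? aux[job] : {}
    aux[job] = existing.merge(ROUTE) # only provider/model are overridden
  end
  config.merge("auxiliary" => aux)
end

# Usage (hypothetical file name): ruby merge_aux.rb --write
if __FILE__ == $PROGRAM_NAME && ARGV.first == "--write"
  path = "config.yaml"
  current = File.exist?(path) ? YAML.load_file(path) : {}
  File.write(path, YAML.dump(merge_auxiliary(current)))
  puts "merged auxiliary routing into #{path}"
end
```

The helper never mutates its input, so it is safe to call on a config you loaded for other purposes.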

Prerequisites

  • Hermes Agent installed
  • An OpenRouter API key (only to actually run the jobs)
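Before running step 2, a quick preflight can confirm both prerequisites. This is a sketch under two assumptions: that the Hermes CLI is on your PATH as hermes, and that your OpenRouter key is exposed as the OPENROUTER_API_KEY environment variable. Adjust either name if your install differs. The config step itself needs neither.

```ruby
# Preflight sketch. ASSUMPTIONS: the Hermes CLI binary is named "hermes",
# and the OpenRouter key is read from the OPENROUTER_API_KEY environment variable.
def executable_on_path?(name, path)
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Returns a list of human-readable problems; empty means ready for step 2.
def preflight(env = ENV)
  problems = []
  problems << "hermes not found on PATH" unless executable_on_path?("hermes", env.fetch("PATH", ""))
  if env["OPENROUTER_API_KEY"].to_s.strip.empty?
    problems << "OPENROUTER_API_KEY is unset (only needed to run the jobs)"
  end
  problems
end

if __FILE__ == $PROGRAM_NAME
  issues = preflight
  issues.each { |msg| warn "WARN: #{msg}" }
  puts "preflight OK" if issues.empty?
end
```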

Steps

  1. Point the auxiliary jobs at MiMo-V2-Flash and validate

    Hermes runs several auxiliary jobs behind your conversation; by default they use your main model. Point compression (context summarization), vision (image analysis), and web_extract (page summarization) at the cheaper xiaomi/mimo-v2-flash instead. CI parses the YAML and asserts that each of the three blocks sets provider openrouter and model xiaomi/mimo-v2-flash.

    # Write the auxiliary routing to ./config.yaml (this overwrites an existing file).
    cat > config.yaml <<'YAML'
    auxiliary:
      compression:
        provider: openrouter
        model: xiaomi/mimo-v2-flash
      vision:
        provider: openrouter
        model: xiaomi/mimo-v2-flash
      web_extract:
        provider: openrouter
        model: xiaomi/mimo-v2-flash
    YAML
    # Parse the file and abort on the first job not routed to openrouter/xiaomi/mimo-v2-flash.
    ruby -ryaml -e '
    c = YAML.load_file("config.yaml")
    aux = c["auxiliary"] || {}
    %w[compression vision web_extract].each do |k|
      b = aux[k] || {}
      abort "BAD: auxiliary.#{k} provider not openrouter" unless b["provider"] == "openrouter"
      abort "BAD: auxiliary.#{k} model not xiaomi/mimo-v2-flash" unless b["model"] == "xiaomi/mimo-v2-flash"
    end
    puts "config OK: compression+vision+web_extract all route to openrouter/xiaomi/mimo-v2-flash"
    '
  2. Let the background jobs run (the model step, not checked by CI)

    With an OpenRouter API key set, Hermes uses MiMo-V2-Flash for compression, vision, and web extraction while your main model handles reasoning. These calls invoke the model, so CI does not run or verify them.
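The inline Ruby check in step 1 aborts on the first mismatch. If you edit a larger config by hand and want every problem reported at once, the same assertions can be written as a function that returns a list of errors. This is a sketch, not the CI harness; aux_routing_errors and the check_aux.rb file name are hypothetical.

```ruby
require "yaml"

EXPECTED_PROVIDER = "openrouter"
EXPECTED_MODEL = "xiaomi/mimo-v2-flash"

# Hypothetical helper: returns every routing mismatch as a message,
# instead of aborting on the first one like the inline check in step 1.
def aux_routing_errors(config)
  aux = config.is_a?(Hash) ? config["auxiliary"] : nil
  aux = {} unless aux.is_a?(Hash)
  %w[compression vision web_extract].flat_map do |job|
    block = aux[job]
    next ["auxiliary.#{job} is missing or not a mapping"] unless block.is_a?(Hash)

    errors = []
    unless block["provider"] == EXPECTED_PROVIDER
      errors << "auxiliary.#{job}.provider is #{block['provider'].inspect}, expected #{EXPECTED_PROVIDER}"
    end
    unless block["model"] == EXPECTED_MODEL
      errors << "auxiliary.#{job}.model is #{block['model'].inspect}, expected #{EXPECTED_MODEL}"
    end
    errors
  end
end

# Usage (hypothetical file name): ruby check_aux.rb config.yaml
if __FILE__ == $PROGRAM_NAME && ARGV.any?
  problems = aux_routing_errors(YAML.load_file(ARGV.first))
  problems.each { |msg| warn "BAD: #{msg}" }
  puts "config OK" if problems.empty?
  exit(problems.empty? ? 0 : 1)
end
```

Like the step 1 check, it exits non-zero on any mismatch, so it can stand in for the inline check in a script.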

Eval, 2 fixtures

Last passed: verified today
  • config-ok · contains · timeout 30s · max $0

    Expected: config OK: compression+vision+web_extract all route to openrouter/xiaomi/mimo-v2-flash

  • clean-exit · exit_code · timeout 30s · max $0

    Expected: 0
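The two fixtures check different things: config-ok is a contains check on the step's output, and clean-exit is an exit_code check. A minimal sketch of how such checks could be evaluated, assuming the harness captures standard output and the exit status; the Fixture struct and fixture_passes? are hypothetical, not the CI's actual code, and the timeout and cost caps are not modeled.

```ruby
# Hypothetical model of the two fixture kinds listed above; not the real CI harness.
Fixture = Struct.new(:name, :kind, :expected, keyword_init: true)

# contains: expected text must appear in stdout; exit_code: status must match exactly.
def fixture_passes?(fixture, stdout:, exit_status:)
  case fixture.kind
  when :contains  then stdout.include?(fixture.expected)
  when :exit_code then exit_status == fixture.expected
  else raise ArgumentError, "unknown fixture kind: #{fixture.kind}"
  end
end

FIXTURES = [
  Fixture.new(name: "config-ok", kind: :contains,
              expected: "config OK: compression+vision+web_extract all route to openrouter/xiaomi/mimo-v2-flash"),
  Fixture.new(name: "clean-exit", kind: :exit_code, expected: 0)
].freeze
```

A failing check in step 1 prints a BAD: line via abort and exits with status 1, so it fails both fixtures at once.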

Results

At $0.10 per 1M input tokens and $0.30 per 1M output tokens (256K context), MiMo-V2-Flash is the cheapest model in this group. Pointing Hermes' auxiliary jobs at it leaves the main model free for the work that actually needs it.
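To see what those rates mean for your own traffic, cost is tokens divided by one million times the per-1M rate, summed over input and output. A sketch using the $0.10 / $0.30 figures above; the token volume in the usage line is a placeholder, not a measurement, and you can pass your main model's rates through the keyword arguments to compare.

```ruby
# Per-1M-token rates quoted above for xiaomi/mimo-v2-flash (USD).
MIMO_INPUT_PER_M = 0.10
MIMO_OUTPUT_PER_M = 0.30

# Cost in USD for a token volume at per-1M-token rates (defaults: MiMo-V2-Flash).
def cost_usd(input_tokens, output_tokens, input_per_m: MIMO_INPUT_PER_M, output_per_m: MIMO_OUTPUT_PER_M)
  (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000.0
end

# Placeholder volume, not a measurement: 20M input + 2M output tokens of auxiliary traffic.
puts format("$%.2f", cost_usd(20_000_000, 2_000_000)) # prints $2.60
```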
