Agents · Open Source · Free · Active · Machine-verified · beginner · ~5 min setup

Hermes + DeepSeek V4 Flash: a one-line reasoning-effort throttle

Run one model from cheap-and-fast to deep-and-careful with a single reasoning_effort setting, so you don't pay for deep thinking on easy turns.

by Shilpa Mitra · verified today · v1.0.0

Run this workflow

CI-verified, 2/2 fixtures passing.

Intended Use

Anyone who wants to dial reasoning effort per task on a cheap DeepSeek V4 Flash setup. CI validates that config.yaml parses, that it names the deepseek/deepseek-v4-flash slug, and that agent.reasoning_effort is one of Hermes' allowed levels (none, minimal, low, medium, high, xhigh). CI needs no API key. The model's actual reasoning is fenced, meaning CI does not run or check it.

Not for

  • Treating the leaderboard's 'Max' and 'High' as separate models; they are xhigh and high effort on the same model
  • Expecting CI to measure token spend; that happens in the fenced model step

The Stack

Tested Against

  • hermes-agent docs (2026-06)
  • openrouter deepseek/deepseek-v4-flash
  • ruby@3.x (YAML stdlib)

Side effects & data flow

  • Network: none, local only
  • Writes: ./config.yaml
  • Credentials: none required

Prerequisites

  • Hermes Agent installed
  • An OpenRouter API key (only to actually run the agent)

Steps

  1. Set the model and a default reasoning effort, then validate

    Write config.yaml with the deepseek/deepseek-v4-flash slug and agent.reasoning_effort: high (Hermes' default is medium). At runtime you switch per task with /reasoning xhigh or /reasoning none, no restart. CI parses the YAML and asserts reasoning_effort is a valid level.

    cat > config.yaml <<'YAML'
    model:
      provider: openrouter
      model: deepseek/deepseek-v4-flash
    agent:
      reasoning_effort: high
    YAML
    ruby -ryaml -e '
    # Levels Hermes accepts for agent.reasoning_effort
    allowed = ["none","minimal","low","medium","high","xhigh"]
    c = YAML.load_file("config.yaml")
    # Tolerate missing sections so the checks below report the problem
    m = c["model"] || {}; a = c["agent"] || {}
    # Fail if the model slug is not the expected one
    abort "BAD: model not deepseek/deepseek-v4-flash" unless m["model"] == "deepseek/deepseek-v4-flash"
    re = a["reasoning_effort"].to_s
    # Fail if the effort level is not in the allowed list
    abort "BAD: reasoning_effort not a valid level: #{re}" unless allowed.include?(re)
    puts "config OK: deepseek/deepseek-v4-flash + reasoning_effort=" + re + " is a valid level"
    '
  2. Throttle per task at runtime (the model step, not checked by CI)

    Use /reasoning xhigh for a hard task and /reasoning none for a quick lookup. xhigh can multiply output tokens, so use it deliberately. Keep prompt prefixes stable across turns to hit DeepSeek's cached-input discount. The reasoning itself is fenced, so CI does not check it.
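One way to make the per-task choice explicit is a small lookup that turns a task label into the /reasoning command to type. This is a minimal sketch, not part of Hermes: the task labels (lookup, routine, hard) and the helper name reasoning_command are hypothetical, and only the level names and the /reasoning command come from the recipe.

```ruby
# Levels listed in this recipe as valid for Hermes reasoning effort.
ALLOWED_LEVELS = %w[none minimal low medium high xhigh].freeze

# Hypothetical task labels (an assumption for illustration); adjust to your workload.
EFFORT_BY_TASK = {
  lookup: "none",    # quick lookups need no deep reasoning
  routine: "high",   # same as the config.yaml default from step 1
  hard: "xhigh"      # reserve for tasks that justify the extra output tokens
}.freeze

# Hypothetical helper: returns the /reasoning command for a task label.
def reasoning_command(task)
  level = EFFORT_BY_TASK.fetch(task) { raise ArgumentError, "unknown task label: #{task}" }
  raise ArgumentError, "invalid level: #{level}" unless ALLOWED_LEVELS.include?(level)
  "/reasoning #{level}"
end

puts reasoning_command(:lookup)
puts reasoning_command(:hard)
```

Keeping the mapping in one table makes xhigh an explicit choice rather than a habit.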

Eval, 2 fixtures

Last passed: verified today
  • config-ok · contains · timeout 30s · max $0

    Expected: config OK: deepseek/deepseek-v4-flash + reasoning_effort=high is a valid level

  • clean-exit · exit_code · timeout 30s · max $0

    Expected: 0
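The two fixture kinds above can be read as simple predicates over a command's output and exit status. The sketch below shows that reading. The real CI harness is not shown in this recipe, so the function name fixture_passes? and its exact semantics (substring match for contains, integer equality for exit_code) are assumptions.

```ruby
# Hypothetical checker; the actual CI harness may behave differently.
def fixture_passes?(kind, expected, stdout:, exit_code:)
  case kind
  when "contains"  then stdout.include?(expected)        # assumed: substring match on stdout
  when "exit_code" then exit_code == Integer(expected)   # assumed: exact exit status match
  else raise ArgumentError, "unknown fixture kind: #{kind}"
  end
end

# Sample output matching the success line the step 1 script prints.
sample_stdout = "config OK: deepseek/deepseek-v4-flash + reasoning_effort=high is a valid level\n"

puts fixture_passes?("contains",
                     "config OK: deepseek/deepseek-v4-flash + reasoning_effort=high is a valid level",
                     stdout: sample_stdout, exit_code: 0)
puts fixture_passes?("exit_code", "0", stdout: sample_stdout, exit_code: 0)
```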

Results

DeepSeek V4 Flash costs $0.098 in / $0.196 out per 1M tokens on OpenRouter (1M context). Reasoning effort mostly shows up as output tokens, so the throttle is your biggest lever on the bill: run 'high' by default and push to 'xhigh' only when the task warrants it.
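To see how output tokens drive the bill, the quoted prices can be plugged into a rough per-turn estimate. This is a sketch under stated assumptions: the token counts per effort level are made-up illustrative numbers, not measurements, and the cached-input discount is ignored.

```ruby
# Prices per 1M tokens, as quoted above for OpenRouter.
INPUT_USD_PER_M  = 0.098
OUTPUT_USD_PER_M = 0.196

# Cost of one turn in USD, ignoring any cached-input discount.
def turn_cost_usd(input_tokens, output_tokens)
  (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000.0
end

# Hypothetical token counts (assumptions for illustration only).
prompt_tokens = 2_000
output_by_level = { "none" => 300, "high" => 3_000, "xhigh" => 12_000 }

output_by_level.each do |level, output_tokens|
  printf("%-6s %.6f USD per turn\n", level, turn_cost_usd(prompt_tokens, output_tokens))
end
```

With a fixed prompt, only the output term changes between levels, which is why the effort setting dominates the cost difference.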
