Hermes + DeepSeek V4 Flash: a one-line reasoning-effort throttle
Run one model from cheap-and-fast to deep-and-careful with a single reasoning_effort setting, so you don't pay for deep thinking on easy turns.
CI-verified, 2/2 fixtures passing.
Intended Use
Anyone who wants to dial reasoning effort per task on a cheap DeepSeek V4 Flash setup. CI validates that config.yaml parses, that it names the deepseek/deepseek-v4-flash slug, and that agent.reasoning_effort is one of Hermes' allowed levels (none, minimal, low, medium, high, xhigh). CI needs no API key. The model's actual reasoning is fenced: CI never runs it.
Not for
- Treating the leaderboard's 'Max' and 'High' as separate models; they are xhigh and high effort on the same model
- Expecting CI to measure token spend; that happens in the fenced model step
The Stack
Tested Against
- hermes-agent docs (2026-06)
- openrouter deepseek/deepseek-v4-flash
- ruby@3.x (YAML stdlib)
Side effects & data flow
- Network
- none, local only
- Writes
- ./config.yaml
- Credentials
- none required
Prerequisites
- Hermes Agent installed
- An OpenRouter API key (only to actually run the agent)
Steps
- 1
Set the model and a default reasoning effort, then validate
Write config.yaml with the deepseek/deepseek-v4-flash slug and agent.reasoning_effort: high (Hermes' default is medium). At runtime you switch per task with /reasoning xhigh or /reasoning none, no restart. CI parses the YAML and asserts reasoning_effort is a valid level.
cat > config.yaml <<'YAML'
model:
  provider: openrouter
  model: deepseek/deepseek-v4-flash
agent:
  reasoning_effort: high
YAML
ruby -ryaml -e '
allowed = ["none","minimal","low","medium","high","xhigh"]
c = YAML.load_file("config.yaml")
m = c["model"] || {}; a = c["agent"] || {}
abort "BAD: model not deepseek/deepseek-v4-flash" unless m["model"] == "deepseek/deepseek-v4-flash"
re = a["reasoning_effort"].to_s
abort "BAD: reasoning_effort not a valid level: #{re}" unless allowed.include?(re)
puts "config OK: deepseek/deepseek-v4-flash + reasoning_effort=" + re + " is a valid level"
'
- 2
Throttle per task at runtime (the model step, not checked by CI)
Use /reasoning xhigh for a hard task and /reasoning none for a quick lookup. xhigh can multiply output tokens, so use it deliberately; keep stable prompt prefixes consistent across turns to hit DeepSeek's cached-input discount. The reasoning itself is fenced.
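One way to make the per-task throttle a habit is a small lookup from task type to effort level. The sketch below is illustrative only: the task labels, the mapping, and the reasoning_command helper are hypothetical and not part of Hermes. It also assumes /reasoning accepts every level allowed for agent.reasoning_effort, whereas the recipe only shows xhigh and none.

```ruby
# Hypothetical helper: maps a rough task label to a Hermes reasoning level.
# The labels and the mapping are illustrative assumptions, not a Hermes API.
ALLOWED_LEVELS = %w[none minimal low medium high xhigh].freeze

EFFORT_BY_TASK = {
  lookup: "none",   # quick factual lookup
  edit:   "low",    # small mechanical change
  debug:  "high",   # multi-step diagnosis
  design: "xhigh"   # the hard one; expect many more output tokens
}.freeze

# Returns the slash command to type before the turn.
# Unknown task labels fall back to the default set in config.yaml ("high" here).
def reasoning_command(task, default: "high")
  level = EFFORT_BY_TASK.fetch(task, default)
  raise ArgumentError, "unknown level: #{level}" unless ALLOWED_LEVELS.include?(level)
  "/reasoning #{level}"
end

puts reasoning_command(:lookup)   # prints "/reasoning none"
puts reasoning_command(:design)   # prints "/reasoning xhigh"
puts reasoning_command(:other)    # prints "/reasoning high"
```

Keeping the fallback equal to the config.yaml default means an unlabelled task behaves exactly as if no command had been typed.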
Eval, 2 fixtures
Last passed: verified today
config-ok · contains · timeout 30s · max $0
Expected: config OK: deepseek/deepseek-v4-flash + reasoning_effort=high is a valid level
clean-exit · exit_code · timeout 30s · max $0
Expected: 0
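The two fixtures amount to a substring check on stdout and an exit-code check. A minimal sketch of that logic, assuming a hypothetical fixture_passes? helper (the real CI harness is not shown in this recipe and may work differently):

```ruby
# Hypothetical fixture checker. The kinds "contains" and "exit_code" come from
# the eval listing; the helper name and signature are illustrative assumptions.
def fixture_passes?(kind, expected, stdout:, exit_code:)
  case kind
  when "contains"  then stdout.include?(expected)        # substring match on output
  when "exit_code" then exit_code == Integer(expected)   # exact exit status match
  else raise ArgumentError, "unknown fixture kind: #{kind}"
  end
end

# Sample run data, as the step 1 script would produce on success.
out = "config OK: deepseek/deepseek-v4-flash + reasoning_effort=high is a valid level\n"

puts fixture_passes?("contains", "reasoning_effort=high is a valid level", stdout: out, exit_code: 0)  # prints "true"
puts fixture_passes?("exit_code", "0", stdout: out, exit_code: 0)                                      # prints "true"
```

To reproduce the checks locally, run the step 1 script and feed its captured output and exit status into the helper.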
Results
DeepSeek V4 Flash is $0.098 in / $0.196 out per 1M tokens on OpenRouter (1M context). Reasoning effort is mostly output tokens, so the throttle is your biggest lever on the bill: run 'high' by default, push to 'xhigh' only when earned.
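To see why output tokens dominate, here is a worked estimate using the prices above. The token counts are hypothetical, chosen only to illustrate the arithmetic; they are not measurements of any effort level, and the sketch assumes reasoning tokens are billed at the output rate.

```ruby
# Prices from the text above, in USD per 1M tokens on OpenRouter.
INPUT_USD_PER_M  = 0.098
OUTPUT_USD_PER_M = 0.196

# Cost of one turn in USD, ignoring any cached-input discount.
def turn_cost_usd(input_tokens:, output_tokens:)
  (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000.0
end

# Hypothetical turns: same 20k-token prompt, 2k vs 20k output tokens.
light = turn_cost_usd(input_tokens: 20_000, output_tokens: 2_000)
heavy = turn_cost_usd(input_tokens: 20_000, output_tokens: 20_000)

puts format("light turn: $%.6f", light)      # prints "light turn: $0.002352"
puts format("heavy turn: $%.6f", heavy)      # prints "heavy turn: $0.005880"
puts format("ratio: %.1fx", heavy / light)   # prints "ratio: 2.5x"
```

With the prompt held fixed, a tenfold increase in output tokens raises the cost of the turn 2.5x in this example, which is why the effort setting matters more than prompt trimming on this price sheet.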