Agents · Open Source · Free · Active · Machine-verified · beginner · ~5 min setup

Hermes + DeepSeek V4 Flash: a one-line reasoning-effort throttle

Run one model from cheap-and-fast to deep-and-careful with a single reasoning_effort setting, so you don't pay for deep thinking on easy turns.

by Shilpa Mitra · verified today · v1.0.0

Run this workflow

CI-verified, 2/2 fixtures passing.

Intended Use

For anyone who wants to dial reasoning effort per task on a cheap DeepSeek V4 Flash setup. CI validates that config.yaml parses, that it names the deepseek/deepseek-v4-flash slug, and that agent.reasoning_effort is one of Hermes' allowed levels (none, minimal, low, medium, high, xhigh). The CI check needs no API key. The model's actual reasoning is fenced.

Not for

Treating the leaderboard's 'Max' and 'High' as separate models; they are the xhigh and high effort levels on the same model
Expecting CI to measure token spend; that happens in the fenced model step

The Stack

DeepSeek V4: model (DeepSeek V4 Flash)
Hermes Agent: agent runtime

Tested Against

hermes-agent docs (2026-06)
openrouter deepseek/deepseek-v4-flash
ruby@3.x (YAML stdlib)

Side effects & data flow

Network: none, local only
Writes: ./config.yaml
Credentials: none required

Prerequisites

Hermes Agent installed
An OpenRouter API key (only to actually run the agent)

Steps

1
Set the model and a default reasoning effort, then validate

Write config.yaml with the deepseek/deepseek-v4-flash slug and agent.reasoning_effort: high (Hermes' default is medium). At runtime you switch per task with /reasoning xhigh or /reasoning none, no restart. CI parses the YAML and asserts reasoning_effort is a valid level.

cat > config.yaml <<'YAML'
model:
  provider: openrouter
  model: deepseek/deepseek-v4-flash
agent:
  reasoning_effort: high
YAML
ruby -ryaml -e '
# Effort levels Hermes accepts for agent.reasoning_effort
allowed = ["none","minimal","low","medium","high","xhigh"]
c = YAML.load_file("config.yaml")
m = c["model"] || {}; a = c["agent"] || {}
# The model slug must match exactly
abort "BAD: model not deepseek/deepseek-v4-flash" unless m["model"] == "deepseek/deepseek-v4-flash"
# to_s turns a missing key into "", which fails the level check below
re = a["reasoning_effort"].to_s
abort "BAD: reasoning_effort not a valid level: #{re}" unless allowed.include?(re)
puts "config OK: deepseek/deepseek-v4-flash + reasoning_effort=" + re + " is a valid level"
'
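
To see what the check rejects, here is a minimal sketch of the same two assertions as a reusable function, run against one good and one bad config. The function name config_problems and the constants are illustrative, not part of Hermes or the recipe's CI; the bad case uses "max", the leaderboard label, which is not one of the allowed levels.

```ruby
require "yaml"

ALLOWED_LEVELS = %w[none minimal low medium high xhigh].freeze
EXPECTED_SLUG = "deepseek/deepseek-v4-flash"

# Hypothetical helper: returns a list of problems; an empty list means the config passes.
def config_problems(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  return ["top level is not a mapping"] unless config.is_a?(Hash)

  # Treat a missing or non-mapping section as empty so the checks below fail cleanly.
  section = ->(key) { config[key].is_a?(Hash) ? config[key] : {} }
  problems = []
  unless section.call("model")["model"] == EXPECTED_SLUG
    problems << "model is not #{EXPECTED_SLUG}"
  end
  level = section.call("agent")["reasoning_effort"].to_s
  unless ALLOWED_LEVELS.include?(level)
    problems << "reasoning_effort is not a valid level: #{level}"
  end
  problems
end

good = <<~CONFIG
  model:
    provider: openrouter
    model: deepseek/deepseek-v4-flash
  agent:
    reasoning_effort: high
CONFIG

# "max" is a leaderboard label, not an allowed effort level.
bad = good.sub("reasoning_effort: high", "reasoning_effort: max")

p config_problems(good)
p config_problems(bad)
```

Returning a list instead of aborting on the first failure makes it easier to report a wrong slug and a wrong level in one pass.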

2
Throttle per task at runtime (the model step, not checked by CI)
Use /reasoning xhigh for a hard task and /reasoning none for a quick lookup. xhigh can multiply output tokens, so use it deliberately, and keep prompt prefixes stable across turns to hit DeepSeek's cached-input discount. The reasoning itself is fenced.
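
If you script your prompts, a small helper can keep the slash command honest. This is a minimal sketch, not a Hermes API: reasoning_command and LEADERBOARD_LABELS are hypothetical names. It only builds the command text, translating the leaderboard's 'Max' and 'High' labels to xhigh and high and rejecting anything outside the allowed levels.

```ruby
ALLOWED_LEVELS = %w[none minimal low medium high xhigh].freeze

# Leaderboard labels named in this recipe, mapped to the effort level they denote.
LEADERBOARD_LABELS = { "Max" => "xhigh", "High" => "high" }.freeze

# Hypothetical helper: builds the /reasoning command for a level or a leaderboard label.
def reasoning_command(level_or_label)
  level = LEADERBOARD_LABELS.fetch(level_or_label, level_or_label)
  unless ALLOWED_LEVELS.include?(level)
    raise ArgumentError, "unknown reasoning effort: #{level_or_label}"
  end
  "/reasoning #{level}"
end

puts reasoning_command("Max")
puts reasoning_command("none")
```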

Eval, 2 fixtures

Last passed: verified today

config-ok · contains · timeout 30s · max $0
Expected: config OK: deepseek/deepseek-v4-flash + reasoning_effort=high is a valid level
clean-exit · exit_code · timeout 30s · max $0
Expected: 0

Results

DeepSeek V4 Flash costs $0.098 in / $0.196 out per 1M tokens on OpenRouter (1M context). Reasoning effort shows up mostly as output tokens, so the throttle is your biggest lever on the bill: run 'high' by default and push to 'xhigh' only when the task warrants it.
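
As a rough worked example of that arithmetic, the sketch below prices two turns at the rates quoted above. The token counts are made up for illustration, not measured, and the cached-input discount is ignored.

```ruby
# Prices quoted in this recipe, in USD per 1M tokens on OpenRouter.
INPUT_USD_PER_M = 0.098
OUTPUT_USD_PER_M = 0.196

# Cost of one turn in USD, ignoring any cached-input discount.
def turn_cost_usd(input_tokens, output_tokens)
  (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000.0
end

# Hypothetical token counts: the same prompt, with far more reasoning output on the deep turn.
prompt_tokens = 2_000
quick = turn_cost_usd(prompt_tokens, 500)
deep = turn_cost_usd(prompt_tokens, 8_000)

printf("quick turn: $%.6f\n", quick)
printf("deep turn:  $%.6f\n", deep)
printf("ratio:      %.1fx\n", deep / quick)
```

With these assumed counts the deep turn costs about six times the quick one, and the whole difference comes from output tokens.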
