CodingOpen SourceFreeActiveMachine-verified· intermediate · ~10 min setup

Self-Healing Test Loop: an Agent that Fixes Until Green

Wrap a coding agent in a bounded loop that re-runs your tests and fixes the code until the suite passes or it runs out of attempts.

by Shilpa Mitra· verified today· v1.0.0

Run this workflow

CI-verified, 4/4 fixtures passing.

Intended Use

Anyone who wants a coding agent to repair failing tests unattended. CI verifies the loop's control logic: it stops on the first passing verify and is bounded by a MAX attempt count (the two guardrails that keep a loop safe). The agent step (claude -p or codex exec) is model-driven and fenced. Works with both Claude Code and Codex.

Not for

  • Letting the agent edit the tests, tell it to fix the code not the tests, or the fastest green is a deleted assertion
  • Unbounded runs, always cap attempts so a stubborn problem cannot spin and spend forever
  • Full access by default, give the least permission that works (Read, Edit, Bash for this one)

The Stack

Tested Against

claude-code@2.1.xcodex@latestnode@20.x

Side effects & data flow

Network
none, local only
Writes
./checkloop.mjs
Credentials
none required

Prerequisites

  • Claude Code (claude -p) or Codex (codex exec)
  • A real test command as the verifier (npm test, pytest, etc.)
  • An API key in CI: ANTHROPIC_API_KEY for Claude Code, CODEX_API_KEY for Codex

Steps

  1. 1

    The shape: agent, verifier, bounded stop

    The loop runs your tests; if they fail it pipes the failure to the agent to fix the code, and repeats until green or MAX attempts. Claude Code: pipe the output to claude -p with --allowedTools Read,Edit,Bash and --permission-mode acceptEdits. Codex: pipe to codex exec --sandbox workspace-write (the current flag; --full-auto is deprecated). The verifier is your real test command, the most trustworthy judge there is.

  2. 2

    What CI checks: the loop stops on pass and is bounded

    CI models the loop's control flow with a stub agent (standing in for the fenced claude -p / codex exec) and asserts the two guardrails the newsletter insists on: the loop stops the instant the verifier passes, and a never-fixed problem stops after MAX instead of spinning forever. No model, no key.

    cat > checkloop.mjs <<'EOF'
    // Model the self-healing loop's control flow and assert its two guardrails:
    // it stops on the first passing verify, and it is bounded by MAX.
    const MAX = 5;
    function runLoop(fixer){
      let state = 'RED';
      const verify = () => state === 'GREEN';
      for (let i = 1; i <= MAX; i++){
        if (verify()) return { passed: true, attempt: i };
        state = fixer(i, state);
      }
      return { passed: false, attempt: MAX };
    }
    const fixNow = () => 'GREEN';   // stub agent that fixes on its first turn
    const neverFix = (i, s) => s;   // stub agent that never fixes
    let ok = true;
    function check(label, cond){ console.log(label + ': ' + (cond ? 'yes' : 'NO')); if(!cond) ok=false; }
    const a = runLoop(fixNow);
    check('loop stops on first passing verify (attempt 2)', a.passed === true && a.attempt === 2);
    const b = runLoop(neverFix);
    check('loop is bounded: stops after MAX with no infinite spin', b.passed === false && b.attempt === MAX);
    if(!ok){ console.log('self-healing loop check FAILED'); process.exit(1); }
    console.log('self-healing loop check OK');
    EOF
    node checkloop.mjs
  3. 3

    Run it with your agent (the model step, not checked by CI)

    Point the loop at your repo and let claude -p or codex exec do the fixing each iteration. That call runs a model, so it is non-deterministic and fenced: CI proves the loop is bounded and stops on success, never the model's edits. Tell it to fix the code, not the tests.

Eval, 4 fixtures

Last passed: verified today
  • stops-on-passcontainstimeout 30s · max $0

    Expected: loop stops on first passing verify (attempt 2): yes

  • boundedcontainstimeout 30s · max $0

    Expected: loop is bounded: stops after MAX with no infinite spin: yes

  • check-okcontainstimeout 30s · max $0

    Expected: self-healing loop check OK

  • clean-exitexit_codetimeout 30s · max $0

    Expected: 0

Results

The smallest loop with the clearest payoff: watch a red suite go green with no one typing.

Did this work for you?

Our CI checks the setup runs. You tell us if the whole thing worked. Tell us straight.

Liked this workflow?

Get new verified workflows in WebAfterAI, three issues a week (Tue, Thu, Sat).