Agents · Open Source · Free · Active · Machine-verified · intermediate · ~15 min setup

DSPy: program the pipeline, compile the prompts (stop hand-tuning)

Define an agent step as a DSPy program with a signature, a module, and a metric, so an optimizer improves the prompts against your metric instead of you fiddling by hand.

by Shilpa Mitra · verified today · v1.0.0

CI-verified, 2/2 fixtures passing.

Intended Use

Anyone whose pipeline keeps breaking on prompt wording. CI byte-compiles program.py (python3 py_compile, so no dspy install is needed) and asserts that it defines a Signature, a Module, a metric function, and an optimizer. No keys, no model call. The optimizer run is fenced because it costs compute and tokens.

Not for

Adopting it with no eval data: the optimizer needs a real metric and examples to work against, and without those it does nothing for you
A drop-in prompt fix: it is a mindset shift (program, then compile), so reach for it when brittleness is the bottleneck

The Stack

DSPy · prompt optimization

Tested Against

dspy docs (2026-06) · python@3.12 (py_compile, stdlib)

Side effects & data flow

Network: none, local only
Writes: ./program.py
Credentials: none required

Prerequisites

pip install dspy, plus a model key and eval examples (needed only to actually compile/optimize)

Steps

1
Write the DSPy program and structure-check it

Write program.py: a Signature (the I/O contract), a Module (the pipeline), a metric function, and an optimizer. CI compiles it and asserts all four are present. Running the optimizer needs dspy, a key, and examples, so that step is fenced.

cat > program.py <<'PY'
import dspy


class QA(dspy.Signature):
    "Answer the question concisely."
    question = dspy.InputField()
    answer = dspy.OutputField()


class Pipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.gen = dspy.ChainOfThought(QA)

    def forward(self, question):
        return self.gen(question=question)


def metric(example, pred, trace=None):
    return example.answer.lower() in pred.answer.lower()


optimizer = dspy.BootstrapFewShot(metric=metric)
PY
python3 - <<'CHECK'
import py_compile, sys
src = open("program.py").read()
try:
    py_compile.compile("program.py", doraise=True)
except py_compile.PyCompileError:
    print("BAD: program.py does not compile"); sys.exit(1)
def need(tok, msg):
    if tok not in src:
        print("BAD: " + msg); sys.exit(1)
need("dspy.Signature", "no DSPy Signature")
need("dspy.Module", "no DSPy Module")
need("def metric", "no metric function for the optimizer")
if "Bootstrap" not in src and "MIPRO" not in src and "optimizer" not in src:
    print("BAD: no optimizer (the prompts are not being compiled)"); sys.exit(1)
print("config OK: DSPy program compiles with a Signature, a Module, a metric, and an optimizer (prompts compiled, not hand-tuned)")
CHECK
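
The check above looks for substrings, so a comment that merely mentions dspy.Signature would satisfy it. As a hedged alternative, here is a minimal sketch of a stricter check that parses the file with the stdlib ast module, so dspy still does not need to be installed. The helper name find_parts and the inlined SAMPLE source are illustrative assumptions, not part of the recipe's CI.

```python
import ast

# Illustrative stand-in for program.py; it is only parsed, never imported or run.
SAMPLE = '''
import dspy

class QA(dspy.Signature):
    "Answer the question concisely."
    question = dspy.InputField()
    answer = dspy.OutputField()

class Pipeline(dspy.Module):
    def forward(self, question):
        return question

def metric(example, pred, trace=None):
    return True

optimizer = dspy.BootstrapFewShot(metric=metric)
'''


def find_parts(source: str) -> dict:
    """Report which of the four parts appear as real syntax nodes in the source."""
    found = {"signature": False, "module": False, "metric": False, "optimizer": False}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # Compare the textual form of each base class, e.g. "dspy.Signature".
            bases = {ast.unparse(base) for base in node.bases}
            found["signature"] |= "dspy.Signature" in bases
            found["module"] |= "dspy.Module" in bases
        elif isinstance(node, ast.FunctionDef) and node.name == "metric":
            found["metric"] = True
        elif isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            # Same optimizer name hints as the substring check: Bootstrap or MIPRO.
            callee = ast.unparse(node.value.func)
            if "Bootstrap" in callee or "MIPRO" in callee:
                found["optimizer"] = True
    return found


parts = find_parts(SAMPLE)
missing = [name for name, ok in parts.items() if not ok]
print("missing: " + ", ".join(missing) if missing else "all four parts found")
```

To point it at the real file, pass open("program.py").read() to find_parts instead of SAMPLE. A source made up only of comments reports every part as missing, which the substring check would not catch.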

2
Compile / optimize (the model step, not checked by CI)
Install dspy (pip install dspy), supply a model key and a handful of labeled examples, and run the optimizer. It searches prompts and few-shot demos against your metric. This costs tokens, so the step is fenced.
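
A rough sketch of what this step might look like follows. It assumes dspy is installed, program.py sits in the working directory, and the provider key is already exported in the environment (OPENAI_API_KEY for the model name used here). The model name, the ROWS examples, the compile_pipeline helper, and the RUN_OPTIMIZER switch are all illustrative assumptions. Nothing calls a model unless the switch is set.

```python
import os

# Hypothetical labeled examples; replace with rows from your own eval set.
ROWS = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "How many days are in a week?", "answer": "seven"},
]


def compile_pipeline(rows):
    """Run the optimizer from program.py against labeled rows (this costs tokens)."""
    import dspy  # imported lazily so this file still loads without dspy installed
    from program import Pipeline, optimizer  # names defined in program.py

    # Assumed model name; any model your key can reach should work here.
    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

    # Mark "question" as the input; the remaining field ("answer") is the label.
    trainset = [dspy.Example(**row).with_inputs("question") for row in rows]
    return optimizer.compile(Pipeline(), trainset=trainset)


# Fenced: runs only when explicitly requested, since it spends tokens.
if os.environ.get("RUN_OPTIMIZER") == "1":
    compiled = compile_pipeline(ROWS)
    print(compiled(question="What is the capital of Italy?").answer)
```

Two rows are enough to show the shape of the call. A real run would want the handful of labeled examples the step describes, ideally with some held back to check the compiled program.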

Eval, 2 fixtures

Last passed: verified today

program-ok · contains · timeout 30s · max $0
Expected: config OK: DSPy program compiles with a Signature, a Module, a metric, and an optimizer (prompts compiled, not hand-tuned)
clean-exit · exit_code · timeout 30s · max $0
Expected: 0

Results

DSPy makes prompt quality something you measure and improve, not something you tweak at 1am. You define steps and a metric; DSPy optimizes the prompts and few-shot examples against that metric. It pays off only when prompt brittleness is your real bottleneck and you have an eval metric + examples.
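
To make "measure" concrete, here is a small sketch that applies the recipe's substring metric to a couple of predictions and averages the result. The SimpleNamespace objects are hypothetical stand-ins for DSPy examples and predictions, and average_score is an illustrative helper, so this runs without dspy or a model.

```python
from types import SimpleNamespace


def metric(example, pred, trace=None):
    # Same rule as the recipe's metric: the gold answer appears in the prediction.
    return example.answer.lower() in pred.answer.lower()


def average_score(pairs):
    """Fraction of (example, prediction) pairs that the metric accepts."""
    return sum(metric(example, pred) for example, pred in pairs) / len(pairs)


# Hypothetical stand-ins for labeled examples and model predictions.
pairs = [
    (SimpleNamespace(answer="Paris"), SimpleNamespace(answer="The capital is Paris.")),
    (SimpleNamespace(answer="seven"), SimpleNamespace(answer="There are 7 days.")),
]
print(average_score(pairs))  # 0.5
```

The second pair is a correct answer written as a digit, and the substring rule rejects it. An optimizer will chase whatever the metric rewards, so gaps like this are worth finding before spending tokens on a compile run.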
