DSPy: program the pipeline, compile the prompts (stop hand-tuning)
Define an agent step as a DSPy program with a signature, a module, and a metric, so an optimizer improves the prompts against your metric instead of you fiddling by hand.
CI-verified, 2/2 fixtures passing.
Intended Use
Anyone whose pipeline keeps breaking on prompt wording. CI byte-compiles the DSPy program (python3 py_compile, no dspy install needed) and checks that it defines a Signature, a Module, a metric function, and an optimizer. No keys, no model call. The optimizer run is fenced off from CI because it costs compute and tokens.
Not for
- Adopting it with no eval data: the optimizer needs a real metric and examples to work against, and without those it does nothing for you
- A drop-in prompt fix: it is a mindset shift (program, then compile); reach for it when brittleness is the bottleneck
The Stack
Tested Against
- dspy docs (2026-06)
- python@3.12 (py_compile, stdlib)
Side effects & data flow
- Network: none, local only
- Writes: ./program.py (py_compile also writes a .pyc under ./__pycache__/)
- Credentials: none required
Prerequisites
- pip install dspy, a model key, and eval examples (needed only to actually compile/optimize; the CI check needs just python3)
Steps
- 1
Write the DSPy program and structure-check it
Write program.py: a Signature (the I/O contract), a Module (the pipeline), a metric function, and an optimizer. CI byte-compiles it and checks, by plain text search, that all four are present. Running the optimizer needs dspy, a model key, and examples, so that step is fenced off from CI.
```shell
cat > program.py <<'PY'
import dspy


class QA(dspy.Signature):
    "Answer the question concisely."
    question = dspy.InputField()
    answer = dspy.OutputField()


class Pipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.gen = dspy.ChainOfThought(QA)

    def forward(self, question):
        return self.gen(question=question)


def metric(example, pred, trace=None):
    # Pass when the gold answer appears (case-insensitive) in the prediction.
    return example.answer.lower() in pred.answer.lower()


optimizer = dspy.BootstrapFewShot(metric=metric)
PY

python3 - <<'CHECK'
import py_compile, sys

src = open("program.py").read()

# Byte-compile only: catches syntax errors without importing dspy.
try:
    py_compile.compile("program.py", doraise=True)
except py_compile.PyCompileError:
    print("BAD: program.py does not compile"); sys.exit(1)

def need(tok, msg):
    # Plain substring check on the source text.
    if tok not in src:
        print("BAD: " + msg); sys.exit(1)

need("dspy.Signature", "no DSPy Signature")
need("dspy.Module", "no DSPy Module")
need("def metric", "no metric function for the optimizer")
# Require an actual optimizer class name (the bare word "optimizer" proves nothing).
if not any(name in src for name in ("Bootstrap", "MIPRO")):
    print("BAD: no optimizer (the prompts are not being compiled)"); sys.exit(1)

print("config OK: DSPy program compiles with a Signature, a Module, a metric, and an optimizer (prompts compiled, not hand-tuned)")
CHECK
```
- 2
Compile / optimize (the model step, not checked by CI)
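pip install dspy, supply a model key and a handful of labeled examples, and run the optimizer's compile step on the Pipeline. BootstrapFewShot, as configured in program.py, runs the program over your examples and keeps the traces that pass your metric as few-shot demos. Other optimizers, such as MIPROv2, also search over the instructions. This costs tokens and is fenced off from CI.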
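Here is a minimal sketch of that fenced step, assuming an OpenAI-style key. The file name run_optimizer.py, the model string in `LM_NAME`, the `KEY_VAR` name, and the training examples are all placeholders, not part of the workflow. The script calls `dspy.configure`, `dspy.LM`, `dspy.Example(...).with_inputs(...)`, and the optimizer's `compile(student, trainset=...)`. It mirrors the fence: unless dspy is installed and the key is set, it prints a message and does nothing.

```python
# run_optimizer.py (hypothetical name): the fenced compile step, next to program.py.
import importlib.util
import os

LM_NAME = "openai/gpt-4o-mini"  # assumption: any provider/model string dspy.LM accepts
KEY_VAR = "OPENAI_API_KEY"      # assumption: the key variable for that provider


def ready() -> bool:
    """The fence: only run when dspy is installed and a model key is set."""
    return importlib.util.find_spec("dspy") is not None and bool(os.environ.get(KEY_VAR))


def main() -> int:
    if not ready():
        print(f"skipped: pip install dspy and set {KEY_VAR} to run the optimizer")
        return 0
    import dspy
    from program import Pipeline, optimizer  # written in step 1

    dspy.configure(lm=dspy.LM(LM_NAME))
    # Placeholder labeled examples; replace with your own eval data.
    trainset = [
        dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
        dspy.Example(question="Who wrote Hamlet?", answer="Shakespeare").with_inputs("question"),
    ]
    # BootstrapFewShot keeps traces that pass `metric` as few-shot demos.
    compiled = optimizer.compile(Pipeline(), trainset=trainset)
    compiled.save("compiled_program.json")  # reload later with Pipeline().load(...)
    print(compiled(question="What is the capital of Japan?").answer)
    return 0


if __name__ == "__main__":
    main()
```

Two examples only show the wiring. A real run needs enough labeled examples to cover the failures you actually care about.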
Eval, 2 fixtures
Last passed: verified today
- program-ok · check: contains · timeout 30s · max $0
  Expected: config OK: DSPy program compiles with a Signature, a Module, a metric, and an optimizer (prompts compiled, not hand-tuned)
- clean-exit · check: exit_code · timeout 30s · max $0
  Expected: 0
Results
DSPy makes prompt quality something you measure and improve, not something you tweak at 1am. You define steps and a metric; DSPy optimizes the prompts and few-shot examples against that metric. It pays off only when prompt brittleness is your real bottleneck and you have an eval metric and labeled examples.
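Because the metric is an ordinary Python function, "measure" can be taken literally: score a pipeline on a dev set before and after optimizing it. Below is a minimal stdlib-only sketch. It reuses the metric rule from program.py and scores two hypothetical stand-in programs on a tiny made-up dev set, with no model involved. In DSPy itself, `dspy.Evaluate` does this job over `dspy.Example` objects.

```python
from types import SimpleNamespace


def metric(example, pred, trace=None):
    # Same rule as program.py: pass when the gold answer appears in the prediction.
    return example.answer.lower() in pred.answer.lower()


def score(program, devset) -> float:
    """Fraction of dev examples whose prediction passes the metric."""
    passed = sum(bool(metric(ex, program(question=ex.question))) for ex in devset)
    return passed / len(devset)


# Hypothetical dev set and two stand-in "programs" with the same call shape as Pipeline.
devset = [
    SimpleNamespace(question="Capital of France?", answer="Paris"),
    SimpleNamespace(question="Author of Hamlet?", answer="Shakespeare"),
]


def terse(question):
    return SimpleNamespace(answer="Paris")  # always the same answer


ANSWERS = {"Capital of France?": "It is Paris.", "Author of Hamlet?": "William Shakespeare"}


def lookup(question):
    return SimpleNamespace(answer=ANSWERS[question])


print(score(terse, devset), score(lookup, devset))  # 0.5 1.0
```

The number is only as good as the metric. A substring check like this one rewards verbose answers that happen to contain the gold string, so tighten it if that matters for your task.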