Local model chore: summarize a long PDF without it leaving your laptop
Attach a 30-page PDF or a dense terms-of-service agreement to a local model and get five plain bullets plus anything you need to act on, while the document stays on your machine.
CI-verified, 2/2 fixtures passing.
Intended Use
Anyone who needs to summarize a private document offline. The deterministic part of the pipeline, the step that feeds the model, is text extraction: CI builds a fixture PDF and verifies that pypdf recovers the expected text (this is the step that lets a local model 'read' an attached file). The summarization itself is the model step and is kept outside CI. No API key, no cloud.
Not for
- Scanned or image-only PDFs: these need OCR first, because text extraction returns nothing
- Expecting CI to grade the summary: CI verifies only the text extraction the summary depends on
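A scanned PDF tends to fail quietly: pypdf returns empty or near-empty strings and the model then "summarizes" nothing. A minimal pre-flight check, assuming you already have one extracted string per page (as `p.extract_text() or ""` produces in the step 1 script); the `needs_ocr` name, the 20-character threshold, and the "fewer than half the pages" rule are hypothetical choices, not part of this workflow.

```python
def needs_ocr(page_texts: list[str], min_chars_per_page: int = 20) -> bool:
    """Hypothetical heuristic: flag a PDF as image-only when text extraction
    recovered almost nothing from its pages."""
    if not page_texts:
        return True  # no pages at all: nothing for the model to read
    # Count pages whose extracted text has at least a few real characters.
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars_per_page
    )
    # Treat the document as scanned if fewer than half its pages carry text.
    return pages_with_text < len(page_texts) / 2
```

For example, `needs_ocr(["", "  ", ""])` returns `True`, so you would run OCR before handing the file to the model.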
The Stack
Tested Against
- pypdf@latest
- reportlab@latest
- python@3.12
- ollama + gemma3 for the summary step (not checked by CI)
Side effects & data flow
- Network: PyPI, install only
- Writes: ./.venv/, ./doc.pdf
- Credentials: none required
Data privacy
- Who receives the document text: nobody (fully local). Retention: extraction and the local-model summary both run on-device, so the file never leaves the laptop.
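"Fully local" only holds while the model endpoint is on the laptop itself. A minimal sketch of a guard you could run before sending document text anywhere, assuming the model server is reached over HTTP (Ollama listens on `http://localhost:11434` by default); `is_loopback_endpoint` is a hypothetical helper, not part of Ollama or this workflow.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_endpoint(url: str) -> bool:
    """Hypothetical guard: True only if `url` points at this machine."""
    host = urlparse(url).hostname  # lowercased, IPv6 brackets stripped
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        # Covers 127.0.0.0/8 and ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname could resolve to a remote machine.
        return False
```

Refusing to send text when this returns `False` keeps a mistyped or remote URL from quietly turning the workflow into a cloud upload.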
Prerequisites
- A laptop with ~8GB RAM
- Ollama or LM Studio for the summary step
Steps
- 1
Extract the document text that feeds the model (deterministic)
When you 'attach' a PDF, the app first extracts its text and hands that text to the model. CI builds a small fixture PDF with reportlab, then verifies that pypdf recovers its content. This is the real, deterministic ingestion step: no model, no API key.
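The CI script in this step only checks two known lines in a one-page fixture. For a real multi-page document, one option is to keep page boundaries in the text handed to the model so its bullets can point back to a page. A minimal sketch, assuming pypdf is installed in `.venv`; `extract_for_model`, `label_pages`, and the `[page N]` marker are hypothetical names and conventions, not pypdf features.

```python
def label_pages(page_texts: list[str]) -> str:
    """Join per-page text under a "[page N]" marker (hypothetical convention)."""
    parts = []
    for number, text in enumerate(page_texts, start=1):
        cleaned = text.strip()
        if cleaned:  # skip pages that yielded no text
            parts.append(f"[page {number}]\n{cleaned}")
    return "\n\n".join(parts)


def extract_for_model(path: str) -> str:
    """Extract every page with pypdf and return page-labelled text."""
    # Imported here so label_pages can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return label_pages([page.extract_text() or "" for page in reader.pages])
```

On the one-page CI fixture, everything would land under `[page 1]`.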
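```bash
python3 -m venv .venv
.venv/bin/pip install -q reportlab pypdf
.venv/bin/python - <<'EOF'
from reportlab.pdfgen import canvas
from pypdf import PdfReader

# Build a one-page fixture PDF: each line is drawn 72 pt (1 inch) from the
# left edge, starting near the top of the page and moving down 24 pt per line.
c = canvas.Canvas("doc.pdf")
y = 800
for line in [
    "QUARTERLY REPORT",
    "Revenue grew 18 percent to 2.4 million dollars.",
    "Action item: renew the SOC2 audit before September.",
]:
    c.drawString(72, y, line)
    y -= 24
c.save()

# Extract the text of every page, the same ingestion step an app runs on an
# attached PDF. "or ''" guards pages with no extractable text; joining with
# a newline keeps words at page boundaries from running together.
text = "\n".join((p.extract_text() or "") for p in PdfReader("doc.pdf").pages)
assert "Revenue grew 18 percent" in text, f"extraction missed the revenue line: {text!r}"
assert "renew the SOC2 audit" in text, f"extraction missed the action item: {text!r}"
print("chore2 OK: pypdf recovered the document text that feeds the local model (revenue + action item)")
EOF
```
- 2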
Summarize the extracted text on a local model (the model step, not checked by CI)
Pipe the extracted text into Ollama, or attach the PDF in LM Studio or Open WebUI, and ask for 'five plain bullets and anything I need to act on.' The summary depends on the model's output, which is non-deterministic, so CI does not check it.
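One way to pipe text into Ollama is its local HTTP API (`POST /api/generate`). A 30-page document can be longer than the context window a local model is configured with, so this minimal sketch splits the text into chunks, summarizes each, then summarizes the partial summaries. That map-reduce pattern is swapped in here for illustration; the workflow itself does not prescribe it. It assumes Ollama is running on its default port with `gemma3` pulled; `chunk_text`, `generate`, `summarize`, the 8,000-character chunk size, and the 600 s timeout are hypothetical choices.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
PROMPT = "Summarize this in five plain bullets, then list anything I need to act on.\n\n{text}"


def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.
    A paragraph longer than max_chars is hard-split."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        if not para.strip():
            continue  # skip blank paragraphs
        while len(para) > max_chars:  # oversized paragraph: flush, then hard-split
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:  # adding this paragraph would overflow: start a new chunk
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def generate(prompt: str, model: str = "gemma3", url: str = OLLAMA_URL) -> str:
    """Send one non-streaming request to a local Ollama server and return its text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())["response"]


def summarize(text: str, model: str = "gemma3") -> str:
    """Map-reduce summary: one call per chunk, then one call over the partial summaries."""
    chunks = chunk_text(text)
    if not chunks:
        raise ValueError("no text to summarize; the PDF may need OCR")
    if len(chunks) == 1:
        return generate(PROMPT.format(text=chunks[0]), model)
    partials = [generate(PROMPT.format(text=chunk), model) for chunk in chunks]
    return generate(PROMPT.format(text="\n\n".join(partials)), model)
```

With the extracted text in a string `doc_text`, `print(summarize(doc_text))` would print the model's bullets. The wording changes from run to run, which is why only the extraction is checked in CI.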
Eval, 2 fixtures
Last passed: verified today
- extracted (check: contains, timeout 600s · max $0). Expected: chore2 OK: pypdf recovered the document text that feeds the local model (revenue + action item)
- clean-exit (check: exit_code, timeout 600s · max $0). Expected: 0
Results
Get the gist of a confidential document without uploading it anywhere: because everything runs locally, the file never leaves your laptop. This also works with the document upload in LM Studio or Open WebUI.
Liked this workflow?
Get new verified workflows in WebAfterAI, three issues a week (Tue, Thu, Sat).