Local Inference · Open Source · Free · Active · Machine-verified · beginner · ~10 min setup

Local model chore: summarize a long PDF without it leaving your laptop

Attach a 30-page PDF or a dense terms-of-service to a local model and get five plain bullets plus anything you need to act on, with the document staying on your machine.

by Shilpa Mitra · verified today · v1.0.0

CI-verified, 2/2 fixtures passing.

Intended Use

Anyone who needs to summarize a private document offline. The deterministic spine that feeds the model is document text extraction: CI builds a fixture PDF and verifies that pypdf recovers its text (checked against the revenue line and the action item), which is the step that lets a local model 'read' an attached file. The summarization itself is the model step and is fenced off from CI. No key, no cloud.

Not for

Scanned or image-only PDFs: these need OCR first, because text extraction returns nothing
Grading the summary: CI verifies only the text extraction that the summary depends on
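
A quick way to tell whether a PDF falls into the first case is to check how much text extraction returns before involving the model. The following is a minimal sketch and is not part of the CI-verified recipe: `looks_image_only`, `page_texts_from_pdf`, and the 20-character threshold are hypothetical names and values chosen for illustration. Only `PdfReader` and `extract_text()` come from pypdf.

```python
def looks_image_only(page_texts, min_chars=20):
    """Return True when extraction produced almost no text.

    min_chars is an assumed threshold for illustration, not a pypdf setting.
    """
    total = sum(len((t or "").strip()) for t in page_texts)
    return total < min_chars


def page_texts_from_pdf(path):
    """Extract text page by page with pypdf.

    pypdf is imported lazily so looks_image_only() has no dependency.
    """
    from pypdf import PdfReader

    return [page.extract_text() or "" for page in PdfReader(path).pages]
```

Calling `looks_image_only(page_texts_from_pdf("your.pdf"))` on a scan with no text layer should return `True`, which is the cue to run OCR before summarizing.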

The Stack

Google Gemma 3: model (Gemma 3)
Ollama: local runtime (summary step)

Tested Against

pypdf@latest · reportlab@latest · python@3.12 · ollama + gemma3 for the fenced summary

Side effects & data flow

Network: PyPI, install only
Writes: ./.venv/, ./doc.pdf
Credentials: none required

Data privacy

nobody (fully local) ← the document text (retention: extraction and the local-model summary run on-device; the file never leaves the laptop)

Prerequisites

A laptop with ~8GB RAM
Ollama or LM Studio for the summary step

Steps

1
Extract the document text that feeds the model (deterministic)

When you 'attach' a PDF, the app first extracts its text to hand to the model. CI builds a small fixture PDF with reportlab, then verifies that pypdf recovers the content. This is the real, deterministic ingestion step: no model, no key.

python3 -m venv .venv
.venv/bin/pip install -q reportlab pypdf
.venv/bin/python - <<'EOF'
from reportlab.pdfgen import canvas
from pypdf import PdfReader

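# Build a one-page fixture PDF containing three known lines.
c = canvas.Canvas("doc.pdf")
y = 800
for line in [
    "QUARTERLY REPORT",
    "Revenue grew 18 percent to 2.4 million dollars.",
    "Action item: renew the SOC2 audit before September.",
]:
    c.drawString(72, y, line)
    y -= 24
c.save()

# Extract text page by page; join with newlines so words on adjacent pages do not run together.
text = "\n".join((p.extract_text() or "") for p in PdfReader("doc.pdf").pages)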
assert "Revenue grew 18 percent" in text, f"extraction missed the revenue line: {text!r}"
assert "renew the SOC2 audit" in text, f"extraction missed the action item: {text!r}"
print("chore2 OK: pypdf recovered the document text that feeds the local model (revenue + action item)")
EOF

2
Summarize the extracted text on a local model (the model step, not checked by CI)

Pipe the extracted text into Ollama, or attach the PDF in LM Studio or Open WebUI, and ask for 'five plain bullets and anything I need to act on.' This step runs the model and its output is non-deterministic, so CI makes no claim about it.
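
One way to hand the extracted text to Ollama from Python is through its local HTTP API. The sketch below is an illustration rather than the recipe's verified path: it assumes an Ollama server is running on its default address (`http://localhost:11434`) and that a model tagged `gemma3` has already been pulled. `PROMPT`, `build_payload`, and `summarize` are hypothetical names. The request goes to the loopback interface only, so the document text stays on the machine.

```python
import json
import urllib.request

# Assumed prompt wording, adapted from the request quoted in this step.
PROMPT = (
    "Summarize the document below in five plain bullets, "
    "then list anything I need to act on."
)


def build_payload(text, model="gemma3"):
    """Build the JSON body for Ollama's /api/generate endpoint (non-streaming)."""
    return {
        "model": model,  # assumes this model tag is already pulled locally
        "prompt": f"{PROMPT}\n\n---\n{text}",
        "stream": False,  # ask for one JSON object instead of a token stream
    }


def summarize(text, model="gemma3", host="http://localhost:11434"):
    """POST the extracted text to a local Ollama server and return its reply."""
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(text, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the text recovered in the extraction step, `print(summarize(text))` would print the model's summary. The wording will differ from run to run, which is why this step sits outside CI.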

Eval, 2 fixtures

Last passed: verified today

extracted · contains · timeout 600s · max $0
Expected: chore2 OK: pypdf recovered the document text that feeds the local model (revenue + action item)
clean-exit · exit_code · timeout 600s · max $0
Expected: 0
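
To make the two checks concrete, here is a minimal sketch of what a 'contains' check and an 'exit_code' check could look like. It is not the project's actual CI harness: `run_fixtures` is a hypothetical helper, and the demo command is a stand-in for the extraction script in step 1.

```python
import subprocess
import sys


def run_fixtures(command, expected_stdout, expected_exit_code=0, timeout=600):
    """Run a command once and evaluate both checks against that run."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return {
        # 'contains' check: the expected string appears somewhere in stdout
        "extracted": expected_stdout in result.stdout,
        # 'exit_code' check: the process exited with the expected status
        "clean-exit": result.returncode == expected_exit_code,
    }


# Stand-in command for illustration only; the real step is the extraction script.
demo = [sys.executable, "-c", "print('chore2 OK: demo')"]
results = run_fixtures(demo, expected_stdout="chore2 OK")
```

Because the extraction script ends with `assert` statements followed by a `print`, a failed assertion would both suppress the expected line and produce a non-zero exit code, so the two fixtures fail together.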

Results

Get the gist of a confidential document in seconds, and because it is local, the file never leaves your laptop. Works with the document upload in LM Studio or Open WebUI.
