Open Source · Library · Free

MinerU

OpenDataLab's document parser converts PDF, DOCX, PPTX, and XLSX files, as well as images, into clean, LLM-ready Markdown and JSON. Its strong OCR makes it a fix for garbage inputs in RAG pipelines. Fine print: the project moved from AGPLv3 to a custom "MinerU Open Source License." That license is based on Apache-2.0 but adds commercial-use and attribution conditions, so read the terms before building anything commercial on it.

1 workflow uses MinerU