RAG · Open Source · Free · Active · Machine-verified · beginner · ~10 min setup

LlamaIndex: index your documents and query them at runtime

Point LlamaIndex at a document corpus and build a VectorStoreIndex so an agent can retrieve the relevant chunks at query time instead of stuffing everything into context.
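
To make "retrieve the relevant chunks at query time" concrete, here is a toy sketch in plain Python. It is not how LlamaIndex works internally: word counts stand in for a real embedding model, and the names embed, cosine, and retrieve are illustrative only.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (a real model returns dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(chunks, query, top_k=2):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:top_k]


# Hypothetical corpus, already split into chunks.
chunks = [
    "Refunds are processed within five business days.",
    "The API rate limit is 100 requests per minute.",
    "Support is available on weekdays from 9 to 5.",
]
top = retrieve(chunks, "What is the API rate limit?", top_k=1)
print(top)
```

Only the retrieved chunk would be passed to the model, rather than the whole corpus.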

by Shilpa Mitra · verified today · v1.0.0

CI-verified, 2/2 fixtures passing.

Intended Use

Anyone whose knowledge corpus is too large or too changeable to keep in a prompt. CI installs llama-index-core and verifies that the three core retrieval abstractions import cleanly: VectorStoreIndex, SimpleDirectoryReader, and StorageContext. Building an index and querying it need an embedding model and an LLM (OpenAI by default), so those steps are fenced, meaning CI does not run them.

Not for

Static knowledge that fits in a prompt: when the knowledge is small, RAG adds latency and a retrieval layer for no gain
Expecting CI to verify retrieval quality: that depends on chunking, embedding, and query construction, all of which happen in the fenced model steps
Fully offline use without swapping the default embedding/LLM backends
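
As a rough illustration of what swapping the default backends can look like, the sketch below replaces both with the mock classes that ship in llama-index-core, so the pipeline runs with no key and no model calls. The function name keyless_smoke_test and the sample document are hypothetical, and the mock backends return placeholder vectors and text, so this exercises the plumbing only, not retrieval quality. A real offline setup would presumably swap in a local embedding model and LLM instead.

```python
def keyless_smoke_test(question="What does the handbook say about refunds?"):
    """Build and query a tiny in-memory index using mock backends only."""
    # Imported here so this file loads even where llama-index-core is absent.
    from llama_index.core import Document, MockEmbedding, Settings, VectorStoreIndex
    from llama_index.core.llms import MockLLM

    # Replace the default (OpenAI) backends globally for this process.
    Settings.embed_model = MockEmbedding(embed_dim=8)
    Settings.llm = MockLLM(max_tokens=32)

    # Hypothetical one-document corpus, for illustration only.
    docs = [Document(text="Refunds are processed within five business days.")]
    index = VectorStoreIndex.from_documents(docs)

    # The mock LLM returns placeholder text, not a real answer.
    return str(index.as_query_engine().query(question))
```

Calling keyless_smoke_test() inside the .venv from the install step should return a string without contacting any model provider.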

The Stack

LlamaIndex · retrieval memory (RAG)

Tested Against

llama-index-core@latest · python@3.12

Side effects & data flow

Network: PyPI, install only
Writes: ./.venv/
Credentials: Embedding + LLM key, for the fenced index/query steps only

Prerequisites

Python 3.10+
pip
An embedding model + LLM key for the fenced index/query steps (OpenAI by default)
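
A quick preflight check along these lines can confirm the prerequisites before you start. It is a sketch, not part of the workflow's CI: the preflight function is hypothetical, and it assumes the default OpenAI backends, which read their key from the OPENAI_API_KEY environment variable. The key is only needed for the fenced index/query steps.

```python
import os
import sys


def preflight(env=None, version=None):
    """Return a list of unmet prerequisites (an empty list means ready)."""
    env = os.environ if env is None else env
    version = sys.version_info if version is None else version

    problems = []
    if tuple(version[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    # Assumption: the default OpenAI backends are in use.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set (needed for index/query only)")
    return problems


for problem in preflight():
    print("missing:", problem)
```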

Steps

1
Install llama-index-core and verify the retrieval abstractions import

Run pip install llama-index-core, then confirm the three abstractions the docs build on: VectorStoreIndex (the index), SimpleDirectoryReader (the loader), and StorageContext (the persistence layer). CI runs exactly this and needs no key.

python3 -m venv .venv
.venv/bin/pip install -q llama-index-core
.venv/bin/python - <<'EOF'
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext

for cls in [VectorStoreIndex, SimpleDirectoryReader, StorageContext]:
    assert callable(cls), f"{cls.__name__} is not callable"
print("llamaindex imports OK: VectorStoreIndex, SimpleDirectoryReader, StorageContext all available")
EOF

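2
Build the index and query it (the model step, not checked by CI)

Set your embedding and LLM keys, load your corpus with SimpleDirectoryReader, build the index with VectorStoreIndex.from_documents(), then query it with index.as_query_engine().query(). To make the index survive restarts, persist it with index.storage_context.persist() and reload it through StorageContext and load_index_from_storage(). With the default OpenAI backends you will likely also need the llama-index-embeddings-openai and llama-index-llms-openai packages, since llama-index-core does not bundle them. CI does not run the embedding or retrieval steps.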
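
The step above can be sketched roughly as follows. This is a minimal example under assumptions, not the workflow's verified code: the directory names data and storage and the helper names are hypothetical, and it assumes the default OpenAI backends are installed and OPENAI_API_KEY is set.

```python
from pathlib import Path


def has_persisted_index(persist_dir):
    """True if persist_dir exists and contains at least one file or folder."""
    p = Path(persist_dir)
    return p.is_dir() and any(p.iterdir())


def build_or_load_index(corpus_dir="data", persist_dir="storage"):
    """Reload a persisted index if present, otherwise build and persist one."""
    # Imported here so this file loads even where llama-index-core is absent.
    from llama_index.core import (
        SimpleDirectoryReader,
        StorageContext,
        VectorStoreIndex,
        load_index_from_storage,
    )

    if has_persisted_index(persist_dir):
        storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
        return load_index_from_storage(storage_context)

    # Load every readable file under corpus_dir, then chunk and embed it.
    documents = SimpleDirectoryReader(corpus_dir).load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=persist_dir)
    return index


def ask(index, question):
    """Retrieve relevant chunks and have the LLM answer from them."""
    return str(index.as_query_engine().query(question))
```

Usage would look like index = build_or_load_index() followed by print(ask(index, "your question")). The first run pays the embedding cost and later runs reload from disk.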

Eval, 2 fixtures

Last passed: verified today

imports-ok · contains · timeout 900s · max $0
Expected: llamaindex imports OK: VectorStoreIndex, SimpleDirectoryReader, StorageContext all available
clean-exit · exit_code · timeout 900s · max $0
Expected: 0
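
To reproduce checks like these locally, something like the sketch below would work. The actual CI harness is not shown on this page, so run_fixtures is a hypothetical stand-in that assumes "contains" means a substring match on stdout and "exit_code" means an exact match on the process return code.

```python
import subprocess
import sys


def run_fixtures(command, expected_substring, expected_exit_code=0, timeout=900):
    """Run a command once and evaluate both fixture styles against it."""
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return {
        "imports-ok": expected_substring in proc.stdout,   # contains check
        "clean-exit": proc.returncode == expected_exit_code,  # exit_code check
    }


# Stand-in command; replace it with the real import check from step 1.
demo = [sys.executable, "-c", "print('llamaindex imports OK: demo')"]
results = run_fixtures(demo, "llamaindex imports OK")
print(results)
```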

Results

LlamaIndex implements the standard RAG pattern, with 40+ integrations and pluggable embedding and vector-store backends. Retrieval memory is the right pick when the knowledge corpus is too large or too changeable to bake into the model.
