🌏 中文版
RAG systems typically index long documents by splitting them into small chunks, embedding each one, and storing them in a vector database. This approach has a fundamental flaw: once split, each chunk loses the context it had within the original document.
Consider a route description document that contains:
- Section 1: Route overview (location, difficulty, type)
- Section 2: Key technical moves
- Section 3: Safety notes and recommended gear
If you store only the Section 2 chunk — “The crux move comes after the third bolt; it requires precise balance and footwork” — it’s completely adrift without context. There’s no mention of which route, which crag, or what grade. When a search hits that chunk, the LLM receives it without the information it actually needs.
Contextual Retrieval (introduced by Anthropic) fixes this: at index time, inject a document-level summary into every chunk.
Design
The indexing pipeline goes beyond a simple chunk → embedding step:
Document
↓
[Document Summary] ← LLM generates a 2–3 sentence summary
↓
[Chunk Splitting] ← split into smaller segments
↓
Inject summary into every chunk:
context = "[Document summary: {summary}]\n\n{chunk_content}"
↓
[Embedding]
↓
[Vector Store]
When a search retrieves a chunk, it now carries document-level context. Even when the LLM sees only a small passage, it understands where that passage came from and what broader topic it belongs to.
Prompt Design
Document summary generation:
Please write a concise 2–3 sentence summary of the following climbing content,
capturing the most important information (crag, grade, style, etc.).
This summary will be prepended to every segment of the document to aid
semantic understanding during search.
Content: {document_content}
Summary (2–3 sentences):
Example generated summary:
Longdong North Wall route, grade 5.11a, sport climbing, located in Gongliao District, New Taipei City.
The route is known for its technical difficulty, demanding good footwork and balance.
Suitable for intermediate-to-advanced climbers; protection is adequate but bolts are spaced.
The chunk after injection:
[Document summary: Longdong North Wall route, grade 5.11a, sport climbing, located in Gongliao District, New Taipei City.
The route is known for its technical difficulty, demanding good footwork and balance. Suitable for intermediate-to-advanced climbers; protection is adequate but bolts are spaced.]
The crux comes after the third bolt — balance and footwork must work together; it's worth resting fully before committing to this section...
Now this chunk makes sense on its own. Both the search engine and the LLM can understand its context even in isolation.
Async Execution
Summary generation does not sit on the main query path — it happens when a document is added to the index, using Cloudflare Workers’ ctx.waitUntil():
async function indexDocument(doc: Document, env: Env, ctx: ExecutionContext) {
// Synchronous: basic document embedding (immediately searchable)
await embedAndStore(doc);
// Async: generate contextual embeddings and upgrade the index (non-blocking)
ctx.waitUntil(
generateContextualEmbeddings(doc, env)
);
}
This way the document is searchable immediately via its basic embedding, and the contextual version upgrades it in the background — no downtime, no full re-index required.
Results
Anthropic’s research shows that Contextual Retrieval improves top-20 recall for chunk search by 49% (combined with BM25), and by as much as 67% on certain test sets.
In a climbing context, the gains are especially pronounced: many route-information chunks are inherently short (“the crux of the third pitch is…”), and they’re nearly meaningless without context. Once you inject the crag name, grade, and route style, the relevance of those same chunks improves dramatically.
Cost Considerations
Each document requires one additional LLM call to generate its summary. For large document collections, indexing costs will increase. Several mitigations:
- Incremental indexing: only regenerate summaries for new or modified documents
- Batch processing: run updates during off-peak hours overnight
- Summary caching: if a document hasn’t changed, reuse its existing summary
The significant improvement in search quality makes this indexing overhead worthwhile.
The Bottom Line
Contextual Retrieval addresses a root-level problem in RAG systems: chunking destroys context. This problem is far more effectively solved at the indexing stage than patched later during search or generation — improving data quality at the source beats algorithmic workarounds downstream.
“Garbage in, garbage out” is the most common cause of RAG system failures. Contextual Retrieval ensures that every chunk entering the index is a meaningful unit of information, not just a truncated fragment of text.
References
- Contextual Retrieval — Anthropic
- Introducing Contextual Retrieval (Anthropic Engineering)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (arXiv)
- Full text queries (Elasticsearch / BM25 hybrid search)
- iThome — Contextual Retrieval article
- NobodyClimb Architecture: Full-Stack Climbing Community on Cloudflare (zh-TW only)
- NobodyClimb AI Architecture: 20-Node RAG Pipeline (zh-TW only)
Loading...