🌏 中文版
RAG systems evolved rapidly between 2023 and 2026, growing from a simple three-step process into sophisticated modular pipelines. Understanding that evolutionary arc makes it much easier to see what problem each design decision was actually solving.
Naive RAG (First Generation)
The most basic RAG flow:
1. Indexing: Document → Chunk → Embed → Vector DB
2. Retrieval: Query → Embed → Vector Search → Top-K Chunks
3. Generation: Query + Top-K Chunks → LLM → Answer
Simple to implement, and it was essentially the blueprint for every RAG tutorial in early 2023. Early versions of LangChain were essentially a packaged Naive RAG.
Problems with Naive RAG:
Retrieval issues:
- Mismatch between query and document language patterns (questions vs. declarative statements)
- A single vector representing an entire passage compresses too much semantic information
- Polysemous words and domain-specific terms cause confusion
Precision issues:
- Irrelevant documents in the Top-K results create noise for the LLM
- Duplicate chunks waste context window space
- No relevance-based reranking
Generation issues:
- Context that is too long or too short
- The LLM has no way to assess the relative reliability of documents
- No mechanism to evaluate answer quality
These problems are barely noticeable in simple scenarios, but they significantly degrade quality in specialized vertical domains — climbing, medicine, law.
Advanced RAG (Second Generation)
Advanced RAG addresses each of Naive RAG’s failure modes with improvements at every stage:
Pre-retrieval (improving the query itself):
- HyDE: Transform the question into a hypothetical answer document to bridge the language-pattern gap
- Multi-Query Expansion: Generate multiple perspectives from a single question to improve recall
- Query Rewriting: Rewrite colloquial queries into a form better suited for search
Retrieval (improving search quality):
- Hybrid Search: BM25 + vector search, complementary coverage
- Metadata Filtering: Narrow the search scope to improve precision
- Contextual Retrieval: Inject context at indexing time to solve the isolated-chunk problem
Post-retrieval (improving the context sent to the LLM):
- Cross-Encoder Reranking: Fine-grained reranking to eliminate irrelevant documents
- MMR (Maximal Marginal Relevance): Diversity-based selection to avoid redundancy
- Context Compression: Compress context while retaining key information
Advanced RAG is patch-on-patch on top of the original three-step flow. Each patch fixes a specific problem, but the overall design is still linear and fixed.
Modular RAG (Third Generation)
The core limitation of Advanced RAG: different queries need different processing paths, but a fixed pipeline cannot adapt dynamically to query type.
Modular RAG’s redesign:
Core concept: Extract each RAG capability into an independent module, then use a Pipeline Engine to compose them dynamically.
Instead of: Query → Step1 → Step2 → ... → StepN → Answer
Do this: Query
↓
[Query Classification] → determines routing
↓
Dynamically select and execute relevant modules:
- Simple query: skip most modules
- Complex query: full module stack
- SQL query: route to Text-to-SQL path
- Agentic query: enter ReAct loop
Three defining properties:
- Composable: Modules can be freely combined; a new module only needs to implement a fixed interface
- Configurable: An admin UI can dynamically enable/disable modules and tune parameters
- Routable: The system automatically selects the optimal execution path based on query type
New module types:
Modular RAG doesn’t just reorganize the existing steps — it introduces capabilities that Advanced RAG never had:
- Query Router: Selects a processing strategy based on intent
- Self-Reflection: Automatically regenerates when output quality falls below threshold
- LLM-as-Judge: Evaluates output quality
- Agentic Loop: The LLM actively decides whether it needs more information
- Semantic Cache: Caches responses for semantically similar queries
- Memory: Remembers user preferences for personalized responses
Comparing the Three Generations
| Naive RAG | Advanced RAG | Modular RAG | |
|---|---|---|---|
| Architecture | Linear 3-step | Enhanced linear flow | Modular DAG |
| Flexibility | Low | Medium | High |
| Maintainability | Simple | Moderate | Complex but organized |
| Configuration | Hardcoded | Partially configurable | Dynamically configurable |
| Adapts to query type | No | Limited | Full support |
| Engineering cost | Low | Medium | High |
Which Generation Should You Use?
Naive RAG: Great for PoCs, internal tools, and single-query-type scenarios. Ship fast, validate whether RAG actually adds value, then optimize.
Advanced RAG: The right choice for most production use cases. When you hit a specific quality problem, add the corresponding fix: poor recall → HyDE/Multi-Query; poor precision → Reranking; poor generation → Judge.
Modular RAG: Built for systems with diverse query types and a need for continuous iteration. High upfront engineering cost, but strong long-term maintainability. You need clear quality metrics and an iteration plan to get the most out of the modular design.
The Bigger Picture
The three generations aren’t “new replaces old” — they’re solutions at different levels of complexity. The core ideas from Naive RAG are still alive inside Modular RAG; they’re just better organized.
Understanding what each generation solved — and what complexity it introduced — is what lets you find the right balance between “good enough” and over-engineering.
References
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (2020)
- Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks (2024)
- Retrieval-Augmented Generation for Large Language Models: A Survey (2023)
- NobodyClimb System Architecture: A Full-Stack Climbing Community on Cloudflare (zh-TW only)
- NobodyClimb AI Architecture: A 20-Node RAG Pipeline (zh-TW only)
Loading...