Standard RAG is a single-pass pipeline: query → retrieve → generate. That works fine for most questions, but falls apart when a query requires multi-hop reasoning.
“Plan me a climbing trip leaving from Taichung — intermediate level, doable on a weekend, with routes at different grades so everyone in the group can climb.”
This question spans several dimensions:
- What climbing crags are near Taichung? (geography)
- What grade distribution does each crag offer? (route info)
- What grade range counts as intermediate? (skill assessment)
- How accessible are they on weekends? (logistics)
One search can’t cover all of that at once. Agentic RAG lets the LLM evaluate whether the current context is sufficient during execution — and if it isn’t, decide what to search for next.
The ReAct Loop
ReAct (Reasoning + Acting) is the core pattern behind Agentic RAG:
Reason: Evaluate current context, decide next step
Act: Execute the decision (search / answer / broaden)
Observe: Receive search results, update context
Reason: Re-evaluate... (loop)
Here’s how it looks in the implementation:
async function agenticRetrieve(ctx: PipelineContext): Promise<void> {
  let step = 0;
  while (step < ctx.config.agentic_max_steps) {
    const candidates = ctx.candidateMatches;
    // Check if we have enough
    if (candidates.length >= ctx.config.agentic_min_docs_to_answer) {
      ctx.agenticDecision = 'ANSWER';
      break;
    }
    // LLM decides: rewrite query / broaden filter
    const decision = await agentDecide(ctx.currentQuery, candidates, ctx.config);
    if (decision.action === 'RETRIEVE') {
      // Re-run search with rewritten query
      ctx.currentQuery = decision.rewrittenQuery;
      const newResults = await hybridSearch(ctx);
      mergeResults(ctx, newResults);
    } else if (decision.action === 'BROADEN') {
      // Relax filter constraints
      ctx.vectorFilter = relaxFilter(ctx.vectorFilter);
      const newResults = await hybridSearch(ctx);
      mergeResults(ctx, newResults);
    } else {
      break; // ANSWER
    }
    step++;
  }
}
agentic_max_steps prevents infinite loops. The default is 3 steps and can be tuned.
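The loop calls `mergeResults` and `relaxFilter` without showing them. Here is a minimal sketch of what such helpers might look like, assuming each candidate carries an `id` and a `score` and the filter is a flat key/value object. The `Match` and `Filter` shapes, the function names `mergeMatches` and `relaxFilterOnce`, their array-based signatures, and the drop order are all illustrative assumptions, not the pipeline's actual code.

```typescript
// Hypothetical shapes; the real pipeline types are not shown in the article.
interface Match {
  id: string;
  score: number;
  text: string;
}
type Filter = Record<string, string>;

// Merge new hits into the existing candidates, keeping one entry per id
// (the higher-scoring one) and returning the result sorted best first.
function mergeMatches(existing: Match[], incoming: Match[]): Match[] {
  const byId = new Map<string, Match>();
  for (const m of [...existing, ...incoming]) {
    const seen = byId.get(m.id);
    if (seen === undefined || m.score > seen.score) {
      byId.set(m.id, m);
    }
  }
  return Array.from(byId.values()).sort((a, b) => b.score - a.score);
}

// Relax the filter by dropping at most one constraint per call, following an
// assumed priority order (least important key first). Returns a new object
// and leaves the input untouched.
function relaxFilterOnce(filter: Filter, dropOrder: string[]): Filter {
  const relaxed: Filter = { ...filter };
  for (const key of dropOrder) {
    if (key in relaxed) {
      delete relaxed[key];
      break;
    }
  }
  return relaxed;
}
```

Deduplicating on merge matters because a rewritten query often returns documents the previous pass already found. Without it, the `agentic_min_docs_to_answer` check could be satisfied by repeats of the same document.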
The Decision Prompt
The agent’s decision LLM receives:
Current query: {query}
Documents found so far ({n}): {document_summaries}
Choose one:
ANSWER — Context is sufficient; generate a response
RETRIEVE — More information needed; rewrite the query (provide new query)
BROADEN — Filter constraints are too strict; relax the search scope
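One way to assemble that prompt is sketched below. The `DocSummary` shape, the function name `buildDecisionPrompt`, and the closing instruction asking for JSON are assumptions added for illustration; the article shows only the template itself.

```typescript
// Hypothetical summary shape; the article does not show how summaries are built.
interface DocSummary {
  title: string;
  snippet: string;
}

// Fill the decision template with the current query and a numbered list of
// the documents retrieved so far.
function buildDecisionPrompt(query: string, docs: DocSummary[]): string {
  const summaries =
    docs.length === 0
      ? "(none)"
      : docs.map((d, i) => `${i + 1}. ${d.title}: ${d.snippet}`).join("\n");
  return [
    `Current query: ${query}`,
    `Documents found so far (${docs.length}):`,
    summaries,
    "Choose one:",
    "ANSWER - Context is sufficient; generate a response",
    "RETRIEVE - More information needed; rewrite the query (provide new query)",
    "BROADEN - Filter constraints are too strict; relax the search scope",
    // Assumed addition: request machine-readable output.
    'Reply with JSON containing "action", "rewrittenQuery", and "reasoning".',
  ].join("\n");
}
```

Passing short summaries rather than full documents keeps the decision call small, which matters because it runs once per loop step.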
The LLM returns a structured decision:
{
  "action": "RETRIEVE",
  "rewrittenQuery": "Taichung climbing crag transportation options",
  "reasoning": "The current documents lack transportation info; need to supplement."
}
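LLM output is not guaranteed to be valid JSON, so the decision should be validated before the loop acts on it. The sketch below assumes the three field names shown in the example. The `parseDecision` name and the choice to fall back to `ANSWER` on malformed output (so a bad reply ends the loop instead of triggering another search) are assumptions, not something the article specifies.

```typescript
type AgentAction = "ANSWER" | "RETRIEVE" | "BROADEN";

interface AgentDecision {
  action: AgentAction;
  rewrittenQuery?: string;
  reasoning?: string;
}

function isAction(value: unknown): value is AgentAction {
  return value === "ANSWER" || value === "RETRIEVE" || value === "BROADEN";
}

// Assumed policy: anything unparseable is treated as ANSWER.
function fallbackDecision(): AgentDecision {
  return { action: "ANSWER", reasoning: "unparseable decision" };
}

function parseDecision(raw: string): AgentDecision {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return fallbackDecision();
  }
  if (typeof data !== "object" || data === null) {
    return fallbackDecision();
  }
  const obj = data as Record<string, unknown>;
  if (!isAction(obj.action)) {
    return fallbackDecision();
  }
  const rewritten =
    typeof obj.rewrittenQuery === "string" ? obj.rewrittenQuery.trim() : "";
  // RETRIEVE is only usable when it comes with a non-empty new query.
  if (obj.action === "RETRIEVE" && rewritten === "") {
    return fallbackDecision();
  }
  const decision: AgentDecision = { action: obj.action };
  if (rewritten !== "") decision.rewrittenQuery = rewritten;
  if (typeof obj.reasoning === "string") decision.reasoning = obj.reasoning;
  return decision;
}
```

Rejecting a `RETRIEVE` without a query avoids re-running the search with an undefined `currentQuery`.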
Trigger Conditions
Agentic RAG isn’t on by default. It requires:
- rag_strategy === 'agentic' or rag_strategy === 'auto' (in auto mode, the strategy is chosen based on queryType)
- queryType === 'complex'
The reason is straightforward: Agentic RAG has significantly higher latency than standard RAG (multiple LLM calls + multiple searches), so it’s not appropriate for every query.
Standard RAG: 5–8 s
Agentic RAG: 10–20 s (depending on number of steps)
Are users willing to wait longer in exchange for a more complete answer? That depends on how complex the query is. auto mode lets the system make that call.
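The trigger check can be sketched as a small predicate. This version reads the two conditions as both being required, which is one interpretation of the list above. The `"standard"` strategy value and the `"simple"` query type are assumed names for the non-agentic cases; only `'agentic'`, `'auto'`, and `'complex'` appear in the article.

```typescript
// "standard" and "simple" are assumed values for illustration.
type RagStrategy = "standard" | "agentic" | "auto";
type QueryType = "simple" | "complex";

// Agentic retrieval runs only when the strategy permits it AND the query
// has been classified as complex.
function shouldRunAgentic(strategy: RagStrategy, queryType: QueryType): boolean {
  const strategyAllows = strategy === "agentic" || strategy === "auto";
  return strategyAllows && queryType === "complex";
}
```

Under this reading, a simple lookup stays on the single-pass path even in `auto` mode, so only complex queries pay for the extra steps.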
How It Differs from CRAG
CRAG is a rule-based fallback triggered by zero results; Agentic RAG is an LLM-driven intervention for cases where results exist but aren’t good enough:
| | CRAG | Agentic RAG |
|---|---|---|
| Trigger | Zero candidate documents | LLM judges context insufficient |
| Decision | Rule-based (remove filter) | LLM (rewrite query / broaden) |
| Complexity | Low | High |
| Added latency | ~+0.5 s (one extra search) | +5–15 s (multiple LLM calls) |
The two can run together: CRAG as a baseline safety net, Agentic RAG as the high-quality path.
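The CRAG side of that pairing is simple enough to sketch in a few lines: if the filtered search returns nothing, retry once with the filter removed. The `SearchFn` type, the `Doc` shape, and the function name are assumptions; the rule itself (zero candidates, remove the filter, one extra search) comes from the table.

```typescript
// Hypothetical types; the search function is injected so the rule is testable.
interface Doc {
  id: string;
}
type Filter = Record<string, string>;
type SearchFn = (query: string, filter: Filter) => Promise<Doc[]>;

// Rule-based fallback: zero candidates with a filter applied triggers exactly
// one retry without the filter. No LLM call is involved.
async function searchWithCragFallback(
  search: SearchFn,
  query: string,
  filter: Filter
): Promise<{ docs: Doc[]; usedFallback: boolean }> {
  const first = await search(query, filter);
  const hasFilter = Object.keys(filter).length > 0;
  if (first.length > 0 || !hasFilter) {
    return { docs: first, usedFallback: false };
  }
  const retry = await search(query, {});
  return { docs: retry, usedFallback: true };
}
```

One possible arrangement is to use a function like this as the search primitive inside the agentic loop, so that an empty result set is repaired cheaply before the LLM is asked to decide anything.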
Multi-Hop Reasoning in Practice
For queries that require synthesizing multiple sources, Agentic RAG clearly outperforms standard RAG:
Standard RAG: search “Taichung climbing trip” → retrieve a few crag writeups → LLM generates limited suggestions from that sparse context
Agentic RAG:
- Step 1: Search “crags near Taichung” → finds Dakeng, Guguan
- Step 2: LLM notices grade info is missing → searches “Dakeng crag route grades”
- Step 3: LLM notices logistics are missing → searches “Dakeng crag how to get there”
- Synthesizes all three passes → produces a thorough trip plan
Each step fills a specific gap in the context rather than throwing one broad search at the wall and hoping for the best.
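The walkthrough can be replayed as a toy trace. In this sketch the LLM decision is replaced by a fixed rule (query for the first aspect the context does not yet cover) and the search by a lookup table, so it shows only the gap-filling control flow. The document ids, the `Aspect` labels, and all names are invented for illustration, and the initial search is folded into the loop for brevity.

```typescript
// Aspects the final answer needs, taken from the three steps above.
type Aspect = "crags" | "grades" | "logistics";

interface Doc {
  id: string;
  covers: Aspect;
}

// Stub index standing in for hybrid search. Ids are invented.
const STUB_INDEX: Record<string, Doc[]> = {
  "crags near Taichung": [
    { id: "dakeng-overview", covers: "crags" },
    { id: "guguan-overview", covers: "crags" },
  ],
  "Dakeng crag route grades": [{ id: "dakeng-grades", covers: "grades" }],
  "Dakeng crag how to get there": [{ id: "dakeng-access", covers: "logistics" }],
};

// Stand-in for the LLM's query rewriting: one query per missing aspect.
const QUERY_FOR: Record<Aspect, string> = {
  crags: "crags near Taichung",
  grades: "Dakeng crag route grades",
  logistics: "Dakeng crag how to get there",
};
const REQUIRED: Aspect[] = ["crags", "grades", "logistics"];

function runTrace(maxSteps: number): { queries: string[]; context: Doc[] } {
  const queries: string[] = [];
  const context: Doc[] = [];
  for (let step = 0; step < maxSteps; step++) {
    // Reason: which required aspect is still uncovered?
    const missing = REQUIRED.find((a) => !context.some((d) => d.covers === a));
    if (missing === undefined) break; // ANSWER: nothing left to fill
    // Act: search for the missing aspect. Observe: add results to context.
    const query = QUERY_FOR[missing];
    queries.push(query);
    context.push(...(STUB_INDEX[query] ?? []));
  }
  return { queries, context };
}
```

With a budget of three steps the trace issues the three queries from the walkthrough in order. With a budget of one it stops after the crag overview, which is roughly where a single-pass pipeline would be.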
The Takeaway
Agentic RAG represents the evolution of RAG systems from passive retrieval to active reasoning. It’s not suited for high-traffic, latency-sensitive scenarios — but for complex planning and multi-hop reasoning queries, the quality improvement is substantial.
The core design principle: give the LLM enough information instead of making it guess. Rather than asking the model to reason from an incomplete context, let it run a few more searches until it has what it needs. Agentic RAG hands that judgment back to the LLM.