
Multi-Query Expansion: Search One Question from Multiple Angles

Mar 12, 2026 1 min
TL;DR A single vector search on a complex query often misses relevant documents. Have an LLM rewrite the query into 3-5 sub-queries, search them in parallel, and fuse the results with RRF to improve recall.


Vector search results are constrained by how a query is phrased. The same underlying need expressed with different wording can produce embeddings that are far apart in vector space, causing relevant documents to be missed.

Take the query “beginner-friendly routes at Longdong.” Vector search only explores from that exact angle — but the relevant documents in your database might use phrases like “Longdong intro routes,” “Longdong 5.8–5.9,” “Longdong newbie-friendly,” or “Longdong dense protection.” A single query vector only reaches a few of these directions; the rest get left behind.

Multi-Query Expansion takes a straightforward approach: use an LLM to rewrite the original query into several sub-queries from different angles, search each one independently, then merge the results.

Rewriting Strategy

Original query: "Beginner-friendly route recommendations at Longdong"

Rewritten as:
  1. "Longdong sport climbing routes graded 5.8 to 5.9"          ← quantified difficulty
  2. "Longdong routes with dense protection, newbie-friendly"     ← safety characteristics
  3. "Longdong entry-level routes for first-time outdoor climbing" ← contextual description
  4. "龍洞 beginner-friendly sport climbing routes"               ← bilingual variant

Each sub-query starts from a different semantic angle, covering the various ways relevant documents might be described in the database.

Prompt Design

You are a climbing knowledge assistant. Given the following query, generate 3-5 related sub-queries from different angles.
The sub-queries should use varied vocabulary and phrasing to improve search recall.
Output one sub-query per line. Output only the sub-queries, no additional explanation.

Original query: {query}

Output parsing: split by line, filter empty lines, cap at 5 sub-queries (to avoid over-expansion and excessive latency).
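
As a minimal sketch of the prompt construction and parsing rule above: the names `buildPrompt`, `parseSubQueries`, and `MAX_SUB_QUERIES` are hypothetical, and trimming whitespace on each line is an added assumption. The actual LLM call is left out because it depends on your provider.

```typescript
// Hypothetical helper names; the prompt text is the one shown in this section.
const MAX_SUB_QUERIES = 5;

function buildPrompt(query: string): string {
  return [
    "You are a climbing knowledge assistant. Given the following query, generate 3-5 related sub-queries from different angles.",
    "The sub-queries should use varied vocabulary and phrasing to improve search recall.",
    "Output one sub-query per line. Output only the sub-queries, no additional explanation.",
    "",
    `Original query: ${query}`,
  ].join("\n");
}

function parseSubQueries(llmOutput: string, max: number = MAX_SUB_QUERIES): string[] {
  return llmOutput
    .split(/\r?\n/)                     // one sub-query per line
    .map((line) => line.trim())         // assumption: surrounding whitespace is noise
    .filter((line) => line.length > 0)  // drop empty lines
    .slice(0, max);                     // cap to avoid over-expansion and extra latency
}
```

If the model ignores the instructions and returns more than five lines, the cap keeps only the first five, so latency stays bounded regardless of what the model emits.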

Execution Flow

Original Query
      ↓
[LLM: Multi-Query Generator]
      ↓
[Q1, Q2, Q3, Q4] ← list of sub-queries
      ↓
[Embedding Q1] [Embedding Q2] [Embedding Q3] [Embedding Q4]  ← parallel
     ↓              ↓              ↓              ↓
[Search Q1]    [Search Q2]    [Search Q3]    [Search Q4]    ← parallel
     ↓              ↓              ↓              ↓
                  [RRF Fusion]
                       ↓
               Merged Candidates

Embedding and search both run in parallel across sub-queries. The LLM rewriting step has to finish before any sub-query can be embedded, so it is sequential within this branch. Within the overall pipeline, however, it runs concurrently with the HyDE LLM call.
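
The parallel part of this flow can be sketched as follows. `RetrievalDeps`, `embed`, `vectorSearch`, and `searchSubQueries` are hypothetical names standing in for whatever embedding model and vector store you use; the only point illustrated is that each sub-query's embed-then-search chain runs concurrently with the others.

```typescript
type SearchHit = { id: string; score: number };

// Hypothetical dependencies: stand-ins for your embedding model and vector store.
interface RetrievalDeps {
  embed: (text: string) => Promise<number[]>;
  vectorSearch: (vector: number[], topK: number) => Promise<SearchHit[]>;
}

async function searchSubQueries(
  subQueries: string[],
  deps: RetrievalDeps,
  topK = 10,
): Promise<SearchHit[][]> {
  // Each sub-query is embedded and then searched independently.
  // Promise.all runs the chains concurrently and preserves input order,
  // so result list i belongs to sub-query i.
  return Promise.all(
    subQueries.map(async (q) => {
      const vector = await deps.embed(q);
      return deps.vectorSearch(vector, topK);
    }),
  );
}
```

One design caveat: `Promise.all` rejects as soon as any single chain fails. If a failed sub-query should not sink the whole request, `Promise.allSettled` is a reasonable alternative.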

When It Triggers

Like HyDE, this only activates when queryType === 'complex'. Simple queries have clear semantics and don’t need extra expansion; SQL queries take a different path entirely.

Characteristics of complex queries:

  • Multiple conditions combined (difficulty + location + type)
  • Semantically vague or ambiguous (“fun routes”)
  • Requires comparison or recommendation (“best for xxx”)
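
The gating rule in this section might look like the sketch below. The `queryType` values follow the text; the path names and the choice of paths for simple queries are assumptions made for illustration, and the classifier that assigns `queryType` is out of scope here.

```typescript
type QueryType = "simple" | "complex" | "sql";

// Hypothetical path names, for illustration only.
type RetrievalPath = "vector" | "bm25" | "hyde" | "multiQuery" | "sql";

function planRetrievalPaths(queryType: QueryType): RetrievalPath[] {
  switch (queryType) {
    case "sql":
      // SQL queries take a different path entirely.
      return ["sql"];
    case "complex":
      // Only complex queries pay for the extra LLM-based expansions.
      return ["vector", "bm25", "hyde", "multiQuery"];
    case "simple":
      // Assumption: simple queries keep the plain vector + keyword paths.
      return ["vector", "bm25"];
  }
}
```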

Comparison with HyDE

|                    | HyDE                                                  | Multi-Query                                           |
|--------------------|-------------------------------------------------------|-------------------------------------------------------|
| Rewriting strategy | Generate a hypothetical answer document               | Generate multiple query angles                        |
| Semantic coverage  | Bridge query → answer language patterns               | Multi-dimensional expression of the same need         |
| Best suited for    | Large gap between query and document language style   | Needs that can be described from multiple dimensions  |
| Fusion method      | RRF with original query results                       | RRF across multiple sub-query results                 |

In practice, both run simultaneously. At the RRF stage, each is treated as an independent search path:

RRF inputs = [
  queryVector results,      // original query
  hydeVector results,       // HyDE hypothetical document
  subQuery1 results,        // Multi-query sub-query 1
  subQuery2 results,        // Multi-query sub-query 2
  subQuery3 results,        // Multi-query sub-query 3
  bm25 results,             // keyword search
]

Six paths merge together, accumulating RRF scores per document. A document that ranks highly across more paths earns a higher fused score — which is exactly what we want.
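
For reference, here is a minimal sketch of the score accumulation, assuming the standard Reciprocal Rank Fusion formula, where each path contributes 1 / (k + rank) for a document at 1-based position `rank`. The constant k = 60 is a commonly used default rather than a value stated in this article, and `rrfFuse` is a hypothetical name. The sketch also assumes each document id appears at most once per path.

```typescript
// Each path is a list of document ids, best match first.
type RankedList = string[];

function rrfFuse(paths: RankedList[], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranked of paths) {
    ranked.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Usage: "b" appears in all three paths, so it accumulates the highest score.
const fused = rrfFuse([
  ["a", "b"], // e.g. original query results
  ["b", "c"], // e.g. a sub-query's results
  ["b", "a"], // e.g. BM25 results
]);
// fused order: b, a, c
```

Because RRF uses only rank positions, the paths do not need comparable raw scores, which is what lets vector results and BM25 results be fused directly.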

Cost Considerations

The main costs of Multi-Query Expansion are:

  1. LLM rewriting: one extra LLM call (a lightweight model works fine — rewriting doesn’t need a large model)
  2. Multiple embeddings: N sub-queries each require one embedding call
  3. Multiple searches: N parallel vector searches

In the context of a climbing community, complex queries are typically the ones where users need the highest quality results. The cost is worth it. Skipping this for simple queries also avoids unnecessary overhead.

The Big Picture

Multi-Query Expansion is essentially using the LLM’s language capabilities to compensate for the blind spots in vector search coverage. Single-query recall is bottlenecked by how the user happens to phrase their question; multi-angle rewriting breaks that constraint. Combined with RRF fusion, documents that get hits from multiple angles rank higher — and the overall result quality improves.

