MMR + Popularity Weighting: Recommendations That Are Both Relevant and Diverse

TL;DR: Ranking purely by relevance can leave you with five documents that all describe the same route. MMR balances relevance against diversity, and layering popularity weighting on top surfaces the routes the community has already validated.

#rag #mmr #diversity #reranking #popularity #recommendation

Table of Contents

The MMR Algorithm
Similarity Calculation
What λ Controls
Popularity Weighting
The Full Selection Pipeline
The Bigger Picture
References


After Cross-Encoder reranking, the top 5 results might all describe the same route, just pulled from different sources: each one mentions the same 5.11a at Longdong in slightly different words. Every one of those documents is relevant to the query, but feeding them all into an LLM is wasteful. Redundant information doesn’t improve the answer; it just burns context window.

MMR (Maximal Marginal Relevance) addresses exactly this: balancing relevance and diversity so you don’t pack your context with near-duplicate documents.

The MMR Algorithm

MMR is a greedy selection algorithm. At each step, it picks one document from the candidate pool to add to the selected set, using this scoring rule:

MMR(d) = λ × relevance(d, query) - (1 - λ) × max_sim(d, already_selected)

relevance(d, query): how relevant the document is to the query (Cross-Encoder score)
max_sim(d, already_selected): the similarity between this document and its closest match among already-selected documents
λ: weight given to relevance (0 → pure diversity, 1 → pure relevance)

First document: just pick the highest-relevance candidate. With nothing in the selected set yet, max_sim is 0 and the MMR score reduces to λ × relevance, so the relevance ranking decides.

Second document: find the candidate with the highest MMR score among the remaining pool. A candidate that closely resembles the already-selected document will have a large max_sim term, pulling its MMR score down.

Repeat until top_k documents are selected or the candidate pool is exhausted.
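The greedy loop above can be sketched in TypeScript. This is a minimal illustration, not the system's actual code: the `Candidate`/`Selected` shapes and the `mmrSelect` name are hypothetical, and it assumes `relevance` is already on roughly the same [0, 1] scale as the similarity function.

```typescript
// Hypothetical candidate shape: relevance is assumed to be roughly in [0, 1].
interface Candidate<T> {
  doc: T;
  relevance: number;
}

interface Selected<T> extends Candidate<T> {
  mmrScore: number; // the MMR score this item had when it was picked
}

// Greedy Maximal Marginal Relevance selection.
function mmrSelect<T>(
  candidates: Candidate<T>[],
  similarity: (a: T, b: T) => number,
  topK: number,
  lambda = 0.7,
): Selected<T>[] {
  // Copy and sort by relevance so ties break toward the more relevant item.
  const pool = [...candidates].sort((a, b) => b.relevance - a.relevance);
  const selected: Selected<T>[] = [];

  while (selected.length < topK && pool.length > 0) {
    let bestIdx = 0;
    let bestScore = -Infinity;

    for (let i = 0; i < pool.length; i++) {
      const c = pool[i];
      // max_sim against everything already picked, floored at 0;
      // it is 0 for the first pick, so the most relevant item goes first.
      const maxSim = selected.reduce(
        (m, s) => Math.max(m, similarity(c.doc, s.doc)),
        0,
      );
      const score = lambda * c.relevance - (1 - lambda) * maxSim;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = i;
      }
    }

    // Move the winner from the pool into the selected set.
    const [best] = pool.splice(bestIdx, 1);
    selected.push({ ...best, mmrScore: bestScore });
  }
  return selected;
}
```

Because `similarity` is a parameter, the metadata-based `documentSimilarity` from the next section can be passed in directly. The loop makes on the order of n × k² similarity calls for n candidates and top_k = k, which is one reason a cheap similarity function pays off.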

Similarity Calculation

For inter-document similarity, embedding cosine similarity is overkill here (too expensive). Instead, the system uses a lightweight similarity score computed from document metadata:

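// Assumed document shape (hypothetical; fields inferred from the checks below)
interface Document {
  crag_id?: string;
  grade_numeric?: number;
  route_type?: string;
  type: "route" | "crag" | "video";
}

// Metadata-based similarity in [0, 1]; higher means more redundant.
function documentSimilarity(a: Document, b: Document): number {
  let score = 0;

  // Same crag → high similarity
  if (a.crag_id && a.crag_id === b.crag_id) score += 0.4;

  // Close grades (numeric difference < 5) → similar; skipped if either grade is missing
  if (
    a.grade_numeric !== undefined &&
    b.grade_numeric !== undefined &&
    Math.abs(a.grade_numeric - b.grade_numeric) < 5
  ) score += 0.3;

  // Same climbing type (two documents that both lack a route_type don't count)
  if (a.route_type && a.route_type === b.route_type) score += 0.2;

  // Same document type (route / crag / video)
  if (a.type === b.type) score += 0.1;

  return score;
}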

This metadata-based approach is much cheaper to compute, and it maps well to climbing content: documents from the same crag are the most likely to overlap.
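These weights sum to 1.0, so `documentSimilarity` returns a value in [0, 1] and max_sim does too. The relevance term only trades off sensibly against it if it lives on a similar scale, and depending on the model, Cross-Encoder outputs can be unbounded logits. The post doesn't say whether or how the system normalizes them; the sketch below assumes min-max scaling over the current candidate pool, and `normalizeRelevance` is a hypothetical helper.

```typescript
// Min-max rescale raw relevance scores (e.g. Cross-Encoder logits) into [0, 1]
// so they share a scale with the metadata similarity used in max_sim.
// This normalization step is an assumption, not part of the original pipeline.
function normalizeRelevance(scores: number[]): number[] {
  if (scores.length === 0) return [];
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  // All scores equal: treat every candidate as equally (fully) relevant.
  if (max === min) return scores.map(() => 1);
  return scores.map((s) => (s - min) / (max - min));
}

const raw = [4.2, 1.1, -2.3]; // hypothetical Cross-Encoder logits
console.log(normalizeRelevance(raw));
```

Min-max scaling is relative to the pool: the top candidate always scores 1.0, even for a query where nothing matches well. Applying a sigmoid to the logits is an alternative that keeps scores comparable across queries.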

What λ Controls

λ shapes the “personality” of recommendations:

λ value	Effect	Best for
0.9	Almost pure relevance	User asks a specific, focused question
0.7	Relevance-first with some diversity	General recommendations (default)
0.5	Equal weight to both	Exploratory queries
0.3	Diversity-first	“Surprise me” scenarios
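To make the table concrete, here is the second MMR step worked through with made-up numbers. Document A (relevance 0.90) is already selected. B is a near-duplicate of A (relevance 0.85, similarity 0.9 to A); C is about a different crag (relevance 0.60, similarity 0.1 to A). The values are illustrative only, not measurements from the system.

```typescript
// MMR score for a single candidate, straight from the formula above.
function mmrScore(relevance: number, maxSim: number, lambda: number): number {
  return lambda * relevance - (1 - lambda) * maxSim;
}

// Hypothetical second step: A is already selected.
// B is a near-duplicate of A; C covers a different crag.
const B = { relevance: 0.85, maxSim: 0.9 };
const C = { relevance: 0.6, maxSim: 0.1 };

for (const lambda of [0.9, 0.7, 0.5, 0.3]) {
  const sB = mmrScore(B.relevance, B.maxSim, lambda);
  const sC = mmrScore(C.relevance, C.maxSim, lambda);
  console.log(`λ=${lambda}: B=${sB.toFixed(3)} C=${sC.toFixed(3)} → pick ${sB > sC ? "B" : "C"}`);
}
```

With these numbers B wins only when λ is above roughly 0.76 (where 0.25λ = 0.8(1 − λ)). At 0.7 and below, C's lower overlap outweighs B's higher relevance.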

The system defaults to 0.7. It can be tuned dynamically via ai_config without a redeployment.
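The post doesn't show the shape of `ai_config`. Assuming it is loaded as a key/value map at request time, reading λ defensively might look like this; the `mmr_lambda` key and the `resolveMmrLambda` name are hypothetical.

```typescript
// Hypothetical shape: ai_config loaded as a key/value map at request time.
// The key name "mmr_lambda" is an assumption for illustration.
type AiConfig = Record<string, unknown>;

const DEFAULT_MMR_LAMBDA = 0.7;

// Read λ from config, falling back to the default when the value is
// missing, non-numeric, or outside [0, 1].
function resolveMmrLambda(config: AiConfig): number {
  const raw = config["mmr_lambda"];
  // Accept numeric strings (common in config stores), but not empty ones.
  const value = typeof raw === "string" && raw.trim() !== "" ? Number(raw) : raw;
  if (typeof value !== "number" || !Number.isFinite(value)) {
    return DEFAULT_MMR_LAMBDA;
  }
  if (value < 0 || value > 1) return DEFAULT_MMR_LAMBDA;
  return value;
}
```

Falling back to the default instead of clamping means a typo such as 7 (for 0.7) reverts to 0.7 rather than silently becoming pure relevance.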

Popularity Weighting

After MMR selection, a popularity weighting pass re-sorts the results:

const finalScore = mmrScore + popularityWeight * popularityScore;

popularityScore is derived from a route or crag’s click-through rate, ratings, and review count. The logic is straightforward: when two documents have similar MMR scores, surface the one with stronger community validation — the recommended route is more likely to actually satisfy the user.
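The post doesn't spell out how click-through rate, ratings, and review count are combined. The sketch below is one hypothetical way to fold them into a single [0, 1] score; the weights, the 5-star rating scale, and the review-count saturation constant are all assumptions.

```typescript
// Hypothetical popularity signals for a route or crag.
interface PopularitySignals {
  ctr: number; // click-through rate in [0, 1]
  avgRating: number; // assumed 0 to 5 star scale
  reviewCount: number; // total number of reviews
}

// Assumed weights; they sum to 1 so the result stays in [0, 1].
const W_CTR = 0.3;
const W_RATING = 0.4;
const W_REVIEWS = 0.3;
// Review count at which the volume signal saturates at 1.0 (assumption).
const REVIEW_SATURATION = 200;

function popularityScore(s: PopularitySignals): number {
  const ctr = Math.min(Math.max(s.ctr, 0), 1);
  const rating = Math.min(Math.max(s.avgRating / 5, 0), 1);
  // Log scale: the first few dozen reviews count for more than the next few hundred.
  const reviews = Math.min(
    Math.log1p(Math.max(s.reviewCount, 0)) / Math.log1p(REVIEW_SATURATION),
    1,
  );
  return W_CTR * ctr + W_RATING * rating + W_REVIEWS * reviews;
}
```

A production version might also shrink averages built on only a handful of reviews toward a prior; that is left out here.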

This also patches a blind spot in embeddings: a new route (well-documented, highly rated) and a beloved classic might sit at nearly the same embedding distance from a query, but users are more likely to trust something with a track record.
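The re-sort itself is small. The sketch below applies the `finalScore` formula to an MMR-selected set; the `SelectedDoc` shape and the default `popularityWeight` of 0.2 are assumptions, since the post gives neither. The two documents mirror the new-route-versus-classic case with made-up scores.

```typescript
// Hypothetical item shape after MMR selection: mmrScore is the score each
// document had when it was picked; popularityScore is in [0, 1].
interface SelectedDoc {
  id: string;
  mmrScore: number;
  popularityScore: number;
}

// Re-sort the MMR-selected set by
// finalScore = mmrScore + popularityWeight * popularityScore.
// This only reorders; it never adds or drops documents.
function rankForContext(
  selected: SelectedDoc[],
  popularityWeight = 0.2, // assumed value, not from the post
): SelectedDoc[] {
  const finalScore = (d: SelectedDoc): number =>
    d.mmrScore + popularityWeight * d.popularityScore;
  return [...selected].sort((a, b) => finalScore(b) - finalScore(a));
}

// A newly added route edges out a classic on MMR score alone.
const ranked = rankForContext([
  { id: "new-route", mmrScore: 0.62, popularityScore: 0.1 },
  { id: "classic", mmrScore: 0.6, popularityScore: 0.9 },
]);
console.log(ranked.map((d) => d.id));
```

Here the classic moves to the front (0.60 + 0.2 × 0.9 = 0.78 versus 0.62 + 0.2 × 0.1 = 0.64). Because the pass only reorders the MMR output, it cannot reintroduce the near-duplicates MMR filtered out.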

The Full Selection Pipeline

Candidates from Cross-Encoder ranking
        ↓
[MMR greedy selection] ← λ=0.7
        ↓
MMR-selected results (Top-K documents)
        ↓
[Popularity weighting sort]
        ↓
Final document set → LLM context generation

The Bigger Picture

MMR tends to be underappreciated in RAG systems. Compare sending the 10 most relevant documents straight into an LLM with sending 10 MMR-selected documents that cover different angles: the former often floods the LLM with repetitive information, while the latter gives it a broader view of the problem, which tends to produce more complete answers.

Add popularity weighting on top, and you get recommendations that combine semantic relevance (vector search + reranking) with community validation (popularity). It’s a design that finds a practical middle ground between algorithmic rigor and user experience.