#embedding

11 posts
ai deep-dive

Semantic Similarity ≠ Retrieval Relevance: Scenarios, Detection, and Remedies for Systematic Embedding Retrieval Failures

Cosine similarity and relevance diverge systematically across a whole class of queries: negation (most IR models score at or below random on NevIR), exact identifiers, numeric thresholds, and logical combinations (SoTA models reach under 20% recall@100 on LIMIT). Some of these failures hit the theoretical ceiling of the single-vector paradigm, so switching to a larger model will not help. Recommended remedy order: hybrid BM25 -> reranker (Anthropic measured a 67% reduction in retrieval failures) -> upstream metadata routing -> domain fine-tuning / multi-vector.

ai deep-dive

A More Expensive Embedding Won't Save Your Traditional Chinese RAG: Three Layers of Failure and the Fix Order

Traditional Chinese RAG retrieval failures stack in three layers: embedding granularity defects (BGE/GTE models from 0.1B to 7B all mis-rank simple queries like 'fried chicken'); Simplified Chinese and English corpus dominance, which causes local vocabulary drift (alignment for terms like 'premium' and 'exclusion clause' is unreliable); and MTEB Chinese benchmarks being built on Simplified Chinese, which makes model-selection signals misleading. The fix is architectural: OpenCC normalization -> hybrid + jieba segmentation -> reranker -> local fine-tuning last. The prerequisite for all of it is building a Traditional Chinese eval set first.

ai guide

knowledge-pipeline: A Six-Layer Pipeline for RAG Quality Control

A six-layer deterministic pipeline that automates everything from URL ingestion to vector embedding, using an eight-dimension scoring system to filter out garbage before it enters your RAG system.

ai project

2026 Q1 Open-Source LLM Landscape: From Frontier Models to On-Device, a Complete Survey

2026 Q1 saw a full-blown open-source model explosion: on the LLM front, GLM-5, Kimi K2.5, and Qwen3.5 caught up with closed-source models; Embedding and Reranker are dominated by Qwen3 and BGE; speech has Voxtral TTS and Whisper V3; image has FLUX.2; and video has Wan 2.2 rivaling Sora. This is the complete navigation map.

tech deep-dive

When Vector Search Matches by Name Instead of Grade: Attribute Conflation in RAG Systems

Query: 'I just sent Beauty in the Mirror 5.11b — recommend routes of similar difficulty.' The results came back full of routes with similar-sounding names, not similar grades. Root cause: dense embeddings compress multiple attributes into a single vector, and the rarity of the route name drowns out the grade signal. The fix: three layers of defense — metadata pre-filtering, query rewriting, and score fusion.
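
Two of the three layers named above, metadata pre-filtering and score fusion, can be sketched as follows. This is a minimal illustration, not the article's implementation: the `Route` type, the `GRADES` scale, the `recommend` function, and the `max_steps` and `alpha` parameters are all hypothetical, and the similarity values are made-up stand-ins for dense retrieval scores.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    grade: str         # YDS grade, e.g. "5.11b"
    similarity: float  # hypothetical dense-retrieval score in [0, 1]


# Hypothetical ordered grade scale, used only to measure grade distance.
GRADES = ["5.10a", "5.10b", "5.10c", "5.10d",
          "5.11a", "5.11b", "5.11c", "5.11d", "5.12a"]


def grade_distance(a: str, b: str) -> int:
    """Number of steps between two grades on the GRADES scale."""
    return abs(GRADES.index(a) - GRADES.index(b))


def recommend(candidates, target_grade, max_steps=1, alpha=0.3):
    # Layer 1: metadata pre-filter. Drop routes too far from the target grade,
    # no matter how similar their names look to the embedding model.
    kept = [r for r in candidates
            if grade_distance(r.grade, target_grade) <= max_steps]

    # Layer 2: score fusion. Grade closeness carries most of the weight;
    # embedding similarity only breaks ties between comparable grades.
    def fused(r: Route) -> float:
        closeness = 1.0 - grade_distance(r.grade, target_grade) / (max_steps + 1)
        return alpha * r.similarity + (1 - alpha) * closeness

    return sorted(kept, key=fused, reverse=True)


candidates = [
    Route("Beauty in the Glass", "5.10a", 0.95),  # similar name, wrong grade
    Route("Night Shift", "5.11b", 0.40),
    Route("Iron Cross", "5.11c", 0.55),
    Route("Mirror Lake", "5.12a", 0.90),          # similar name, wrong grade
]
result = recommend(candidates, target_grade="5.11b")
print([r.name for r in result])
```

In this toy data the two name-alike routes have the highest similarity scores, yet the pre-filter removes them before fusion ever runs. Query rewriting, the remaining layer, is not shown here.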

ai guide RAG Systems in Practice

The Complete Guide to RAG System Patterns: A Ten-Generation Evolution from Naive to Multi-Agent with Practical Navigation

RAG has evolved far beyond simple 'search + generate' into a technology ecosystem spanning ten generations, from Naive RAG to Multi-Agent RAG. This article is a systematic navigation guide covering retrieval strategies, chunking, embedding, reranking, evaluation frameworks, observability, and cost optimization. Each topic has a dedicated deep-dive article.

ai guide

BGE-M3: Why This Embedding Model Works Well for Traditional Chinese RAG

Your choice of embedding model directly determines RAG search quality. BGE-M3's multilingual training, 1024-dimensional vectors, and matching Reranker make it a practical pick for Traditional Chinese RAG.

ai guide

Contextual Retrieval: Giving Every Chunk Its "What This Is About" Context

When you split a document into chunks, each chunk loses its place in the original document. Contextual Retrieval solves the isolated-chunk problem by injecting a document-level summary into every chunk at index time.
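
A minimal sketch of the index-time step described above, assuming the document-level summary has already been produced elsewhere (for example by an LLM). The `contextualize_chunks` function, the prefix layout, and the sample text are illustrative assumptions, not the article's code.

```python
def contextualize_chunks(doc_title: str, doc_summary: str, chunks: list[str]) -> list[str]:
    """Prefix every chunk with document-level context before it is embedded."""
    prefix = f"Document: {doc_title}\nSummary: {doc_summary}\n---\n"
    return [prefix + chunk for chunk in chunks]


chunks = [
    "Requests must be filed within 30 days.",
    "Approved amounts are returned to the original payment method.",
]
contextualized = contextualize_chunks(
    doc_title="Refund Policy",
    doc_summary="Rules for requesting and receiving refunds on online orders.",
    chunks=chunks,
)
print(contextualized[0])
```

The contextualized strings, not the raw chunks, are what would be embedded and indexed. On its own, the first chunk never mentions refunds; with the prefix, a query about refund deadlines has something to match.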

ai guide RAG Systems in Practice

Hybrid Search: Using BM25 + Vector Search to Cover Each Other's Blind Spots

Vector search handles semantics; BM25 handles keywords. Combining them with RRF is what lets you handle both fuzzy queries and exact terms at the same time.
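
Reciprocal Rank Fusion (RRF) can be sketched in a few lines. This is an illustration under stated assumptions: the `rrf_fuse` function name and the document ids are hypothetical, and `k=60` is a commonly used default rather than a value taken from the article.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion.

    Each document receives 1 / (k + rank) from every list it appears in,
    so only rank positions matter, never the raw BM25 or cosine scores.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["doc_sku_123", "doc_a", "doc_b"]    # keyword match ranks the exact id first
vector_ranking = ["doc_a", "doc_c", "doc_sku_123"]  # semantic match ranks doc_a first
fused = rrf_fuse([bm25_ranking, vector_ranking])
print(fused)
```

Because RRF works on ranks alone, the two retrievers need no score normalization. Documents that both retrievers rank well rise to the top, while a document found by only one still stays in the list.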

ai guide

HyDE: Boosting Vector Search Recall with Hypothetical Answers

Have an LLM generate an 'ideal answer' first, then embed that hypothetical document and search with it. This often recalls more relevant documents than searching with the raw query.

tech deep-dive

NobodyClimb AI Architecture: Building a 20-Node RAG Pipeline on Cloudflare Workers

A dynamically composable RAG pipeline built on Cloudflare Workers AI (gemma-3-12b-it + bge-m3): 14 base steps + 6 LangGraph-specific nodes, with three strategy graphs (Baseline / Agentic / Plan-Execute) selected at runtime.