#embedding

11 posts

ai deep-dive Jun 4, 2026

Semantic Similarity ≠ Retrieval Relevance: Scenarios, Detection, and Remedies for Systematic Embedding Retrieval Failures

Cosine similarity and relevance systematically diverge across an entire class of scenarios: negation (most IR models score at or below random on NevIR), exact identifiers, numeric thresholds, and logical combinations (SoTA models achieve recall@100 < 20 on LIMIT) -- some of these hit the theoretical ceiling of the single-vector paradigm, and switching to a larger model will not help. Recommended remedy order: hybrid BM25 -> reranker (Anthropic measured -67%) -> upstream metadata routing -> domain fine-tuning / multi-vector.

#retrieval #embedding #rag #vector-search #llm

ai deep-dive Jun 4, 2026

A More Expensive Embedding Won't Save Your Traditional Chinese RAG: Three Layers of Failure and the Fix Order

Traditional Chinese RAG retrieval failures are a three-layer stack: embedding granularity defects (BGE/GTE from 0.1B to 7B all mis-rank on simple queries like 'fried chicken'), Simplified Chinese / English corpus dominance causing local vocabulary drift ('premium', 'exclusion clause' alignment is unreliable), and MTEB Chinese benchmarks being Simplified Chinese making model selection signals misleading. The fix is architectural: OpenCC normalization -> hybrid + jieba segmentation -> reranker -> local fine-tuning last -- and the prerequisite for all of it is building a Traditional Chinese eval set first.

#rag #embedding #traditional-chinese #retrieval #llm

ai guide Apr 18, 2026

knowledge-pipeline: A Six-Layer Pipeline for RAG Quality Control

A six-layer deterministic pipeline that handles everything from URL ingestion to vector embedding automatically, filtering out garbage before it enters your RAG system through an eight-dimension scoring system.

#rag #knowledge-management #pipeline #embedding #bge-m3 #sqlite #quality-control

ai project Mar 31, 2026

2026 Q1 Open-Source LLM Landscape: From Frontier Models to On-Device, a Complete Survey

2026 Q1 saw a full-blown open-source model explosion: on the LLM front, GLM-5, Kimi K2.5, and Qwen3.5 caught up with closed-source models; Embedding and Reranker are dominated by Qwen3 and BGE; speech has Voxtral TTS and Whisper V3; image has FLUX.2; and video has Wan 2.2 rivaling Sora. This is the complete navigation map.

#open-source #llm #glm-5 #kimi #deepseek #qwen #llama #gemma #mistral #minimax #phi #smollm #gpt-oss #moe #on-device-ai #embedding #reranker #tts #stt #image-generation #video-generation #code-model #ollama #vllm

tech deep-dive Mar 28, 2026

When Vector Search Matches by Name Instead of Grade: Attribute Conflation in RAG Systems

Query: 'I just sent Beauty in the Mirror 5.11b — recommend routes of similar difficulty.' The results came back full of routes with similar-sounding names, not similar grades. Root cause: dense embeddings compress multiple attributes into a single vector, and the rarity of the route name drowns out the grade signal. The fix: three layers of defense — metadata pre-filtering, query rewriting, and score fusion.

#rag #vector-search #embedding #cloudflare-workers #recommendation-system

ai guide RAG 系統實戰 Mar 14, 2026

The Complete Guide to RAG System Patterns: A Ten-Generation Evolution from Naive to Multi-Agent with Practical Navigation

RAG has evolved far beyond simple 'search + generate' into a technology ecosystem spanning ten generations. This article is a systematic navigation guide: from Naive RAG to Multi-Agent RAG across ten generations, covering retrieval strategies, chunking, embedding, reranking, evaluation frameworks, observability, and cost optimization. Each topic has a dedicated deep-dive article.

#rag #guide #retrieval #embedding #reranking #evaluation #agent

ai guide Mar 12, 2026

BGE-M3: Why This Embedding Model Works Well for Traditional Chinese RAG

Your choice of embedding model directly determines RAG search quality. BGE-M3's multilingual training, 1024-dimensional vectors, and matching Reranker make it a practical pick for Traditional Chinese RAG.

#rag #embedding #bge-m3 #multilingual #vector-search #cloudflare-workers-ai

ai guide Mar 12, 2026

Contextual Retrieval: Giving Every Chunk Its "What This Is About" Context

When you split a document into chunks, each chunk loses its place in the original document. Contextual Retrieval solves the isolated-chunk problem by injecting a document-level summary into every chunk at index time.

#rag #contextual-retrieval #chunking #indexing #embedding

ai guide RAG 系統實戰 Mar 12, 2026

Hybrid Search: Using BM25 + Vector Search to Cover Each Other's Blind Spots

Vector search handles semantics; BM25 handles keywords. Combining them with RRF is what lets you handle both fuzzy queries and exact terms at the same time.

#rag #hybrid-search #bm25 #vector-search #rrf #embedding

ai guide Mar 12, 2026

HyDE: Boosting Vector Search Recall with Hypothetical Answers

Have an LLM generate an 'ideal answer' first, then embed that hypothetical document for search — it outperforms searching with the raw query.

#rag #hyde #embedding #vector-search #query-enhancement

tech deep-dive Mar 12, 2026

NobodyClimb AI Architecture: Building a 20-Node RAG Pipeline on Cloudflare Workers

A dynamically composable RAG pipeline built on Cloudflare Workers AI (gemma-3-12b-it + bge-m3): 14 base steps + 6 LangGraph-specific nodes, with three strategy graphs (Baseline / Agentic / Plan-Execute) selected at runtime.

#rag #cloudflare-workers-ai #llm #pipeline #gemma #embedding #hono #langgraph