ai

196 posts

ai deep-dive Jun 20, 2026

Loop Engineering: When AI No Longer Needs You to Write Prompts

Loop Engineering is the practice of designing systems that automatically prompt AI agents, rather than prompting them manually. Boris Cherny runs hundreds of agents, Addy Osmani coined the term, and Blake Crosley identified verification cost as the real bottleneck — this article covers primary sources, the five building blocks, applicability boundaries, and criticisms.

#loop-engineering #ai-agent #claude-code #prompt-engineering #harness-engineering #agentic-coding

ai deep-dive Jun 9, 2026

Text / Image to Lottie: A Landscape Overview of AI Animation Generation Tools

From the CLI tool kin3o to the CVPR 2026 paper OmniLottie — a survey of open-source approaches for converting text and images into Lottie animations, with performance benchmarks and selection guidance.

#lottie #animation #open-source #llm #vlm #vector-animation

ai deep-dive Jun 6, 2026

The Skill Management Revolution for LLM Agents: A Complete Landscape of Skill Lifecycle from Voyager to MUSE-Autoskill

MUSE-Autoskill (2026) introduces a five-stage skill lifecycle framework. Self-created skills achieve 60.35% (+7.16%) on SkillsBench overall, and an impressive 87.94% on tasks where skill generation succeeds — surpassing the human-authored skill ceiling. This post synthesizes six arXiv papers to map the full landscape of skill evolution research.

#agent-skills #ai-agent #llm #self-refinement #memory #arxiv #paper-review

ai deep-dive Jun 4, 2026

How to Rigorously Compare Before and After Agent Changes: From Golden Sets to Statistical Testing

Even with temperature=0, LLM outputs can still fluctuate by up to 15% in practice. To rigorously compare agent changes, you need a frozen golden set, at least 3 runs per query with scores averaged, blind LLM-as-judge evaluation (the pairwise preference flip rate reaches 35%), and paired statistical tests -- not just running each version once and going by feel.

#evaluation #rag #llm-judge #ab-testing #ai-agent #llm
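
As a rough illustration of the "paired statistical tests" step in the summary above, here is a minimal sketch of an exact paired sign-flip permutation test using only the standard library. The function name and the per-query scores are hypothetical, and the post itself may recommend a different test; this only shows what "paired" means when both versions are scored on the same golden-set queries.

```python
from itertools import product
from statistics import mean


def paired_permutation_pvalue(before, after):
    """Two-sided exact sign-flip test on per-query score differences.

    `before` and `after` hold one averaged score per golden-set query,
    in the same query order (hypothetical data layout).
    """
    diffs = [a - b for a, b in zip(after, before)]
    observed = abs(mean(diffs))
    hits = 0
    total = 0
    # Under the null hypothesis each difference is equally likely to be + or -.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = [s * d for s, d in zip(signs, diffs)]
        if abs(mean(flipped)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical per-query scores, each already averaged over several runs.
before = [0.60, 0.55, 0.70, 0.65, 0.50, 0.62]
after = [0.66, 0.60, 0.74, 0.72, 0.58, 0.65]
p_value = paired_permutation_pvalue(before, after)
print(p_value)  # 0.03125, since only the all-plus and all-minus sign patterns match
```

The exact enumeration is only practical for small golden sets (2^n sign patterns); for larger sets a random sample of sign patterns is the usual substitute.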

ai deep-dive Jun 4, 2026

Agent Observability: From OTel Traces to Catching Hallucinations, Tool Misuse, and Infinite Loops

The industry has converged on using OpenTelemetry GenAI semantic conventions to turn every LLM call and tool call into a span. Detecting the three major failure modes then splits into three tracks: faithfulness + semantic entropy for hallucinations, framework-level symbolic guardrails for tool misuse, and max steps + action hash deduplication for infinite loops — all wired into a Final / Trajectory / Single-step three-layer evaluation framework.

#observability #ai-agent #tool-use #llm #opentelemetry
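
The "max steps + action hash deduplication" idea from the summary above can be sketched as follows. This is a minimal illustration under assumptions: the `LoopGuard` class, its thresholds, and the stop-reason strings are hypothetical and not taken from the post or from any framework.

```python
import hashlib
import json
from collections import Counter
from typing import Optional


class LoopGuard:
    """Hypothetical guard: stop on a step budget or on repeated identical actions."""

    def __init__(self, max_steps: int = 20, max_repeats: int = 2):
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen = Counter()

    def check(self, tool_name: str, arguments: dict) -> Optional[str]:
        """Return a stop reason, or None if the action may proceed."""
        self.steps += 1
        if self.steps > self.max_steps:
            return "max_steps_exceeded"
        # Canonical JSON so that argument key order does not change the hash.
        payload = json.dumps({"tool": tool_name, "args": arguments}, sort_keys=True)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        self.seen[digest] += 1
        if self.seen[digest] > self.max_repeats:
            return "repeated_action"
        return None


guard = LoopGuard(max_steps=10, max_repeats=2)
results = [guard.check("search", {"q": "lottie", "page": 1}) for _ in range(3)]
print(results)  # [None, None, 'repeated_action']
```

Hashing the exact tool name and arguments only catches literal repeats; near-duplicate actions (for example, the same query with trivial rewording) would need a looser similarity check.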

ai deep-dive Jun 4, 2026

Resource Rationality for Agents: Optimal Decisions Across Tokens, Tool Calls, and Latency

Agent decision-making under resource constraints is bounded rationality reborn: Rational Metareasoning uses VOC rewards to save 20-37% of tokens, BATS proves that adding budget without budget awareness is futile, FrugalGPT cascades cut costs by up to 98%, and Speculative Actions reduce latency by 20%. The three constraints ultimately converge into a single Pareto curve, and the overarching trend is moving from humans tuning knobs to models making resource-rational decisions on their own.

#ai-agent #reasoning #test-time-compute #llm #cost-optimization

ai deep-dive Jun 4, 2026

The Single Crack in Agent Security: From Prompt Injection to Trust Boundaries to Multi-Agent Worms

Three seemingly distinct agent security problems — tool output injection, trust boundaries, malicious agents — share the same root cause: LLMs flatten instructions and data into a single token stream, making them architecturally unable to distinguish between the two. Understand this through-line and you can trace every attack from EchoLeak (CVE-2025-32711, zero-click) to the Morris II AI worm, and see why 'making the model behave' doesn't work — only architectural constraints (six design patterns, CaMeL) do.

#security #ai-agent #prompt-injection #multi-agent #llm

ai deep-dive Jun 4, 2026

How Agents Decide Whether to Retrieve, What to Retrieve, and How to Merge: Three Decision Layers of Agentic RAG

Traditional RAG is a fixed pipeline of 'retrieve then answer.' Agentic RAG splits retrieval into three decision layers: when to retrieve (FLARE uses token probabilities; Adaptive-RAG uses a complexity classifier), what to retrieve (HyDE / RAG-Fusion / decomposition / Step-back), and how to fuse (RRF with k=60, then cross-encoder rerank, then compression; Anthropic measured a 67% reduction in retrieval failure rate). Key counter-intuitive insight: unnecessary retrieval hurts quality -- 'deciding not to retrieve' is a first-class capability.

#rag #agentic-rag #retrieval #ai-agent #llm
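
For readers unfamiliar with the "RRF k=60" step mentioned in the summary above, here is a minimal sketch of Reciprocal Rank Fusion: each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The function name and the toy document IDs are illustrative assumptions; the rerank and compression stages are not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Ranks are 1-based; ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy example: one lexical ranking and one dense-embedding ranking.
bm25_ranking = ["a", "b", "c"]
dense_ranking = ["b", "c", "d"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['b', 'c', 'a', 'd']
```

Because RRF uses only rank positions, it needs no score normalization between retrievers, which is why it is a common first fusion step before a more expensive reranker.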

ai deep-dive Jun 4, 2026

Stop Hand-Tuning Prompts: From GEPA to Tool Descriptions, Automating Agent Behavior Optimization

Automatic prompt optimization (APO) has evolved from APE/OPRO to GEPA: replacing sparse rewards with linguistic reflection, winning over GRPO by ~6pp with 4-35x fewer rollouts. Meanwhile, tool descriptions are the overlooked prompt -- small wording changes can shift tool selection rates by 10x, and Anthropic's experiments show Claude self-rewriting tool descriptions outperforms human experts. These two lines are converging: eval-driven automatic optimization is eating hand-tuned prompts.

#prompt-engineering #tool-use #ai-agent #llm #optimization

ai deep-dive Jun 4, 2026

How to Build a Deep Research Agent: Multi-Turn Search Planning, Conflict Resolution, and Verifiable Conclusions

An autonomous research agent = four controllable stages: planning (decompose into sub-questions), retrieval loop (search -> read -> reflect on gaps -> search again), evidence arbitration (>=2 independent sources, typed conflict handling), and verifiable output (sentence-level citations + independent verification pass). Two approaches: training-based uses RL to learn end-to-end when to search (Search-R1 +41%); orchestration-based uses orchestrator-worker division of labor (Anthropic internal eval +90.2%, at ~15x token cost).

#deep-research #ai-agent #multi-agent #retrieval #llm

ai deep-dive Jun 4, 2026

Machine Theory of Mind: How Agents Infer Other Agents' Intentions, Knowledge, and Goals

Inferring another's beliefs/goals/intentions from observed behavior is called Machine Theory of Mind. Three lineages: symbolic BDI, Bayesian inverse planning, and deep learning ToMnet. The biggest controversy in the LLM era is that GPT-4 still trails humans by >10 points on ToMBench — are high scores genuine reasoning or statistical shortcuts?

#theory-of-mind #multi-agent #ai-agent #llm #reasoning

ai deep-dive Jun 4, 2026

Multi-Agent Error Propagation and Recovery: Borrowing Thirty Years of Weapons from Distributed Systems

At 99% accuracy per step over 100 steps, the error-free completion rate drops to just 36% -- error compounding is a structural problem, not something prompt tuning can fix. Distributed systems' supervisor trees, bulkheads, circuit breakers, sagas, and durable execution can be mapped almost one-to-one into agent orchestration. But LLMs introduce a failure class that traditional systems never had -- semantic errors that don't crash -- which require Inspector agents (recovering 96.4%) and redundancy voting (MAKER: one million steps with zero errors) to address.

#multi-agent #ai-agent #fault-tolerance #orchestration #llm
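
The 36% figure in the summary above follows from simple compounding. A minimal sketch, assuming every step succeeds independently with the same probability (the function name is illustrative):

```python
def error_free_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed."""
    return per_step_accuracy ** steps


rate = error_free_rate(0.99, 100)
print(f"{rate:.1%}")  # 36.6%
```

Real agent steps are rarely independent, so this is a back-of-the-envelope estimate rather than a prediction for any particular system.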

ai deep-dive Jun 4, 2026

Semantic Similarity ≠ Retrieval Relevance: Scenarios, Detection, and Remedies for Systematic Embedding Retrieval Failures

Cosine similarity and relevance systematically diverge across an entire class of scenarios: negation (most IR models score at or below random on NevIR), exact identifiers, numeric thresholds, and logical combinations (SoTA models achieve recall@100 below 20% on LIMIT) -- some of these hit the theoretical ceiling of the single-vector paradigm, and switching to a larger model will not help. Recommended remedy order: hybrid BM25 -> reranker (Anthropic measured a 67% reduction in retrieval failure rate) -> upstream metadata routing -> domain fine-tuning / multi-vector.

#retrieval #embedding #rag #vector-search #llm

ai deep-dive Jun 4, 2026

How to Pick the Right Tool from Hundreds: The Collapse Curve of Tool Selection and Engineering Solutions

As the number of tools grows, selection accuracy doesn't degrade gracefully; it collapses. Going from 4 to 51 tools drops accuracy from 43% to 2%, and going from 10 to 100+ tools drops it from 78% to 13.62%. The root fix is to stop stuffing everything into context at once: Anthropic's Tool Search Tool uses deferred loading plus retrieval to cut 85% of tokens, pushing Opus 4.5 accuracy from 79.5% to 88.1%. Description quality has a conditional payoff: negligible in simple scenarios, but correctness rises from 44% to 50% in multi-tool chaining.

#tool-use #ai-agent #mcp #llm #context-engineering

ai deep-dive Jun 4, 2026

A More Expensive Embedding Won't Save Your Traditional Chinese RAG: Three Layers of Failure and the Fix Order

Traditional Chinese RAG retrieval failures are a three-layer stack: embedding granularity defects (BGE/GTE models from 0.1B to 7B all mis-rank simple queries like 'fried chicken'), Simplified Chinese / English corpus dominance causing local vocabulary drift (alignment for terms like 'premium' and 'exclusion clause' is unreliable), and MTEB Chinese benchmarks being built on Simplified Chinese, which makes their model selection signals misleading. The fix is architectural: OpenCC normalization -> hybrid + jieba segmentation -> reranker -> local fine-tuning last -- and the prerequisite for all of it is building a Traditional Chinese eval set first.

#rag #embedding #traditional-chinese #retrieval #llm

ai guide May 28, 2026

arXiv Paper Quality Assessment Guide: From Endorsement Mechanisms to a Practical Checklist

arXiv does not perform peer review, and roughly 2% of submissions are rejected. Quality judgment relies on external signals: top venue acceptance > institution + open-source reproduction > citation quality. Includes a 20-item practical checklist and a 2026 toolbox (PWC has shut down).

#arxiv #paper-reading #research-tools #reproducibility #llm

ai deep-dive May 24, 2026

Auto-Embedding on File Upload Is a Bad Default: A Survey of Adaptive / Agentic RAG and Agentic Parsing

Making 'chunk and embed every uploaded file automatically' the default behavior means making a decision for the LLM that it could have made itself. From Self-RAG (2310.11511) and Adaptive-RAG (2403.14403) to AgenticOCR (2602.24134), the academic trajectory is pushing three layers of decision-making -- whether to retrieve, whether to parse, and how to chunk -- from the ingestion pipeline back to the agent at conversation time.

#rag #agentic-rag #adaptive-rag #tool-use #llm-agent #agentic-parsing #document-parsing

ai deep-dive May 24, 2026

Assembling LLM Agent Skills / Tools / Code Interpreter for Real: A Paper Reading Map

The hard part of LLM agents is not building function calling, skills, code interpreter, and document tools individually -- it is assembling them into a system that selects the right tool, writes code when needed, decomposes tasks, verifies results, and resists prompt injection. This post organizes the key papers into six engineering decisions: function calling reliability, tool/skill selection, code-as-action, multi-step planning, skill systems, and safety plus document generation.

#llm #agents #tool-use #skills #code-interpreter #function-calling #paper-review

ai deep-dive May 23, 2026

A2UI (Agent-to-User Interface): Google's Open Protocol for Agents to Ship UI as Data

A2UI is an agent generative UI protocol open-sourced by Google on 2025-12-15: agents send declarative JSON describing UI intent, and clients render it natively using their own component catalog whitelist, layered on top of A2A. It launched at format v0.8 and iterated to v0.9 within three months.

#a2ui #google #generative-ui #agent-protocol #mcp #ag-ui

ai deep-dive May 23, 2026

browse.sh: Turning What Browser Agents Learn into a Skill Catalog

browse.sh, launched by Browserbase in May 2026, is two things: a browser skill catalog and the Browse CLI. The core thesis: the bottleneck for browser agents isn't reasoning — it's amnesia. By storing learned site-specific workflows as plain-text SKILL.md files, Autobrowse cut Craigslist task costs from ~$0.22 to ~$0.12 by their own metrics. Note: this has nothing to do with the 2018 Browsh text-mode browser.

#browse-sh #browser-agent #agent-skills #browserbase #autobrowse

ai deep-dive May 23, 2026

CodeGraph: Local Code Knowledge Graph, and the Truth About 'Walking the Graph to Save Money'

CodeGraph uses tree-sitter to extract a codebase into a local SQLite/FTS5 knowledge graph, letting AI coding agents query the graph instead of scanning files. The official end-to-end benchmark (7 repos, median of 4 runs) averages 35% cost savings and 70% fewer tool calls -- but only if the agent actually walks the graph. Delegating exploration to a file-reading subagent that ignores CodeGraph turns it into pure overhead.

#codegraph #mcp #knowledge-graph #tree-sitter #context-engineering #claude-code

ai deep-dive May 23, 2026

How Do People Read arXiv Papers? A Complete Guide to Methods and Tools

Reading papers is two problems stacked together: methodology (Keshav's three-pass method, 5-10 min / 1 hour / 4-5 hours) determines how to read, and tools (arXiv HTML, alphaXiv, NotebookLM, Connected Papers, Zotero) shorten the time for each pass. AI lowers the barrier to understanding; judging correctness always stays with the human.

#arxiv #paper-reading #research-tools #llm #literature-review #notebooklm #zotero

ai deep-dive May 23, 2026

Midscene.js: Betting on Pure Vision for Cross-Platform UI Automation

An MIT-licensed open-source UI automation framework from ByteDance (~13k GitHub stars). UI actions rely solely on feeding screenshots to vision-language models (Qwen3-VL / Doubao / Gemini-3 / UI-TARS), with no DOM parsing. A single JS API works across Web / Android / iOS / desktop, and starting from v1.0, the DOM action mode was removed entirely. The trade-off: each step is slower and more token-expensive.

#midscene #ui-automation #vision-language-model #mcp #agent #bytedance

ai deep-dive May 19, 2026

How Claude Reads and Writes PDF / DOCX / PPTX: Deconstructing the Three-Layer Architecture of Skills + Sandbox

Claude has no docx_tool or pdf_tool -- it relies on bash + file tools, plus SKILL.md instructions and pre-installed libraries like pdfplumber / python-pptx inside the container, assembling file handling capabilities from three layers.

#claude #agent-skills #anthropic #code-interpreter #sandbox #document-skills

ai deep-dive May 19, 2026

Open Design: The Open-Source Claude Design Alternative Forked in 11 Days

Anthropic shipped Claude Design on 2026-04-17. On 2026-04-28, nexu-io/open-design went public -- same artifact-first loop, Apache-2.0, runs on the 16 coding-agent CLIs you already have. Two weeks from 0.1 to 0.7, 40k+ stars. A paradigm shift that flattens AI design tools from vertical SaaS into a skill bundle.

#open-design #claude-design #anthropic #agent-cli #claude-code #mcp #open-source

ai deep-dive May 19, 2026

system_prompts_leaks Deep Dive: What Problem Does a 40k-Star AI System Prompt Archive Solve

asgeirtj/system_prompts_leaks collects the raw system prompts of 40+ AI assistants, from GPT-5.5 and Claude Opus 4.7 to Gemini 3.1 Pro, with 40.3k stars, 461 commits, and an MIT license. The value isn't in obtaining secrets -- it's in turning vendors' implicit policies into comparable engineering material. What you should study is the design decisions, not the text itself.

#system-prompts #prompt-engineering #ai-transparency #claude #chatgpt #anthropic #open-source

ai deep-dive May 18, 2026

Dissecting Anthropic's Founder's Playbook: Four Stages, Three Moats, and One Cowork Compliance Pitfall

Anthropic's 35-page startup handbook released 2026-05-14 reorganizes Idea/MVP/Launch/Scale around agentic AI. The most valuable takeaways are 'the easier it is to build, the more important validation becomes' and treating CLAUDE.md as the first MVP artifact. The part to discount: the Launch chapter puts compliance workstreams on Cowork -- but Anthropic's own docs say Cowork doesn't write audit logs.

#anthropic #claude #startup #playbook #claude-cowork #compliance

ai May 10, 2026

Using AI Agents to Operate Video Generation Tools: A HyperFrames, HeyGen, and Runway Integration Guide

AI agents can operate video generation tools through three approaches — Skills, MCP Connectors, and direct APIs. Choosing the right integration method matters more than choosing the right tool.

#ai-agent #video-generation #hyperframes #heygen #mcp #claude-code #cursor

ai deep-dive May 10, 2026

Code Mode: Moving Tool Definitions from Context into Code

Stop stuffing all your tool descriptions into context at session start. Let the model write code, have the runtime execute it, and let tool definitions enter context only at the import line — Anthropic's GDrive→Salesforce example dropped from ~150K tokens to 2K, and Cloudflare's 2,500-endpoint schema shrank from 1.17M to 1K.

#mcp #agent #code-mode #runtime #context-engineering #anthropic #cloudflare

ai deep-dive May 10, 2026

The FDE War: Why OpenAI and Anthropic Are Both Copying Palantir's Playbook

MIT research says 95% of enterprise AI pilots yield zero return. OpenAI and Anthropic announced multi-billion-dollar joint ventures in the same week, wholesale adopting the Forward Deployed Engineer model that Palantir has used for over a decade to bring AI into the enterprise battlefield.

#fde #forward-deployed-engineer #openai #anthropic #palantir #enterprise-ai #deployment

ai deep-dive May 10, 2026

How Others Use LLMs to Write: Trade-off Notes from Karpathy's LLM-wiki to Multi-Agent Pipelines

A survey of 11 public LLM writing pipelines, distilled into three dominant patterns: multi-agent (researcher -> writer -> critic), Karpathy LLM-wiki (raw + wiki + LLM writes, humans don't), and quality guardrails (technical verifier + never fabricate + brief gate). The Princeton GEO paper (KDD 2024) quantifies the impact: inline citations +28%, adding statistics +33%, quoting source text +41%, keyword stuffing -9%.

#llm-writing #content-pipeline #claude-code #agent-skills #llm-wiki #geo #multi-agent #harness-engineering

ai May 10, 2026

OpenAI's Codex Secure Deployment Strategy: Sandboxing, Auto-review, and Enterprise Governance

In May 2026, OpenAI published its internal Codex deployment practices: sandboxes define technical boundaries, approval policies determine when to pause, Auto-review delegates approval decisions to a sub-agent instead of a human, and Managed configuration lets enterprise admins enforce policies top-down. The core philosophy: zero friction for low-risk actions, mandatory review for high-risk ones.

#openai #codex #ai-agent #security #sandbox #enterprise

ai May 9, 2026

9Router: A Local 3-Tier Fallback Router That Routes Claude Code / Cursor / Cline to 40+ Providers

Spin up a local OpenAI-compatible endpoint at localhost:20128 that automatically routes requests from Claude Code / Cursor / Cline / Codex / Copilot through a Subscription → Cheap → Free 3-tier fallback to 40+ providers. Built-in RTK compresses tool_result (saving 20–40% input tokens), Caveman mode compresses output, OAuth auto-refresh, multi-account round-robin — install with npm install -g 9router and two commands.

#ai-router #9router #claude-code #cursor #cline #codex #llm-routing #token-saving #oauth #fallback

ai May 9, 2026

Claude, Codex, and Gemini Are All in the Browser Now: Comparing Three AI Agent Approaches in Chrome

Anthropic builds an extension, OpenAI builds its own browser, Google welds AI directly into Chrome — three completely different approaches. Here's a comparison of the current landscape, key differences, and a selection guide.

#ai-agent #chrome-extension #claude #codex #chatgpt-atlas #gemini #browser-agent

ai May 9, 2026

15 Walls for Building Your Own Auto-Dev Agent: Concrete Lessons from Stripe Minions

Stripe Minions says 'The walls matter more than the model,' but the case studies from four Silicon Valley companies never explained how to actually build those walls. This post breaks down the 15 walls we implemented in the daodao auto-dev agent: what each wall prevents, where the files live, and what the tradeoffs are. Tier 1 is mandatory, Tier 2 strengthens governance, Tier 3 is serious governance.

#ai-agent #claude-code #guardrails #allowlist #verification-loop #token-budget #test-first #defense-in-depth #pre-commit #sub-agent-council

ai May 9, 2026

What Is an Auto-Dev Agent? An Intro to daodao's Automated Development System

A PM checks a task card in Notion → the system syncs it to a GitHub issue → writes a plan → writes code → opens a PR for human review. This post explains what the system does, what it doesn't do, and why it's feasible now — written for people who don't write code.

#ai-agent #auto-dev-agent #product #automation-overview #non-engineer #notion #github #pipeline

ai May 9, 2026

Step-by-Step: Build a Notion → PR Auto-Dev Agent — A Reproducible Version of the daodao Pipeline

Build a Notion task → GitHub issue → spec PR → code PR auto-dev agent from scratch. Using the daodao case as a template, this guide walks through every step — what to do, what to verify, and how to handle problems. Notion DB schema → bin/ scaffold → two Claude Code routines → cloud env vars → staging tests.

#ai-agent #claude-code #tutorial #notion-sync #openspec #pipeline-automation #auto-dev-agent #routine #cloud-environment #github-automation

ai May 9, 2026

Claude for Financial Services: Dissecting Anthropic's Multi-Agent Reference Implementation

Anthropic open-sourced 12 financial-industry Agents and 11 MCP connectors. The real takeaway isn't the Agents themselves but the layered design of 'one prompt, two runtimes' and 'pure-file extensibility.'

#claude #agents #mcp #rag #langgraph #multi-agent

ai May 9, 2026

From Plan to PR: Building daodao's Auto-Dev Agent in Practice

5 rounds of consensus to write the plan, then team mode with 5 workers running 12 tasks in parallel — with plenty of pitfalls along the way. Writing it down for my future self and anyone else trying the same thing.

#ai-agent #claude-code #multi-agent #consensus-planning #auto-dev-agent #notion-sync #openspec #pipeline-automation #internal-coding-agent #defense-in-depth

ai deep-dive May 9, 2026

DeepSeek-OCR: The 10x Compression Experiment That Turns Long Context into Images

DeepSeek-OCR's paper is titled Contexts Optical Compression -- OCR is just the means; what it actually validates is that 'rendering text as images and feeding them to a VLM' achieves 10x compression at 97% accuracy. This is a qualitative shift for long-context LLM and RAG token costs.

#ocr #deepseek #vision-language-model #long-context #context-compression #rag

ai May 9, 2026

2026 LLM Inference Provider Free Tiers & Pricing: 40+ Services Ranked by Tier

For side projects, toy demos, and RAG prototypes, nobody wants to swipe a credit card on day one. This is a verified roundup of 40+ LLM inference providers still operating as of 2026/05, tiered by whether free resources auto-replenish or are one-time grants. Each entry notes credit-card requirements, supported models, paid starting prices, and catches. Chinese-origin providers including Zhipu GLM (permanently free), Doubao (2M tokens/day), Kimi, DashScope, and the Ollama local option are all included.

#llm #inference #pricing #free-tier #cerebras #groq #cloudflare-workers-ai #gemini #openrouter #deepseek #nvidia-nim #modal #ollama #mistral

ai deep-dive May 8, 2026

Claude Skills: Package Domain Knowledge into a Folder, Teach Once and It Remembers

A Skill is a folder with a SKILL.md. Three-layer progressive disclosure lets Claude load details only when needed, eliminating the need to re-explain preferences every conversation.

#claude #anthropic #claude-skills #prompt-engineering #agent #context-engineering

ai May 8, 2026

Local Deep Research Walkthrough: A Privacy-First Deep Research Agent

Local Deep Research is a privacy-first deep research agent built on LangChain + LangGraph, integrating 20+ search engines and 30+ research strategies. Its flagship langgraph_agent_strategy takes the LLM-autonomous tool-calling approach, offering a fundamentally different paradigm from fixed-pipeline RAG graphs.

#rag #agent #langgraph #deep-research #local-llm #langchain

ai deep-dive RAG Systems in Practice May 8, 2026

PageIndex: RAG Without Vectors — Turning Long Documents Into a Book With a Table of Contents

PageIndex skips chunking, embedding, and vector storage entirely. Instead it relies on LLM reasoning over a tree-structured table of contents the LLM itself wrote, achieving 98.7% on FinanceBench (GPT-4o reading directly scores only 31%). It solves a different problem than vector RAG — finding the right section in a well-structured long document.

#rag #llm #pageindex #vectorless #retrieval #financebench

ai May 7, 2026

Search MCP Tools for AI Agents: What to Do When WebFetch / WebSearch Gets Blocked

When using AI agents like Claude Code or Cursor, built-in WebFetch / WebSearch often gets blocked by Cloudflare, geo-restrictions, or rate limits. Connecting a search MCP server is the most direct fix. This post compares the options actually available in 2026.

#mcp #search #web-search #tavily #firecrawl #exa #bocha #claude-code #agent

ai May 6, 2026

Groq Console: The Developer Platform for Running Open-Source Models on LPU Inference

Groq Console is the developer portal for Groq's in-house LPU chip, offering an OpenAI-compatible API, Playground, and free tier credits. Its selling point is running open-source models like Llama, Qwen, and DeepSeek at the fastest tokens/second on the market.

#groq #lpu #inference #llm #openai-compatible #developer-platform

ai deep-dive May 2, 2026

goose: Open-Source, Cross-Platform, LLM-Agnostic Local AI Agent

goose is an open-source AI Agent maintained by the Linux Foundation's AAIF, supporting 15+ LLM providers and 70+ MCP extensions, built with Rust as a Desktop App + CLI + API. It positions itself as a vendor-neutral, self-hostable alternative to Claude Code.

#goose #ai-agent #open-source #mcp #rust #linux-foundation #aaif #claude-code #cli #desktop-app

ai guide Apr 28, 2026

Gemma on Cloudflare Workers AI: A Pragmatic Choice for Traditional Chinese Applications

For running LLMs on Cloudflare Workers AI, gemma-3-12b-it follows Traditional Chinese instructions noticeably better than llama-3.1-8b-instruct. With Gemma 4 arriving in 2026, you get Vision, Function calling, and 256K context -- upgrade as needed.

#gemma #cloudflare-workers-ai #llm #traditional-chinese

ai project Apr 28, 2026

Qwen (Tongyi Qianwen): Alibaba's Open-Source LLM Family, from 72B to 397B — A Complete Evolution Overview

Qwen (Tongyi Qianwen) is Alibaba's open-source LLM family, known for its Apache 2.0 license, 201-language coverage, and rapid iteration. The latest Qwen3.6 (2026/04) focuses on Agentic Coding — the 27B Dense version achieves 77.2% on SWE-bench and 59.3% on Terminal-Bench 2.0, on par with Claude Opus. A new Thinking Preservation feature lets agents retain reasoning context across turns.

#qwen #alibaba #llm #open-source #moe #multimodal #apache2 #ai-model #dashscope #on-device-ai #agentic-coding

ai Apr 23, 2026

Knowledge Management with LLMs: From Karpathy's llm-wiki to the Open-Source Ecosystem

Karpathy proposed the llm-wiki pattern in 2026, having LLMs proactively maintain a markdown wiki instead of running RAG from scratch every time. Over 100 open-source implementations now exist, ranging from local CLI tools to serverless Telegram bots.

#llm-wiki #knowledge-management #karpathy #obsidian #cloudflare #second-brain

ai Apr 23, 2026

OpenAI Workspace Agents: From Custom GPTs to a Team Automation Platform

On 2026/4/22 OpenAI launched Workspace Agents — powered by Codex, capable of long-running cloud execution, and integrating with Slack/Salesforce/Google Drive. They are the enterprise successor to Custom GPTs.

#openai #chatgpt #agent #workspace-agents #codex #enterprise-ai

ai guide RAG Systems in Practice Apr 23, 2026

Building a Legal Contract RAG in 36 Hours: Weaviate Query Agent + ColQwen Architecture Breakdown

With the Weaviate Query Agent and the ColQwen multi-vector model, a production-grade legal contract search system was built from a single prompt in 36 hours -- this post breaks down its architecture logic, technology choices, and what you actually need to watch out for.

#rag #weaviate #legal-ai #colqwen #muvera #vector-database #agentic-search

ai guide Apr 21, 2026

Where AI Code Review Stands Now: Lessons from Cloudflare's Multi-Agent System

Cloudflare ran a Multi-Agent Code Review system internally for 30 days — 131K reviews, median 3 minutes. This post breaks down their architecture and compares it with solutions from Anthropic, GitHub, CodeRabbit, Greptile, and others.

#ai-code-review #multi-agent #cloudflare #claude-code #coderabbit #llm-ops #devops

ai guide Apr 21, 2026

Inside the Codex Agent Loop: How OpenAI Keeps AI Agents Iterating

A detailed look at OpenAI's Codex agent loop design: how prompts are constructed, how multi-turn conversations are managed, how prompt caching prevents cost explosions, and how context window auto-compaction works.

#codex #agent-loop #openai #responses-api #prompt-caching #context-window

ai guide Apr 21, 2026

Codex App Server: How OpenAI Turned an Agent Harness into a Universal Protocol

OpenAI wrapped the Codex harness as a JSON-RPC over stdio App Server, enabling VS Code, JetBrains, Web, and desktop apps to share a single agent loop. Three core primitives: Item, Turn, and Thread.

#codex #app-server #json-rpc #agent-harness #openai #harness-engineering

ai guide AI Agent in Practice Apr 21, 2026

OpenAI Wrote 1 Million Lines of Code with Codex: Harness Engineering in Practice

An OpenAI internal team spent 5 months with 3 people and 0 lines of hand-written code, delivering a complete product using Codex. This article distills their core lessons on AGENTS.md design, repo-local knowledge bases, architecture enforcement, and entropy management.

#harness-engineering #codex #openai #agent-first #agents-md #agentic-coding

ai guide AI Agent in Practice Apr 20, 2026

Agentic Engineering: Making AI Agents Collaborate Like a Real Engineering Team

Agentic Engineering isn't about making AI write code faster — it's about making software move through the entire delivery pipeline faster, by using multi-agent collaboration to compress cross-team coordination friction.

#agentic-engineering #multi-agent #langgraph #langsmith #a2a #mcp #worker-agent #leader-agent

ai guide AI Agent in Practice Apr 20, 2026

The Memory Problem in Agentic Engineering: Types, Implementation, and Ownership

Agent memory isn't a plugin — it's part of the harness itself. Pick the right memory type, estimate data volume, then decide on the technology. And finally, figure out whether you actually own that memory.

#agentic-engineering #memory #langmem #agent-harness #context-engineering #multi-agent

ai guide Apr 20, 2026

Multi-Engine Code Review with Codex + Gemini + Claude: Principles, Patterns, and Implementation

AI models rationalize their own code when reviewing it. Using three different CLIs for independent review effectively catches blind spots -- this post covers the design philosophy and practical workflow patterns behind the approach.

#claude-code #gemini-cli #codex-cli #code-review #agentic-workflow #multi-model

ai guide Apr 18, 2026

Integrating AI Agents into Your Development Workflow: A Five-Phase SDLC Breakdown

Agentic AI is not just autocomplete — it is an AI system capable of autonomously executing multi-step tasks. This article breaks down the five phases of the SDLC, explaining where to plug in agents at each phase, how to progress from CLI tools to full-pipeline automation, and the most valuable external resources to track right now.

#agentic-ai #sdlc #coding-agents #github-actions #claude-code #spec-driven-development #ai-workflow

ai guide Apr 18, 2026

A Book Written by AI Itself, Teaching You How to Build Software with AI

Encyclopedia of Agentic Coding Patterns catalogues 190 patterns to help you make the right software decisions in the age of AI-written code — and the book itself is autonomously written and maintained by an AI agent.

#agentic-coding #design-patterns #llm #ai-agent #software-engineering #claude-code

ai guide Apr 18, 2026

GitHub Copilot Coding Agent: Assign an Issue to AI and Let It Open the PR

GitHub Copilot Coding Agent lets you assign an Issue to Copilot, which then automatically creates a branch, writes code, runs CI, and opens a PR — all inside a cloud sandbox. The key to success is setting up AGENTS.md; without it, the agent tends to go off track. Best suited for well-defined medium-sized tasks; requires Pro+ (1,500 premium requests/month) or Enterprise plan.

#github #copilot #coding-agent #ai-agent #github-actions #sandbox #pr-automation

ai guide Apr 18, 2026

knowledge-pipeline: A Six-Layer Pipeline for RAG Quality Control

A six-layer deterministic pipeline that automatically handles everything from URL ingestion to vector embedding, using an eight-dimension scoring system to filter out garbage before it enters your RAG system.

#rag #knowledge-management #pipeline #embedding #bge-m3 #sqlite #quality-control

ai guide Apr 18, 2026

MarkItDown: Convert Any File to Markdown Before Feeding It to an LLM

A lightweight open-source tool from Microsoft that converts PDF, Office, images, audio, and more into Markdown — purpose-built for LLM pipelines.

#markitdown #llm #rag #document-processing #python

ai guide Apr 18, 2026

MCP vs CLI vs API: The Real Boundaries of Agent Tool Interfaces

MCP is not going away, but its effective scope is narrower than most people think. For local development, CLI and raw API almost always beat MCP. MCP's truly irreplaceable niche is the narrow gap of 'cross-agent shared local tool layer.'

#mcp #agent #cli #api #claude-code #tool-use

ai guide Apr 17, 2026

Lessons from the Trenches: What AI Native Teams Must Get Right

Not everyone should use a coding agent to modify code directly. AI Native teams need interface specs, test-first development, monorepo, security guardrails, human-in-the-loop, and token budget controls. Building an agent platform layer on top of coding agents and clearly redefining developer roles is the right path forward.

#ai-native #coding-agent #spec-driven-development #monorepo #ci-cd #code-review #agent-platform #security #observability #git-worktree #adr #human-in-the-loop #cost-management #model-selection #developer-role #failure-handling

ai guide Apr 17, 2026

Autoreason: Teaching LLMs When to Stop Self-Refining

Autoreason replaces the traditional critique-and-revise loop with a competitive multi-version evaluation mechanism (A/B/AB + blind Borda count), solving three structural problems in LLM self-refinement: prompt bias, scope creep, and lack of restraint.

#autoreason #nous-research #self-refinement #llm #borda-count #iterative-reasoning #ai-agent

ai project Apr 17, 2026

Vercel Open Agents: Moving the Coding Agent from Your Laptop to the Cloud

An open-source coding agent reference implementation from Vercel Labs. A three-layer architecture separates the web UI, agent workflow, and sandbox VM — designed as a starting point for teams that want to self-host their own Claude Code or Cursor Background Agent.

#coding-agent #vercel #open-source #agent-infrastructure #sandbox

ai guide Apr 14, 2026

Claude Octopus: The Consensus Plugin That Hooks 8 Models Into Claude Code Simultaneously

Claude Octopus is a Claude Code plugin that simultaneously calls Codex, Gemini, Copilot, Qwen, Ollama, Perplexity, OpenRouter, and Claude to review the same code, using a 75% consensus threshold to catch single-model blind spots. It ships with 32 personas, 48 /octo:* slash commands, 51 skills, and a Dark Factory fully autonomous spec-to-code pipeline.

#claude-code #plugin #octopus #multi-model #consensus #orchestration #dark-factory

ai guide Apr 13, 2026

LLM Council: Karpathy's Weekend Multi-Model Parliament — Three Stages of LLM Peer Review

LLM Council is a local Web App Andrej Karpathy built over a weekend. It sends one question to multiple LLMs simultaneously, has them anonymously peer-review each other, and then a Chairman model synthesizes a final answer. Positioned as a small tool for comparing models while studying — 99% vibe coded with no plans for long-term maintenance — but the architecture itself is a minimal ensemble LLM implementation worth studying.

#llm-council #karpathy #multi-model #openrouter #fastapi #ensemble #peer-review

ai guide Apr 12, 2026

Claude Managed Agents: Letting Anthropic Handle the Agent Shell and Sandbox

Claude Managed Agents is a beta service launched by Anthropic on 2026/04/08 that provides an agent harness plus cloud container sandbox, billed per token plus $0.08/session-hour. It suits long-running async tasks and is worth exploring if you don't want to build your own agent loop and sandbox.

#claude #managed-agents #anthropic #ai-agent #sandbox #serverless #beta

ai guide Apr 10, 2026

Agent Skills: A Skill Framework That Makes AI Agents Work Like Senior Engineers

Agent Skills is Addy Osmani's open-source collection of 19 production-grade engineering skills that drive AI agents to follow senior engineering discipline through /spec → /plan → /build → /test → /review → /ship commands, instead of cutting corners.

#agent-skills #ai-agent #harness-engineering #claude-code #cursor #gemini-cli #development-workflow

ai guide Apr 10, 2026

Graphify: Turn Code and Documents into a Queryable Knowledge Graph

Graphify uses tree-sitter AST to extract code structure, then applies LLM semantic analysis to documents and images, compressing an entire project into a queryable knowledge graph. It claims to use 71.5x fewer tokens per query than reading raw files.

#graphify #knowledge-graph #tree-sitter #ast #code-understanding #claude-code #mcp

ai project Apr 5, 2026

Claw Code: An Open-Source CLI Agent That Rewrites Claude Code in Rust

Claw Code is a from-scratch Rust rewrite of the Claude Code CLI, featuring 48K lines of code, 40 tools, and MIT licensing. Most remarkably, the entire project was built by multiple AI agents collaborating over just 5 days, surpassing 170K GitHub stars within a week of launch.

#agent-cli #claude-code #claw-code #rust #open-source #multi-agent #mcp

ai guide Apr 5, 2026

clawhip: An Event Notification Router That Keeps Multi-Agent Development Under Control

clawhip is a Rust daemon that routes AI coding agent events (commits, PRs, session status) to Discord / Slack, solving the observability problem of not knowing who is doing what when multiple agents run in parallel.

#agent-cli #clawhip #notification #discord #slack #tmux #rust #multi-agent #ultraworkers

ai guide Apr 5, 2026

Hermes Agent: Nous Research's Self-Improving AI Agent

Hermes Agent is an open-source, self-improving AI agent from Nous Research and the official successor to OpenClaw. It features persistent memory, skill learning, 40+ tools, multi-platform gateways, and support for 200+ model providers.

#hermes-agent #nous-research #ai-agent #self-improving #gateway #multi-platform #openclaw

ai guide Apr 5, 2026

notebooklm-py: An Unofficial Python API for Google NotebookLM

notebooklm-py reverse-engineers Google's batchexecute RPC protocol, letting you programmatically control NotebookLM via Python / CLI / AI Agent — including audio, video, slides, quiz generation and more.

#notebooklm #google #reverse-engineering #python #rpc

ai guide Apr 5, 2026

oh-my-claudecode: An Enhancement Layer That Turns Claude Code into a Multi-Agent Collaboration Platform

oh-my-claudecode (OMC) adds 8 collaboration modes, 19 specialized agents, and cross-model orchestration (Claude + Codex + Gemini) on top of Claude Code, transforming a single-user CLI tool into a multi-agent development platform. Features include Deep Interview for requirement clarification, Smart Model Routing that saves 30-50% on tokens, and automatic rate limit recovery.

#agent-cli #claude-code #oh-my-claudecode #multi-agent #tmux #orchestration #ultraworkers

ai guide Apr 5, 2026

oh-my-codex: A Structured Workflow Enhancement Layer on Top of OpenAI Codex CLI

oh-my-codex (OMX) doesn't replace Codex CLI — it adds a structured workflow layer on top of it. From requirements clarification and plan generation to multi-agent parallel execution, four core Skills transform scattered prompt conversations into a trackable development process.

#agent-cli #openai-codex #oh-my-codex #workflow #multi-agent #tmux #developer-tools

ai guide Apr 5, 2026

oh-my-openagent: A Multi-Model Agent Team Framework That Replaces Single-LLM Coding

oh-my-openagent (OmO) transforms OpenCode from a single-LLM tool into a multi-model agent team — Opus as the workhorse, GPT-5.2 as the architect, Gemini for frontend, Sonnet for documentation lookup — all triggered to run in parallel with a single ultrawork keyword. With 48K stars, it is the earliest project in the UltraWorkers ecosystem to establish the multi-agent coding pattern.

#agent-cli #oh-my-openagent #opencode #multi-agent #multi-model #orchestration #ultraworkers

ai project Apr 5, 2026

OpenHarness: A Fully Open-Source Agent Harness Framework

An open-source Agent Harness framework from HKUDS (HKU Data Science Lab) that implements tool calling, skill loading, memory, permissions, and multi-agent collaboration as complete infrastructure, supporting Anthropic / OpenAI / GitHub Copilot API formats.

#agent-harness #open-source #multi-agent #tool-use #mcp

ai guide Apr 4, 2026

How to Use Claude Code Agent Teams? Design Patterns from 6,400+ Agents on GitHub

There are already 6,400+ .claude/agents/*.md files on GitHub. We dissected 4 representative projects — ChemistryTimes (content production pipeline), claude-sub-agent (document-driven development pipeline), agentic (Temporal.io DAG parallel execution), and vs-copilot-multi-agent (hook-enforced memory persistence) — plus ruflo's enterprise-grade swarm architecture, distilling 6 design patterns and 5 practical trends.

#claude-code #agent-teams #subagent #multi-agent #orchestrator-pattern #ai-pipeline #context-engineering #harness-engineering #temporal #swarm #quality-gates

ai guide AI Agent in Practice Apr 4, 2026

From Stripe to Meta: How Silicon Valley's Top Companies Replace Keyboards with AI Agents

Top Silicon Valley companies are independently building internal AI coding agents that automate everything from a Slack message to a merged PR. This article deep-dives into architectures from Stripe, Ramp, Coinbase, and Spotify, then expands to cover Google, Meta, Amazon, Uber, Goldman Sachs, Walmart, and more.

#ai-agent #coding-agents #stripe-minions #agentic-coding #developer-tools #automation #meta #google #uber #amazon

ai guide Apr 3, 2026

Three Modes of LLM Knowledge Bases: Knowledge Vault, Experience Vault, and Blog

Andrej Karpathy proposed a framework for compiling personal knowledge wikis with LLMs — collect raw data, have the LLM compile it into .md wiki pages, run Q&A against the wiki, and file outputs back. This post compares three practical approaches: Karpathy's knowledge vault model, the community's experience vault model, and quidproquo's blog model.

#llm-knowledge-base #obsidian #knowledge-management #fine-tuning #rag #claude-code #karpathy

ai guide Apr 3, 2026

AI Agent Caching Goes Beyond One Layer: From Claude Code's 18 Cache Types to Multi-Layer ReAct Agent Design

After dissecting Claude Code's 18+ caching mechanisms, I found that you can't touch provider-level prompt cache, but embedding cache, tool result cache, and entity cache are not only within your reach — they deliver even better results. Includes a complete AgentCache interface design and per-tool TTL strategy.

#react-agent #cache #prompt-cache #semantic-cache #claude-code #cloudflare-kv #llm-cost-optimization

ai guide Apr 3, 2026

AI Agent Tool Descriptions Shouldn't Be Static: Dynamic prompt() Design Learned from Claude Code

Every one of Claude Code's 45 tools uses a prompt() method that dynamically adjusts based on user type, feature flags, and system capabilities. Applying this pattern to a ReAct Agent, tool descriptions are dynamically generated along three dimensions: orchestrator model capability, locale, and available tools. Small models automatically get few-shot examples; large models save tokens.

#react-agent #tool-use #prompt-engineering #claude-code #few-shot #dynamic-prompt

ai guide Apr 2, 2026

Claude Code Complete Breakdown: The Deep Reasoning King of Terminal Agents

From $20/mo Pro to $200/mo Max 20x, Claude Code's Opus 4.6 delivers the strongest reasoning depth in the industry, and its Max plan's unlimited pricing saves heavy users over 90% compared to API costs.

#agent-cli #claude-code #pricing #opus #sonnet #haiku #subagent #anthropic

ai guide Apr 2, 2026

Cursor CLI Complete Analysis: The All-Rounder Extending IDE Agent to the Terminal

Cursor CLI brings the IDE Agent into the terminal, with interactive TUI and headless modes, three agent modes (Plan, Ask, Agent), Cloud Handoff, and CI/CD integration, priced at $20-200/mo.

#agent-cli #cursor #pricing #cli-agent #cloud-handoff #plan-mode #tui

ai guide Apr 2, 2026

Gemini CLI Complete Analysis: The Terminal Agent with the Most Generous Free Tier in the Industry

Gemini CLI will be discontinued on 2026/06/18, with Antigravity CLI as the official successor. Before shutdown: free 60 req/min, 1,000 req/day, including Gemini 2.5 Pro and 1M token context window. Skills, Hooks, and Subagents can all be migrated.

#agent-cli #gemini-cli #google #pricing #free-tier #terminal-agent #antigravity

ai guide Apr 2, 2026

Kiro (AWS) Complete Analysis: The Spec-Driven Agentic IDE

Kiro's free plan includes 50 credits. Auto mode intelligently mixes models to save costs. Spec-Driven development upgrades vibe coding into traceable, structured workflows. Agent Hooks enable local CI/CD automation.

#agent-cli #kiro #aws #pricing #auto-mode #specs #hooks #bedrock

ai guide Apr 2, 2026

OpenAI Codex Complete Plan Analysis: Agent Integration in the ChatGPT Ecosystem

Codex is tied to ChatGPT subscriptions ($20-200/mo). GPT-5.4 + mini automatic routing is the highlight, and the CLI supports dual billing via Plan mode and API Key mode.

#agent-cli #openai-codex #pricing #gpt-5 #chatgpt #model-routing

ai project Apr 2, 2026

OpenCode Full Analysis: An Open-Source Terminal Agent Supporting 75+ Model Providers

OpenCode is a free, open-source CLI agent written in Go with 95K+ GitHub stars. It supports 75+ model providers including local Ollama, allows authentication via Copilot/ChatGPT accounts, and lets you switch models mid-session without losing context.

#agent-cli #opencode #open-source #terminal-agent #multi-provider #ollama

ai guide Apr 2, 2026

Agent CLI Subscription Plans Compared: Building a Flexible Multi-Model Routing Strategy

Comparing six major Agent CLI subscription plans in 2026 (Claude Code, Cursor CLI, Codex, Kiro, Gemini CLI, OpenCode), and exploring multi-model routing patterns — routing simple tasks to cheaper models and complex tasks to flagship models, with real-world savings of 40-85%.

#agent-cli #multi-model-routing #claude-code #cursor #codex #kiro #gemini-cli #opencode #llm-router #cost-optimization

ai guide Apr 2, 2026

2026 Personal AI Hardware Buying Guide: DGX Spark, Mac Studio, MSI AI Edge Compared

Comparing the NVIDIA DGX Spark, Apple Mac Studio M4 Ultra, ASUS Ascent GX10, MSI AI Edge, and more — helping you find the right local inference hardware.

#hardware #local-inference #dgx-spark #mac-studio #msi-ai-edge #asus-ascent-gx10 #llm #edge-ai

ai guide Apr 2, 2026

Multi-Model Routing Open-Source Tools & Implementation: Getting the Right Model for the Right Job

With multi-model routing, the roughly 70% of tasks that are simple go to cheap models, and only the 10-15% that are complex go to flagship models, saving 40-85% on inference costs in practice. This article covers the architecture and implementation of five major open-source tools.

#multi-model-routing #llm-router #cost-optimization #agent-router #freerouter #ruflo

ai guide Apr 1, 2026

The Complete Guide to Agent CLIs: Design Logic, Tool Comparison, and Best Practices

Agent CLIs are not smarter autocomplete tools -- they are AI agents that can read your codebase, execute multi-step tasks, and operate in real environments. Claude Code, Codex CLI, Gemini CLI, OpenCode, Aider, Pi, Kiro, Amp, Cursor CLI... the tools keep multiplying, but they all share a common set of design principles -- understanding these principles is how you actually get good at using them.

#agent-cli #claude-code #codex-cli #gemini-cli #opencode #pi #kiro #aider #amp #cursor-cli #agentic-ai #developer-tools #cli #mcp #context-engineering

ai guide Apr 1, 2026

15 Agent Frameworks Worth Watching in 2026

A survey of 15 mainstream AI Agent frameworks in 2026, sorted by GitHub stars, covering their positioning, key features, and ideal use cases. Not a ranking; it's a map.

#agent #framework #langgraph #crewai #openai #anthropic #google-adk #mastra #openclaw #dify #n8n #llamaindex #metagpt #smolagents #agno #pydantic-ai

ai guide Apr 1, 2026

One Sentence to an IG Carousel — From 3 Hours Manual Work to a Fully Automated Pipeline

Use Claude Code as an orchestrator to chain Playwright screenshots, catbox.moe image hosting, Meta Graph API publishing, and Telegram notifications — generate and publish an IG carousel from a single sentence.

#claude-code #instagram #automation #playwright #github-actions #meta-graph-api

ai guide Apr 1, 2026

llama.cpp — From Pure C++ to an LLM Inference Engine on Consumer Hardware

llama.cpp is the most widely used local LLM inference engine, implemented in pure C/C++. It supports CPU, Metal, CUDA, Vulkan, and other backends, and uses the GGUF quantization format to run multi-billion-parameter models on consumer hardware.

#llama-cpp #gguf #quantization #llm-inference #apple-silicon #metal #cuda #local-llm

ai guide Apr 1, 2026

TurboQuant+ — Two-Stage Quantization to Compress KV Cache to 2-bit, Running 100B Models on a MacBook

TurboQuant+ is an open-source implementation of a Google Research ICLR 2026 paper that uses PolarQuant + QJL two-stage quantization to compress the KV cache by 3.8-6.4x, enabling consumer hardware to run larger models with longer contexts.

#turboquant #kv-cache #quantization #llm-inference #llama-cpp #apple-silicon

ai guide Mar 31, 2026

Small Models That Run on Phones: Choices and Constraints in 2026

The main on-device LLMs in 2026 are Gemma 3n, Qwen 3.5 Small, Llama 3.2, Phi-4-mini, Ministral 3, and SmolLM3. Sub-3B quantized models can hit 30-50 tokens/sec on phones with 8GB RAM, but RAM, thermal throttling, and context window remain hard constraints.

#on-device-ai #small-models #mobile #quantization #llama #gemma #phi #qwen #mistral #smollm #mobilellm

ai project Mar 31, 2026

2026 Q1 Open-Source LLM Landscape: From Frontier Models to On-Device, a Complete Survey

2026 Q1 saw a full-blown open-source model explosion: on the LLM front, GLM-5, Kimi K2.5, and Qwen3.5 caught up with closed-source models; Embedding and Reranker are dominated by Qwen3 and BGE; speech has Voxtral TTS and Whisper V3; image has FLUX.2; and video has Wan 2.2 rivaling Sora. This is the complete navigation map.

#open-source #llm #glm-5 #kimi #deepseek #qwen #llama #gemma #mistral #minimax #phi #smollm #gpt-oss #moe #on-device-ai #embedding #reranker #tts #stt #image-generation #video-generation #code-model #ollama #vllm

ai guide Mar 30, 2026

AI-Ready Content: The Complete Guide to Making Your Website an AI-Readable Data Source

In 2025-2026, websites need to be readable not just by humans but by AI. From llms.txt and Schema Markup to GEO and RAG ingestion pipelines, this post maps out the complete technical landscape for turning your website into an AI-consumable data source.

#ai-ready-content #llms-txt #geo #rag #web-scraping #structured-data #mcp #seo #rsl #webmcp

ai guide AI Agent in Practice Mar 30, 2026

Advanced Harness Engineering Patterns: Tool Registry, Guard System, and Checkpoint-Resume

A Harness is more than just an LLM wrapper. Tool Registry manages dynamic tool loading and selection, Guard System establishes a four-layer defense network, and Checkpoint-Resume enables long-running tasks to survive interruptions. These three patterns form the critical infrastructure of production-grade Agent systems.

#harness-engineering #tool-registry #guard-system #checkpoint-resume #agent

ai guide Mar 30, 2026

Skill vs Subagent: Comparing Two Agent Collaboration Modes in Claude Code

A Skill is a prompt template you invoke manually. A Subagent is an independent agent that Claude routes to automatically. They look similar, but differ completely in trigger mechanism, tool isolation, and context management.

#claude-code #multi-agent #subagent #skill

ai guide Mar 30, 2026

Ticketing Is Dead — Review Is the New Planning

When AI agents can turn intent into a PR in minutes, the bottleneck in software engineering flips from 'planning what to do' to 'evaluating whether the output is correct.' Artifacts of the ticketing era — sprints, story points, backlog grooming — are collapsing to zero, replaced by review as the core practice.

#code-review #software-engineering #ai-agent #adr #developer-workflow #ticketing

ai guide AI Agent in Practice Mar 28, 2026

Anthropic's Harness Design: Making AI Agents Work Like Engineers

The same model produces dramatically different results under different harness designs. Anthropic uses a dual-agent architecture, cross-session state files, and a GAN-inspired generator-evaluator loop to let Claude autonomously complete hours-long software development tasks.

#harness-design #ai-agent #anthropic #claude #multi-agent #long-running-agents #agent-sdk

ai guide Mar 28, 2026

Google's Eight Multi-Agent Design Patterns

Google outlined eight multi-agent design patterns: from the simplest Sequential Pipeline to the composable Composite Pattern. More complexity isn't always better — picking the right pattern matters more than stacking agents.

#multi-agent #design-patterns #google #agent-architecture #generator-critic #orchestration

ai guide AI Agent in Practice Mar 28, 2026

From Prompt to Harness: The Three Evolutions of AI Engineering

AI engineering has gone through three phases: Prompt Engineering (write better instructions) → Context Engineering (feed the right information) → Harness Engineering (design the entire working environment). Each evolution doesn't replace the previous one — it operates at a higher level of abstraction.

#harness-engineering #prompt-engineering #context-engineering #ai-agent #agentic-ai

ai guide Mar 28, 2026

OpenClaw Agent Loop: Execution Cycle, Streaming & Queue

A single agent execution: receive message → assemble context → model inference → tool execution → stream response → persist. Each session runs serially, with 5 queue modes supported.

#openclaw #agent-loop #streaming #queue #messages #debounce

ai guide Mar 28, 2026

OpenClaw Agent Runtime: Workspace, System Prompt, and Bootstrap

Every OpenClaw agent has its own 'home' (Workspace), with personality and behavior defined by bootstrap files like AGENTS.md and SOUL.md. The System Prompt is dynamically assembled each time.

#openclaw #agent #workspace #system-prompt #bootstrap #soul-md #agents-md

ai guide Mar 28, 2026

OpenClaw Access Control: Authentication, Secrets, and OAuth

API Key is the most stable option; OAuth uses PKCE + token sink pattern; SecretRef supports env/file/exec sources; Trusted Proxy delegates authentication to a reverse proxy.

#openclaw #authentication #secrets #oauth #trusted-proxy #secretref #security

ai guide Mar 28, 2026

OpenClaw Automation (Part 1): Cron, Heartbeat, and Webhook

Heartbeat for periodic checks (30-minute batches), Cron for precise scheduling (with isolated sessions and model overrides), Webhook for receiving external event triggers.

#openclaw #cron #heartbeat #webhook #automation #scheduling

ai guide Mar 28, 2026

OpenClaw Automation (Part 2): Standing Orders — Permanent Directives

Standing Orders grant an agent permanent authorization to execute defined programs — with explicit scope, triggers, approval gates, and escalation rules, paired with Cron for time-based control.

#openclaw #standing-orders #automation #agents-md #autonomous

ai guide Mar 28, 2026

OpenClaw Enterprise Channels: Slack, Teams, Google Chat & Matrix

Slack has the most complete enterprise features (native streaming, slash commands). Teams requires Azure Bot setup. Matrix supports end-to-end encryption (E2EE).

#openclaw #slack #microsoft-teams #google-chat #matrix #enterprise

ai guide Mar 28, 2026

OpenClaw Primary Channels: WhatsApp, Telegram, Discord

WhatsApp uses QR pairing + Baileys, Telegram is the fastest to set up with a Bot Token, and Discord supports guild/thread/button interactive components.

#openclaw #whatsapp #telegram #discord #channels

ai guide Mar 28, 2026

OpenClaw Other Channels: Signal, iMessage, LINE, IRC, Nostr, and More

Signal uses signal-cli for privacy, iMessage is best via BlueBubbles, LINE uses webhooks, IRC/Nostr/Twitch each have their own character.

#openclaw #signal #imessage #bluebubbles #line #irc #nostr #twitch #zalo

ai guide Mar 28, 2026

OpenClaw Channels Overview: Pairing, Groups, and Routing

OpenClaw supports 24+ channels running simultaneously, using Pairing to control who can chat, Group Policy to control group behavior, and Routing to decide which agent receives messages.

#openclaw #channels #pairing #groups #routing #broadcast

ai guide Mar 28, 2026

OpenClaw Gateway Part 1: Configuration System and Hot Reload

openclaw.json uses JSON5 format with strict schema validation, supporting hybrid hot reload — safe changes apply instantly while critical changes trigger automatic restarts.

#openclaw #gateway #configuration #json5 #hot-reload #openclaw-json

ai guide Mar 28, 2026

OpenClaw Gateway (Part 2): Remote Access, Tailscale, and Multi-Gateway

Gateway binds to loopback by default. Use SSH tunnel or Tailscale Serve/Funnel for remote access; multiple Gateways can distribute load.

#openclaw #gateway #remote-access #tailscale #ssh-tunnel #multi-gateway

ai guide Mar 28, 2026

OpenClaw Installation Guide (Part 2): Cloud Platforms, K8s & VPS Deployment

OpenClaw supports deployment to 9 cloud platforms, K8s, and Ansible automated provisioning — you can run a 24/7 Gateway for as little as $5/month.

#openclaw #deployment #kubernetes #fly-io #hetzner #gcp #azure #ansible #vps

ai guide Mar 28, 2026

OpenClaw Installation Guide (Part 1): npm, Docker, Nix & Local Deployment

OpenClaw offers 6 local installation methods: installer script, npm, Docker, Podman, Nix, and Bun, plus Raspberry Pi deployment and building from source.

#openclaw #installation #docker #nix #podman #raspberry-pi #bun

ai guide Mar 28, 2026

OpenClaw Model Advanced: Failover, Prompt Caching, and Token Billing

OpenClaw has built-in two-stage fault tolerance with Auth rotation + Model Fallback, plus Prompt Caching for cost savings and comprehensive Token tracking.

#openclaw #model-failover #prompt-caching #token-usage #cost-optimization

ai guide Mar 28, 2026

OpenClaw's Model Requirements and Provider Ecosystem

OpenClaw supports 35+ model providers. The minimum requirement is that the model supports tool use + streaming. It has built-in auth rotation and model failover mechanisms.

#openclaw #llm #anthropic #openai #gemini #model-failover #tool-use

ai guide Mar 28, 2026

OpenClaw Additional Providers: DeepSeek, Groq, Ollama, OpenRouter, Bedrock...

Beyond the big three (Anthropic/OpenAI/Google), OpenClaw supports 30+ providers — from DeepSeek to local Ollama and everything in between.

#openclaw #deepseek #groq #ollama #openrouter #vllm #bedrock #sglang #mistral

ai guide Mar 28, 2026

OpenClaw Multi-Agent and Delegate Architecture

OpenClaw supports running multiple isolated agents within a single Gateway, routing messages via bindings, and enabling AI to act on your behalf through its Delegate architecture.

#openclaw #multi-agent #delegate #session-management #routing

ai guide Mar 28, 2026

OpenClaw Nodes Deep Dive: Mobile Devices and Remote Hosts

Nodes are peripheral devices for the Gateway -- iOS/Android provide camera/location/notifications, macOS provides Canvas/system.run, and Node Host enables remote exec on other machines.

#openclaw #nodes #ios #android #macos #camera #canvas #location #sms

ai guide Mar 28, 2026

OpenClaw Documentation Guide: 200+ Docs — Where Do You Start?

OpenClaw has 200+ docs. This article helps you see the big picture, understand what each section covers, and decide where to start based on your role.

#openclaw #ai-gateway #self-hosted #documentation #guide

ai deep-dive Mar 28, 2026

OpenClaw Reference: Pi Integration & Configuration Reference

Pi is OpenClaw's embedded coding agent runtime; OpenClaw is Pi's Gateway shell. This configuration reference covers 16 top-level sections and 335 documents.

#openclaw #pi #reference #configuration #features #architecture

ai guide Mar 28, 2026

OpenClaw Desktop Platforms: macOS, Linux, and Windows

OpenClaw has a menu bar app on macOS, runs as a systemd service on Linux, and recommends WSL2 on Windows. Here are the differences and considerations across all three platforms.

#openclaw #macos #linux #windows #wsl2 #systemd #launchd

ai guide Mar 28, 2026

OpenClaw Mobile Platforms: iOS and Android

OpenClaw's iOS and Android apps are not Gateways — they are Nodes, turning your phone's camera, screen, location, and voice into sensory extensions for AI agents.

#openclaw #ios #android #mobile #node #canvas #camera #voice-wake

ai guide Mar 28, 2026

OpenClaw Plugin System: Architecture and Development Guide

Plugins are built with TypeScript ESM and support 12 capability registrations (channels, models, tools, TTS, images, etc.), published to ClawHub or npm.

#openclaw #plugins #sdk #clawhub #channel-plugin #provider-plugin #typescript

ai guide Mar 28, 2026

OpenClaw Sandbox Mechanism: Docker, SSH, and OpenShell

OpenClaw's sandbox has three layers of control: Sandbox determines where code runs (Docker/SSH/OpenShell), Tool Policy determines which tools are available, and Elevated is the host escape hatch for exec.

#openclaw #sandbox #docker #ssh #openshell #security #tool-policy #elevated

ai guide Mar 28, 2026

OpenClaw Session, Memory, and Compaction

OpenClaw sessions support 4 DM isolation levels, Memory is stored as Markdown files, and Compaction automatically summarizes and compresses when context is nearly full.

#openclaw #session #memory #compaction #context-engine #pruning

ai guide Mar 28, 2026

OpenClaw Threat Model: MITRE ATLAS Security Analysis and Formal Verification

OpenClaw uses the MITRE ATLAS framework to analyze AI system threats, identifying three Critical risks (prompt injection, malicious skills, credential theft), and employs TLA+ formal verification for security properties.

#openclaw #security #mitre-atlas #threat-model #formal-verification #tla-plus

ai guide Mar 28, 2026

OpenClaw Tools (Part 1): Browser Control and Web Search

OpenClaw's browser uses managed profiles for isolation, supports remote CDP (Browserless/Browserbase), and Deep Research combines search and browsing for multi-step research.

#openclaw #browser #web-search #deep-research #browserless #browserbase

ai guide Mar 28, 2026

OpenClaw Tools (Part 3): Exec Tool, Thinking Levels, and Slash Commands

Exec supports foreground/background/PTY execution with three security levels (deny/allowlist/full). Thinking has 7 levels (off to adaptive). Slash Commands come in two types: commands and directives.

#openclaw #exec #thinking #slash-commands #fast-mode #verbose #reasoning

ai guide Mar 28, 2026

OpenClaw Tools (Part 2): Skills System and Sub-Agents

Skills are AgentSkills-compatible SKILL.md folders with a 6-tier loading priority. ClawHub is the public marketplace. Sub-agents can nest up to 5 levels deep.

#openclaw #skills #clawhub #sub-agents #skill-md #agent-skills

ai guide Mar 28, 2026

OpenClaw Tools (Part 4): TTS, PDF, Lobster, and MCP

TTS supports three providers — ElevenLabs, Microsoft, and OpenAI. PDF has native and extraction modes. Lobster is a deterministic workflow runtime. MCP enables external tool integration.

#openclaw #tts #pdf #lobster #mcp #media #elevenlabs #openai-tts

ai debug Mar 28, 2026

OpenClaw Operations: Troubleshooting and Diagnostics

openclaw doctor is the all-in-one diagnostic tool, openclaw sandbox explain troubleshoots sandbox issues, and openclaw channels status --probe checks channel connectivity.

#openclaw #troubleshooting #doctor #diagnostics #operations

ai guide Mar 28, 2026

OpenClaw UI: Control UI, TUI, and Web Chat

Control UI is a browser dashboard (http://127.0.0.1:18789), TUI is an interactive terminal interface, and Web Chat is a real-time chat over WebSocket.

#openclaw #control-ui #tui #web-chat #dashboard #terminal

ai guide Mar 28, 2026

Phil Schmid: Why Agent Harness Is the Most Important Thing in 2026

The model is the CPU, the harness is the operating system, and the agent is the application. No matter how powerful a model is, without a good harness it's just a demo. Phil Schmid argues that the harness is the most critical infrastructure in AI engineering for 2026.

#harness-engineering #ai-agent #agent-harness #model-drift #benchmarks #claude-code

ai guide Mar 27, 2026

LangGraph: Managing Agent Workflows with Graph Structures

LangGraph models LLM workflows as directed graphs, solving the pain points of multi-turn iteration, conditional branching, and parallel execution that are difficult to handle with linear pipelines.

#langgraph #agent #orchestration #rag #workflow

ai project Mar 26, 2026

GLM-5: Zhipu AI's 744B Open-Source Model Trained Entirely on Huawei Chips

GLM-5 is a 744B MoE open-source model released by Zhipu AI (Z.ai) in February 2026, trained entirely on Huawei Ascend chips and released under the MIT license. It currently ranks as the top open-source model, surpassing Claude and GPT-5 on benchmarks like Humanity's Last Exam, while its API pricing is 1/5 to 1/8 of theirs.

#glm-5 #zhipu-ai #智譜ai #llm #moe #open-source #huawei-ascend #ai-model #agent

ai project Mar 26, 2026

Kimi: How Moonshot AI's Long-Context Model Challenges GPT and Claude

Kimi is a large language model from Chinese AI startup Moonshot AI, known for its ultra-long context window, open-source strategy, and highly competitive pricing. From 200K context in 2023 to K2.5 Agent Swarm in 2026, Kimi has become a force that the global AI market cannot ignore.

#kimi #moonshot-ai #llm #long-context #reasoning #月之暗面 #ai-model #moe #open-source

ai guide Mar 26, 2026

Langfuse Complete Guide: LLM Application Observability from Scratch

Langfuse is currently the most mature open-source LLM Observability platform. This post covers four core capabilities — Tracing, Prompt Management, Evaluation, and Datasets — showing you how to use them in real projects.

#langfuse #observability #tracing #llm #prompt-management #evaluation #monitoring

ai guide AI Agent in Practice Mar 24, 2026

Context Engineering: Why Your AI Agent's Problem Is Information, Not the Model

Context Engineering is the core concept that replaced Prompt Engineering in 2025: the focus shifted from 'how to ask' to 'what information to provide.' Delivering the right information into the context window at the right time is more effective than upgrading to a stronger model. This post covers the definition, four key strategies, practical techniques, and common failure modes.

#context-engineering #prompt-engineering #ai-agent #rag #memory #agentic-ai

ai guide Mar 22, 2026

MCP (Model Context Protocol): The Standardized Protocol for AI Agent Tool Invocation

Every AI tool has its own calling format, making integration costly. MCP (Model Context Protocol) is an open standard proposed by Anthropic that unifies the communication protocol between AI Agents and external tools/data sources, enabling tools to be reused across Agents.

#mcp #model-context-protocol #tool-use #agent #anthropic

ai guide Mar 20, 2026

Claude Certified Architect Foundations Exam Complete Guide

A complete study guide for Claude's official architect certification: five exam domains, six scenario types, common anti-patterns, and hands-on preparation strategies.

#claude #certification #agentic-ai #mcp #prompt-engineering #claude-code #agent-sdk

ai guide Mar 19, 2026

Agent Memory Systems: From RAG to Read-Write Memory Evolution

RAG is read-only. Agent Memory lets AI not only read but also write and persist information. Three memory types, Procedural (behavior patterns), Episodic (temporal events), and Semantic (factual knowledge), together form a complete cognitive memory system.

#agent #memory #procedural-memory #episodic-memory #semantic-memory #rag

ai deep-dive Mar 18, 2026

Complete Guide to AI Agent Architecture Patterns: From Three Pillars to Multi-Agent Systematic Navigation

An AI agent is not a single technology; it is an entire architectural system. This article is a systematic navigation guide: it starts from the three pillars of agents (Context/Cognition/Action), moves through the three-stage evolution of AI engineering (Prompt -> Context -> Harness), and ends with eight Multi-Agent design patterns and production-grade Harness infrastructure. Each topic links to a dedicated deep-dive article.

#agent #architecture #harness #multi-agent #mcp #context-engineering #guide

ai guide Mar 17, 2026

The Three Core Pillars of AI Agents: Context, Cognition, Action

An AI agent is not a black box — it is built from three layers: what it knows (Context), how it thinks (Cognition), and what it can do (Action). Understanding these three layers is the key to grasping why agents are sometimes brilliant and sometimes go off the rails, and how to design a truly effective agent system.

#ai-agent #context-engineering #llm #reasoning #ReAct #agentic-ai #memory #mcp

ai guide RAG Systems in Practice Mar 16, 2026

Multi-Agent RAG: Distributed Retrieval Architecture with Specialized Agent Collaboration

A single RAG Agent handling all queries hits knowledge boundaries and performance bottlenecks. Multi-Agent RAG dispatches retrieval tasks to multiple specialized Agents, each with its own knowledge base and retrieval strategy, coordinated by a central Orchestrator that merges results.

#rag #multi-agent #orchestration #distributed-retrieval #agent

ai guide Mar 15, 2026

LongRAG: Rethinking RAG Chunking Strategy with Long-Context Models

Traditional RAG splits documents into small chunks for retrieval, but this causes information fragmentation. LongRAG leverages 100K+ token long-context models to retrieve larger document segments (entire sections or even whole documents), reducing fragmentation while maintaining retrieval efficiency.

#rag #longrag #long-context #chunking #retrieval

ai guide Mar 15, 2026

Speculative RAG: Small Models Draft in Parallel, Large Model Verifies at Once

Speculative RAG uses small specialist models to generate multiple answer drafts from different document subsets in parallel, then a large model verifies and selects the best answer in one pass. Accuracy improves by up to 12.97%, and latency drops by up to 50.83%.

#rag #speculative-rag #dual-model #latency-optimization #accuracy

ai guide Mar 14, 2026

The Complete Ollama Guide: Run LLMs Locally with One Command

Ollama wraps llama.cpp in a Docker-style CLI + REST API, letting you run LLMs locally with a single command. This post covers core concepts, installation, API, hardware requirements, Modelfile customization, and what this tool is — and isn't — good for.

#ollama #llm #local-inference #llama-cpp #self-hosted #openai-compatible

ai guide RAG Systems in Practice Mar 14, 2026

The Complete Guide to RAG System Patterns: A Ten-Generation Evolution from Naive to Multi-Agent with Practical Navigation

RAG has evolved far beyond simple 'search + generate' into a technology ecosystem spanning ten generations. This article is a systematic navigation guide: from Naive RAG to Multi-Agent RAG across ten generations, covering retrieval strategies, chunking, embedding, reranking, evaluation frameworks, observability, and cost optimization. Each topic has a dedicated deep-dive article.

#rag #guide #retrieval #embedding #reranking #evaluation #agent

ai guide Mar 14, 2026

vLLM — From PagedAttention to a Production-Grade LLM Inference Engine

vLLM uses PagedAttention to eliminate KV cache memory waste, combining continuous batching and prefix caching to become the most widely adopted open-source LLM inference engine today.

#vllm #llm-inference #pagedattention #model-serving #gpu

ai guide Mar 13, 2026

Complete Chatbot Development Guide: State Management, Memory Strategies, and Tech Stack Selection

Building a chatbot is more than just calling an API. Conversation state management, memory mechanisms, streaming, guardrails, observability, and tech stack selection — every layer affects the user experience.

#chatbot #state-management #memory #streaming #guardrails #langfuse

ai guide Mar 13, 2026

Prompt Engineering in Practice: Iteration Methodology, Common Mistakes, and Few-shot Optimization

Good prompts aren't written in one go — they're iterated into existence. Start with the simplest prompt, test with real cases, classify error types, and make targeted fixes. This article covers the three-part System Prompt structure, reasoning framework selection, few-shot optimization, token budget management, and six common mistakes.

#prompt-engineering #few-shot #chain-of-thought #iteration #llm

ai guide Mar 12, 2026

Agentic RAG: Letting the LLM Decide When to Search Again

For complex multi-hop questions, a single RAG search isn't enough. Agentic RAG lets the LLM evaluate whether retrieved results are sufficient — if not, it rewrites the query and searches again, forming a ReAct loop.

#rag #agentic-rag #react #multi-hop #llm-agent

ai guide Mar 12, 2026

BGE-M3: Why This Embedding Model Works Well for Traditional Chinese RAG

Your choice of embedding model directly determines RAG search quality. BGE-M3's multilingual training, 1024-dimensional vectors, and matching Reranker make it a practical pick for Traditional Chinese RAG.

#rag #embedding #bge-m3 #multilingual #vector-search #cloudflare-workers-ai

ai guide Mar 12, 2026

Chunking Strategies: How You Split Text Determines Whether RAG Can Find the Answer

Chunks too large and retrieval loses precision; too small and you lose context. Chunking is the most underrated part of RAG — pick the wrong strategy and no amount of downstream optimization will save you.

#rag #chunking #indexing #text-splitting #retrieval

ai guide Mar 12, 2026

ColBERT: The Third Way in Vector Search

Bi-Encoders are too coarse, Cross-Encoders are too slow — ColBERT's Late Interaction finds the sweet spot: token-level comparison between query and document, but with document vectors that can be precomputed.

#rag #colbert #late-interaction #retrieval #reranking
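
To make the late-interaction idea concrete, here is a minimal sketch of MaxSim scoring in plain Python. It assumes token embeddings are already computed and normalized; the function name `maxsim_score` and the toy vectors are illustrative and do not come from the ColBERT codebase.

```python
def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token vector, take the best
    dot product against any document token vector, then sum those maxima.
    Assumes both inputs are non-empty lists of equal-length vectors."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy 2-dimensional token embeddings (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # covers both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]   # covers only the first query token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

Because the document token vectors do not depend on the query, they can be computed and stored at index time; only the cheap max-and-sum step runs per query.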

ai guide Mar 12, 2026

Contextual Retrieval: Giving Every Chunk Its "What This Is About" Context

When you split a document into chunks, each chunk loses its place in the original document. Contextual Retrieval solves the isolated-chunk problem by injecting a document-level summary into every chunk at index time.

#rag #contextual-retrieval #chunking #indexing #embedding

ai guide Mar 12, 2026

CRAG: Automatically Relaxing Filters When Retrieval Comes Up Empty

Filters too strict and getting zero results? CRAG automatically relaxes them and retries — far better than letting the LLM hallucinate an answer from general knowledge.

#rag #crag #corrective-rag #retrieval #fallback

ai guide Mar 12, 2026

Cross-Encoder Reranking: Surfacing the Most Relevant Documents

Vector search similarity scores don't equal relevance. Cross-Encoders use pairwise comparison to reorder results and push the truly relevant documents to the top.

#rag #reranking #cross-encoder #bge-reranker #retrieval

ai guide Mar 12, 2026

GraphRAG: Structuring Knowledge as a Graph for Relationship-Based Reasoning

Vector search finds similarity; graph search traverses relationships. When a question requires reasoning across multiple entities — crag → route → sender → grade distribution — GraphRAG outperforms standard RAG.

#rag #graphrag #knowledge-graph #multi-hop #microsoft

ai guide RAG Systems in Practice Mar 12, 2026

Hybrid Search: Using BM25 + Vector Search to Cover Each Other's Blind Spots

Vector search handles semantics; BM25 handles keywords. Combining them with RRF is what lets you handle both fuzzy queries and exact terms at the same time.

#rag #hybrid-search #bm25 #vector-search #rrf #embedding

ai guide Mar 12, 2026

HyDE: Boosting Vector Search Recall with Hypothetical Answers

Have an LLM generate an 'ideal answer' first, then embed that hypothetical document for search — it outperforms searching with the raw query.

#rag #hyde #embedding #vector-search #query-enhancement

ai guide Mar 12, 2026

RAG Personalization: Learning User Preferences from Conversations

After each conversation, asynchronously extract likely user preferences and skill level, then automatically personalize search parameters on the next query — no manual setup required.

#rag #personalization #memory #user-profile #async

ai guide Mar 12, 2026

MMR + Popularity Weighting: Recommendations That Are Both Relevant and Diverse

Ranking purely by relevance leaves you with five documents all describing the same route. MMR strikes a balance between relevance and diversity, and layering in popularity weighting makes results even more useful.

#rag #mmr #diversity #reranking #popularity #recommendation
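
As a rough illustration of the relevance/diversity trade-off, the sketch below implements greedy Maximal Marginal Relevance over toy vectors. The parameter name `lam` and the example embeddings are assumptions for illustration, and popularity weighting is left out.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr(query_vec, doc_vecs, top_k, lam=0.7):
    """Greedily pick documents by lam * relevance - (1 - lam) * redundancy.
    Returns the indices of the selected documents in selection order."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < top_k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.6, 0.8]]  # docs 0 and 1 are near-duplicates

diverse = mmr(query, docs, top_k=2, lam=0.3)
relevance_only = mmr(query, docs, top_k=2, lam=1.0)
```

With `lam=1.0` the near-duplicate pair is returned; lowering `lam` swaps the duplicate for the less similar third document.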

ai deep-dive Mar 12, 2026

Modular RAG Pipeline: Designing RAG as a Composable DAG

RAG doesn't have to be a rigid three-step process. It's a set of steps that can be dynamically enabled, skipped, or reordered. Pipeline as Code lets the system adapt its behavior without redeployment.

#rag #pipeline #architecture #modular #dag #cloudflare-workers

ai guide Mar 12, 2026

Multi-Query Expansion: Search One Question from Multiple Angles

A single vector search on a complex query often misses relevant documents. Let the LLM rewrite the query into 3-5 sub-queries, run them in parallel, and recall improves significantly.

#rag #multi-query #query-expansion #recall #rrf

ai guide Mar 12, 2026

Multimodal RAG: Bringing Images into the Knowledge Base

Climbing routes carry a ton of visual information (topos, wall photos) that text-only RAG misses entirely. Multimodal RAG makes images searchable and understandable.

#rag #multimodal #vision #image-embedding #clip

ai deep-dive Mar 12, 2026

Three Generations of RAG: From Naive to Modular

Naive RAG works but has real problems. Advanced RAG patches those problems. Modular RAG rearchitects the whole system to be composable and configurable. Understanding all three generations is the key to understanding why modern RAG systems look the way they do.

#rag #naive-rag #advanced-rag #modular-rag #architecture #evolution

ai guide Mar 12, 2026

Plan-and-Execute: A RAG Pattern That Plans Before It Acts

For complex queries, have the LLM map out what information is needed and in how many steps — then execute that plan. More systematic than thinking on the fly.

#rag #plan-execute #agentic #multi-step #reasoning

ai guide Mar 12, 2026

Query Classification: Teaching Your RAG System How to Answer Each Question

Not every question needs full RAG. Classify queries with an LLM first, then route to the right execution path — saving cost and improving accuracy.

#rag #query-classification #adaptive-routing #tool-selection #llm

ai guide Mar 12, 2026

RAG A/B Testing: A Scientific Approach to Comparing Pipeline Configurations

"Adding a Cross-Encoder feels better" is not a scientific evaluation. A/B testing tells you whether a change actually works, how much it helps, and which query types benefit.

#rag #ab-testing #experimentation #metrics #pipeline

ai guide Mar 12, 2026

RAG Cold Start: Building a Useful System When You Have No Data

A RAG system needs data to answer questions, but data only accumulates as the system gets used. Cold-start strategy is what bridges the gap from empty to useful.

#rag #cold-start #bootstrapping #indexing #data

ai guide Mar 12, 2026

RAG Cost Optimization: Minimizing the Cost of Every Query

RAG system costs come from LLM tokens, Embedding APIs, and vector search. Every stage has room for cost reduction, but you need to verify that optimizations don't sacrifice too much quality.

#rag #cost-optimization #performance #token-budget #caching

ai guide Mar 12, 2026

RAG Evaluation Frameworks: How to Use RAGAS, DeepEval, and TruLens

RAG system quality is hard to evaluate by intuition alone. RAGAS, DeepEval, and TruLens provide systematic metric frameworks that pinpoint exactly which component is failing.

#rag #evaluation #ragas #deepeval #trulens #metrics #quality

ai debug RAG Systems in Practice Mar 12, 2026

RAG Common Failure Modes: 10 Problems and Their Solutions

When a RAG system breaks, 90% of the time it's one of these 10 failure modes. Identify which one first, then apply the matching fix — far more effective than optimizing blindly.

#rag #debugging #failure-modes #quality #troubleshooting

ai guide Mar 12, 2026

RAG Guardrails: Adding a Defense Layer to Inputs and Outputs

The attacks RAG systems face go beyond the technical level — Prompt Injection and Jailbreak are real threats. Both inputs and outputs need independent protection layers.

#rag #guardrails #security #prompt-injection #safety #llm

ai guide Mar 12, 2026

RAG Observability Tool Landscape: Choices in 2026

Rolling your own traces is good enough, but open-source tools save you a lot of work. Langfuse, Phoenix, and LangSmith each have their niche — the right choice depends on your trade-offs around self-hosting, open source, and integration complexity.

#rag #observability #langfuse #phoenix #langsmith #tracing #monitoring

ai guide Mar 12, 2026

RAG Observability: 17-Step Tracing to Turn the Black Box Transparent

The hardest part of a RAG system isn't building it — it's figuring out why a particular answer went wrong. Pipeline Tracing records every step's decisions and data so debugging has a clear trail to follow.

#rag #observability #tracing #debugging #pipeline #monitoring

ai guide Mar 12, 2026

RAG Prompt Engineering: How to Design System Prompts and Context

Search found the right documents, but the LLM's answers are still poor — often the problem lies in prompt design. System prompt structure, context formatting, and instruction placement all affect output quality.

#rag #prompt-engineering #system-prompt #context #llm

ai guide Mar 12, 2026

RAG Streaming: Using SSE to Display LLM Responses as They Generate

LLM generation takes 3-5 seconds, and waiting for the full response before displaying it makes for a terrible experience. SSE pushes tokens as they're generated, reducing time-to-first-character from 5 seconds to under 1 second.

#rag #streaming #sse #server-sent-events #cloudflare-workers #ux
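
The wire format behind this is small enough to sketch. The example below formats tokens as Server-Sent Events frames; the helper names and the `[DONE]` sentinel with a `done` event name are assumptions for illustration, not something the post specifies.

```python
def sse_event(data, event=None):
    """Format one Server-Sent Events frame. Each line of the payload gets
    its own 'data:' field, and a blank line terminates the frame."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens):
    """Yield one SSE frame per token, then a hypothetical 'done' marker
    so the client knows when to close the stream."""
    for token in tokens:
        yield sse_event(token)
    yield sse_event("[DONE]", event="done")


frames = list(stream_tokens(["Hel", "lo"]))
```

A server would write each frame to a response with `Content-Type: text/event-stream` as soon as the model emits the token.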

ai guide Mar 12, 2026

RAG Quota System: Controlling LLM Costs with Dual Limits

Limiting request count alone is not enough — a single long query can consume ten times the tokens of a normal one. Dual quotas (request count + token count) are what truly control costs.

#rag #quota #rate-limiting #token-budget #cost-control #cloudflare-workers
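
A minimal in-memory sketch of the dual-limit idea follows. The class name `DualQuota` and its fields are hypothetical, and it assumes the token cost of a request can be estimated before the request is admitted.

```python
from dataclasses import dataclass


@dataclass
class DualQuota:
    """Tracks two independent budgets; a request must fit within both."""
    max_requests: int
    max_tokens: int
    used_requests: int = 0
    used_tokens: int = 0

    def try_consume(self, tokens: int) -> bool:
        """Admit the request only if neither limit would be exceeded.
        Nothing is charged when the request is rejected."""
        if self.used_requests + 1 > self.max_requests:
            return False
        if self.used_tokens + tokens > self.max_tokens:
            return False
        self.used_requests += 1
        self.used_tokens += tokens
        return True


quota = DualQuota(max_requests=3, max_tokens=1000)
first = quota.try_consume(900)    # fits both limits
second = quota.try_consume(200)   # rejected: would exceed the token budget
third = quota.try_consume(100)    # fits: token budget is now exactly used up
```

Here the second request is blocked by the token limit even though two of the three allowed requests are still unused, which a request-count limit alone would not catch.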

ai deep-dive Mar 12, 2026

RAG vs Fine-tuning: It's Not Either/Or

RAG and Fine-tuning solve different problems. RAG gives the model new knowledge; Fine-tuning changes the model's behavior and style. In most cases you use both rather than picking one.

#rag #fine-tuning #llm #architecture #comparison

ai guide Mar 12, 2026

RRF: How to Merge Multi-Source Results in RAG Systems

BM25, vector search, HyDE, and Multi-Query each produce separate result sets -- how do you merge them sensibly? RRF uses ranks instead of scores, sidestepping the fundamental problem that scores from different systems are incomparable.

#rag #rrf #fusion #ranking #multi-source #retrieval
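
Rank-based fusion fits in a few lines. The sketch below assumes each retriever returns an ordered list of document ids and uses the commonly cited constant `k=60`; the function name and sample ids are illustrative.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank) for
    every document it contains, using 1-based ranks and ignoring scores."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_results = ["a", "b", "c"]
vector_results = ["b", "d", "a"]

fused = rrf_merge([bm25_results, vector_results])
```

Documents that appear in both lists ("a" and "b") rise above those found by only one retriever, without ever comparing a BM25 score to a cosine similarity.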

ai guide Mar 12, 2026

Self-Reflection + LLM-as-Judge: Having AI Evaluate Its Own Answers

Use another LLM to evaluate answer accuracy and quality — if the score is too low, regenerate, and automatically add appropriate disclaimers.

#rag #llm-judge #self-reflection #groundedness #quality-assurance

ai guide Mar 12, 2026

Semantic Caching: Run the RAG Pipeline Only Once for Semantically Similar Queries

Caching doesn't have to match exact query strings -- semantically similar questions can hit the cache too, skipping the entire RAG pipeline execution.

#rag #semantic-cache #caching #vector-search #performance
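
One way to picture this is a cache keyed by embeddings rather than strings. The sketch below is a linear-scan toy, assuming query embeddings are computed elsewhere; the class name, the 0.9 threshold, and the sample vectors are all illustrative choices.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class SemanticCache:
    """Returns a stored answer when a new query embedding is close enough
    to a previously cached one."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def put(self, query_vec, answer):
        self.entries.append((query_vec, answer))

    def get(self, query_vec):
        """Return the answer of the most similar entry at or above the
        threshold, or None on a cache miss."""
        best_answer, best_sim = None, self.threshold
        for vec, answer in self.entries:
            sim = cosine(query_vec, vec)
            if sim >= best_sim:
                best_answer, best_sim = answer, sim
        return best_answer


cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0], "cached answer")

hit = cache.get([0.99, 0.05])   # nearly the same direction
miss = cache.get([0.0, 1.0])    # orthogonal, unrelated query
```

A production version would typically replace the linear scan with a vector index, and the threshold needs tuning: too low and unrelated questions share answers, too high and the cache rarely hits.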

ai guide Mar 12, 2026

SPLADE: Smarter Sparse Vector Search Beyond BM25

BM25 only recognizes words that appear in the query. SPLADE infers related terms and adds them to the search, gaining partial semantic capability while preserving the precision of keyword search.

#rag #splade #sparse-vector #bm25 #retrieval #hybrid-search

ai guide Mar 12, 2026

Text-to-SQL Router: Precise Queries That Skip RAG

Questions like 'how many routes did I complete this year' will never be answered well by RAG semantic search — querying the database directly is far more accurate. Let the LLM identify intent, extract parameters, and execute predefined SQL templates.

#rag #text-to-sql #sql #query-routing #structured-query

ai guide Mar 12, 2026

Vector Database Selection: How to Choose Between Pinecone, Weaviate, Qdrant, and Vectorize

Choosing a vector database is constrained by your deployment platform more than choosing an LLM is. Determine your platform and scale requirements first, then evaluate features; benchmarks alone are not enough.

#rag #vector-database #pinecone #weaviate #qdrant #cloudflare-vectorize