quidproquo

Tech, climbing, surfing, coffee, and everything else.

A personal blog by xiaoxu — covering software engineering, AI agents, climbing, surfing, coffee, and whatever else I'm obsessing over. Written mostly in Traditional Chinese; posts tagged en are in English.

Pinned
tech guide

Conversation as Documentation: Turning Debug Sessions into Blog Posts with Claude Code

After finishing a debug session, just say 'write this up as a post' — Claude Code extracts content from the conversation, applies a template, generates frontmatter, and commits it to the repo. No extra writing required.

Pinned
product project

Building a Low-Friction Blog from Scratch with Astro + Cloudflare Workers

To consolidate scattered notes and showcase diverse interests, I built a personal blog using Astro + Cloudflare Workers D1, paired with a Claude post skill for zero-friction writing.

Pinned
tech guide

What Tools Power This Blog

Astro + the full Cloudflare suite — static-first, edge-computed, zero maintenance cost

investing

Understanding a Trading Post from Scratch: Warrants, Stock Futures, and Maintenance Ratio Explained

Saw a trading post about going from NT$150k to NT$2.4M in half a year. Didn't understand a word of it — warrants, stock futures, maintenance ratio — so I looked them all up.

ai deep-dive

Loop Engineering: When AI No Longer Needs You to Write Prompts

Loop Engineering is the practice of designing systems that automatically prompt AI agents, rather than prompting them manually. Boris Cherny runs hundreds of agents, Addy Osmani coined the term, and Blake Crosley identified verification cost as the real bottleneck — this article covers primary sources, the five building blocks, applicability boundaries, and criticisms.

climbing

A Knowledge Map of Climbing Books: 60+ Books from Training Science to Mental Philosophy

Climbing books don't exist in isolation — they represent competing schools of thought, philosophical differences, and knowledge gaps. This post maps the relationships between 60+ books to help you pick the right one to read next.

tech deep-dive

Choosing a Browser MCP: CDP, Playwright MCP, or Puppeteer MCP?

@playwright/mcp uses an accessibility tree instead of screenshots, cutting token cost by 10–50x — the best default for AI agents doing web automation. Puppeteer MCP fits screenshot-heavy tasks. Direct CDP via MCP is for low-level tooling or domains that Playwright/Puppeteer don't expose.

tech deep-dive

Chrome DevTools MCP: An MCP Server Built on CDP

Chrome DevTools MCP wraps Chrome DevTools Protocol (CDP) as an MCP server, giving AI agents direct access to 40+ CDP Domains including Profiler, HeapProfiler, and Security that Playwright and Puppeteer MCP don't expose — at the cost of having to implement MCP tool definitions and auto-wait logic yourself.

tech deep-dive

@playwright/mcp: Microsoft's Official Browser Automation MCP Server

@playwright/mcp defaults to an accessibility tree (browser_snapshot) instead of screenshots, cutting token consumption by 90%+. Combined with Playwright's native auto-wait, it's the best starting point for AI agents doing web automation.

tech deep-dive

@modelcontextprotocol/server-puppeteer: The Official Puppeteer MCP Server

server-puppeteer is the Puppeteer wrapper in the official MCP servers monorepo — seven lean tools built around screenshots and evaluate. Token cost is significantly higher than @playwright/mcp per interaction, but it fits well when the screenshot itself is the deliverable or custom JS execution is the core need.

investing

I Saw This 2x ETF System on Threads — It Comes From 3 Books

A Threads post by @jj.investnote laid out a 2x leveraged ETF system built on three books: 60% 2x ETF + 40% cash, Beta=1.2 target, ±10% rebalancing trigger, and a crash protocol.

ai deep-dive

Text / Image to Lottie: A Landscape Overview of AI Animation Generation Tools

From the CLI tool kin3o to the CVPR 2026 paper OmniLottie — a survey of open-source approaches for converting text and images into Lottie animations, with performance benchmarks and selection guidance.

tech deep-dive

AI-Powered E2E Testing: How canary, Stagehand, Magnitude, and Shortest Each Solve the Problem

AI agents running tests are non-reproducible; hand-written Playwright is hard to maintain. Four tools that emerged in 2024-2025 each tackle this dilemma with very different design philosophies.

ai deep-dive

The Skill Management Revolution for LLM Agents: A Complete Landscape of Skill Lifecycle from Voyager to MUSE-Autoskill

MUSE-Autoskill (2026) introduces a five-stage skill lifecycle framework. Self-created skills achieve 60.35% (+7.16%) on SkillsBench overall, and an impressive 87.94% on tasks where skill generation succeeds — surpassing the human-authored skill ceiling. This post synthesizes six arXiv papers to map the full landscape of skill evolution research.

ai deep-dive

How to Rigorously Compare Before and After Agent Changes: From Golden Sets to Statistical Testing

Even with temperature=0, LLM outputs can still fluctuate by up to 15% in practice. To rigorously compare agent changes, you need a frozen golden set, at least 3 runs per query averaged out, LLM-as-judge blind evaluation (pairwise preference flip rate reaches 35%), and paired statistical tests -- not just running each version once and going by feel.
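One way to picture the paired test mentioned above is a sign-flip permutation test on per-query score differences. This is a minimal sketch, not the post's own method: the function name `paired_permutation_test`, the scores, and the resample count are all hypothetical, and each score is assumed to already be one query's average over repeated runs.

```python
import random
from statistics import mean


def paired_permutation_test(before, after, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on per-query score differences.

    Returns (mean difference, estimated p-value).
    """
    if len(before) != len(after):
        raise ValueError("before and after must score the same queries")
    diffs = [a - b for a, b in zip(after, before)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    return mean(diffs), hits / n_resamples


# Hypothetical golden-set scores: one value per query, averaged over 3 runs.
before = [0.62, 0.70, 0.55, 0.81, 0.66, 0.59, 0.73, 0.68]
after = [0.66, 0.74, 0.58, 0.83, 0.71, 0.60, 0.78, 0.70]

delta, p_value = paired_permutation_test(before, after)
print(f"mean delta={delta:.3f}, p={p_value:.4f}")
```

Pairing matters because both versions are scored on the same frozen queries, so per-query difficulty cancels out of the differences instead of inflating the variance.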

ai deep-dive

Agent Observability: From OTel Traces to Catching Hallucinations, Tool Misuse, and Infinite Loops

The industry has converged on using OpenTelemetry GenAI semantic conventions to turn every LLM call and tool call into a span. Detecting the three major failure modes then splits into three tracks: faithfulness + semantic entropy for hallucinations, framework-level symbolic guardrails for tool misuse, and max steps + action hash deduplication for infinite loops — all wired into a Final / Trajectory / Single-step three-layer evaluation framework.
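The "max steps + action hash deduplication" idea for infinite loops can be sketched in a few lines. This is an illustrative guard under assumed names (`LoopGuard`, `max_repeats`, and the stop reasons are hypothetical, not taken from any framework), and it assumes tool arguments are JSON-serializable.

```python
import hashlib
import json
from collections import Counter


class LoopGuard:
    """Stops an agent run on too many steps or on repeated identical actions."""

    def __init__(self, max_steps=20, max_repeats=2):
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen = Counter()

    def check(self, tool_name, arguments):
        """Return None if the step may proceed, else a reason string."""
        self.steps += 1
        if self.steps > self.max_steps:
            return "max_steps_exceeded"
        # sort_keys makes the hash independent of argument ordering.
        payload = json.dumps({"tool": tool_name, "args": arguments}, sort_keys=True)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        self.seen[digest] += 1
        if self.seen[digest] > self.max_repeats:
            return "repeated_action"
        return None


guard = LoopGuard(max_steps=10, max_repeats=2)
results = [guard.check("search", {"q": "status", "page": 1}) for _ in range(3)]
print(results)
```

The third identical call trips the guard. A real agent would probably also hash the tool result, since repeating a call is harmless when the observation keeps changing.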

ai deep-dive

Resource Rationality for Agents: Optimal Decisions Across Tokens, Tool Calls, and Latency

Agent decision-making under resource constraints is bounded rationality reborn: Rational Metareasoning uses VOC rewards to save 20-37% of tokens, BATS proves that adding budget without budget awareness is futile, FrugalGPT cascades cut costs by up to 98%, and Speculative Actions reduce latency by 20%. The three constraints ultimately converge into a single Pareto curve, and the overarching trend is moving from humans tuning knobs to models making resource-rational decisions on their own.

ai deep-dive

The Single Crack in Agent Security: From Prompt Injection to Trust Boundaries to Multi-Agent Worms

Three seemingly distinct agent security problems — tool output injection, trust boundaries, malicious agents — share the same root cause: LLMs flatten instructions and data into a single token stream, making them architecturally unable to distinguish between the two. Understand this through-line and you can trace every attack from EchoLeak (CVE-2025-32711, zero-click) to the Morris II AI worm, and see why 'making the model behave' doesn't work — only architectural constraints (six design patterns, CaMeL) do.

ai deep-dive

How Agents Decide Whether to Retrieve, What to Retrieve, and How to Merge: Three Decision Layers of Agentic RAG

Traditional RAG is a fixed pipeline of 'retrieve then answer.' Agentic RAG splits retrieval into three decision layers: when to retrieve (FLARE uses token probabilities; Adaptive-RAG uses a complexity classifier), what to retrieve (HyDE / RAG-Fusion / decomposition / Step-back), and how to fuse (RRF k=60, then cross-encoder rerank, then compression -- Anthropic measured a 67% reduction in retrieval failure rate). Key counter-intuitive insight: unnecessary retrieval hurts quality -- 'deciding not to retrieve' is a first-class capability.
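For readers who haven't met RRF, here is a minimal sketch of reciprocal rank fusion with the k=60 constant mentioned above. The function name, document IDs, and the two input rankings are hypothetical; the reranking and compression stages are not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword retriever and a dense retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

RRF only looks at ranks, so it needs no score normalization between retrievers, which is a large part of why it is a common default before a cross-encoder rerank.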

ai deep-dive

Stop Hand-Tuning Prompts: From GEPA to Tool Descriptions, Automating Agent Behavior Optimization

Automatic prompt optimization (APO) has evolved from APE/OPRO to GEPA: replacing sparse rewards with linguistic reflection, beating GRPO by ~6pp with 4-35x fewer rollouts. Meanwhile, tool descriptions are the overlooked prompt -- small wording changes can shift tool selection rates by 10x, and Anthropic's experiments show Claude self-rewriting tool descriptions outperforms human experts. These two lines are converging: eval-driven automatic optimization is eating hand-tuned prompts.

ai deep-dive

How to Build a Deep Research Agent: Multi-Turn Search Planning, Conflict Resolution, and Verifiable Conclusions

An autonomous research agent = four controllable stages: planning (decompose into sub-questions), retrieval loop (search -> read -> reflect on gaps -> search again), evidence arbitration (>=2 independent sources, typed conflict handling), and verifiable output (sentence-level citations + independent verification pass). Two approaches: training-based uses RL to learn end-to-end when to search (Search-R1 +41%); orchestration-based uses orchestrator-worker division of labor (Anthropic internal eval +90.2%, at ~15x token cost).

ai deep-dive

Machine Theory of Mind: How Agents Infer Other Agents' Intentions, Knowledge, and Goals

Inferring another's beliefs/goals/intentions from observed behavior is called Machine Theory of Mind. Three lineages: symbolic BDI, Bayesian inverse planning, and deep learning ToMnet. The biggest controversy in the LLM era is that GPT-4 still trails humans by >10 points on ToMBench — are high scores genuine reasoning or statistical shortcuts?

ai deep-dive

Multi-Agent Error Propagation and Recovery: Borrowing Thirty Years of Weapons from Distributed Systems

At 99% accuracy per step over 100 steps, the error-free completion rate drops to just 36% -- error compounding is a structural problem, not something prompt tuning can fix. Distributed systems' supervisor trees, bulkheads, circuit breakers, sagas, and durable execution can be mapped almost one-to-one into agent orchestration. But LLMs introduce a failure class that traditional systems never had -- semantic errors that don't crash -- which require Inspector agents (recovering 96.4%) and redundancy voting (MAKER: one million steps with zero errors) to address.
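The compounding arithmetic behind that headline number is easy to check. This sketch assumes every step succeeds independently with the same probability, which is a simplification; the function name is hypothetical.

```python
def error_free_rate(per_step_accuracy, steps):
    """Probability that every one of `steps` independent steps succeeds."""
    return per_step_accuracy ** steps


for steps in (10, 50, 100):
    rate = error_free_rate(0.99, steps)
    print(f"{steps:>3} steps at 99% per step -> {rate:.1%} error-free")
```

At 100 steps the rate lands at roughly 0.366, which is why raising per-step accuracy alone yields so little on long chains.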

ai deep-dive

Semantic Similarity ≠ Retrieval Relevance: Scenarios, Detection, and Remedies for Systematic Embedding Retrieval Failures

Cosine similarity and relevance systematically diverge across an entire class of scenarios: negation (most IR models score at or below random on NevIR), exact identifiers, numeric thresholds, and logical combinations (SoTA models achieve recall@100 below 20% on LIMIT) -- some of these hit the theoretical ceiling of the single-vector paradigm, and switching to a larger model will not help. Recommended remedy order: hybrid BM25 -> reranker (Anthropic measured 67% fewer retrieval failures) -> upstream metadata routing -> domain fine-tuning / multi-vector.

ai deep-dive

How to Pick the Right Tool from Hundreds: The Collapse Curve of Tool Selection and Engineering Solutions

As tools scale up, selection accuracy doesn't degrade gracefully; it collapses. Going from 4 to 51 tools drops accuracy from 43% to 2%, and going from 10 to 100+ drops it from 78% to 13.62%. The root fix is to stop stuffing everything in at once: Anthropic's Tool Search Tool uses deferred loading plus retrieval to cut 85% of tokens, pushing Opus 4.5 accuracy from 79.5% to 88.1%. Description quality has a conditional payoff: negligible in simple scenarios, but correctness jumps from 44% to 50% in multi-tool chaining.

ai deep-dive

A More Expensive Embedding Won't Save Your Traditional Chinese RAG: Three Layers of Failure and the Fix Order

Traditional Chinese RAG retrieval failures are a three-layer stack: embedding granularity defects (BGE/GTE from 0.1B to 7B all mis-rank on simple queries like 'fried chicken'), local vocabulary drift caused by Simplified Chinese / English corpus dominance (alignment for terms like 'premium' and 'exclusion clause' is unreliable), and misleading model selection signals, because the MTEB Chinese benchmarks are in Simplified Chinese. The fix is architectural: OpenCC normalization -> hybrid + jieba segmentation -> reranker -> local fine-tuning last -- and the prerequisite for all of it is building a Traditional Chinese eval set first.

ai guide

arXiv Paper Quality Assessment Guide: From Endorsement Mechanisms to a Practical Checklist

arXiv does not perform peer review, and roughly 2% of submissions are rejected. Quality judgment relies on external signals: top venue acceptance > institution + open-source reproduction > citation quality. Includes a 20-item practical checklist and a 2026 toolbox (PWC has shut down).

tech deep-dive

Bumblebee: A Design Teardown of Perplexity's Read-Only Supply Chain Endpoint Scanner

A Go read-only scanner open-sourced by Perplexity in May 2026 (v0.1.1, zero non-stdlib dependencies). It inventories npm/PyPI/Go/RubyGems/Composer/MCP/editor and browser extensions into NDJSON, matches against a custom exposure catalog, and answers the question 'which machines in my fleet are currently affected' the moment a supply chain incident hits. It deliberately never invokes any package manager and is not an EDR.

ai deep-dive

Auto-Embedding on File Upload Is a Bad Default: A Survey of Adaptive / Agentic RAG and Agentic Parsing

Making 'chunk and embed every uploaded file automatically' the default behavior means making a decision for the LLM that it could have made itself. From Self-RAG (2310.11511) and Adaptive-RAG (2403.14403) to AgenticOCR (2602.24134), the academic trajectory is pushing three layers of decision-making -- whether to retrieve, whether to parse, and how to chunk -- from the ingestion pipeline back to the agent at conversation time.

ai deep-dive

Assembling LLM Agent Skills / Tools / Code Interpreter for Real: A Paper Reading Map

The hard part of LLM agents is not building function calling, skills, code interpreter, and document tools individually -- it is assembling them into a system that selects the right tool, writes code when needed, decomposes tasks, verifies results, and resists prompt injection. This post organizes the key papers into six engineering decisions: function calling reliability, tool/skill selection, code-as-action, multi-step planning, skill systems, and safety plus document generation.

tech debug

Mobile Chrome Redirects Back to Login After Sign-In: Debugging an HTTP-to-HTTPS Entry Point Issue

When mobile Chrome keeps redirecting back to the login page after sign-in, the culprit isn't always OAuth or broken frontend state. In this case, the root cause was that the HTTP entry point for app-dev.daodao.so wasn't issuing a 301 redirect to HTTPS, so /auth/me requests sent with an http origin didn't include the auth_token cookie.

ai deep-dive

A2UI (Agent-to-User Interface): Google's Open Protocol for Agents to Ship UI as Data

A2UI is an agent generative UI protocol open-sourced by Google on 2025-12-15: agents send declarative JSON describing UI intent, and clients render it natively using their own component catalog whitelist, layered on top of A2A. It launched at format v0.8 and iterated to v0.9 within three months.

ai deep-dive

browse.sh: Turning What Browser Agents Learn into a Skill Catalog

browse.sh, launched by Browserbase in May 2026, is two things: a browser skill catalog and the Browse CLI. The core thesis: the bottleneck for browser agents isn't reasoning — it's amnesia. By storing learned site-specific workflows as plain-text SKILL.md files, Autobrowse cut Craigslist task costs from ~$0.22 to ~$0.12 by their own metrics. Note: this has nothing to do with the 2018 Browsh text-mode browser.

ai deep-dive

CodeGraph: Local Code Knowledge Graph, and the Truth About 'Walking the Graph to Save Money'

CodeGraph uses tree-sitter to extract a codebase into a local SQLite/FTS5 knowledge graph, letting AI coding agents query the graph instead of scanning files. The official end-to-end benchmark (7 repos, median of 4 runs) averages 35% cost savings and 70% fewer tool calls -- but only if the agent actually walks the graph. Delegating exploration to a file-reading subagent that ignores CodeGraph turns it into pure overhead.

ai deep-dive

How Do People Read arXiv Papers? A Complete Guide to Methods and Tools

Reading papers is two problems stacked together: methodology (Keshav's three-pass method, 5-10 min / 1 hour / 4-5 hours) determines how to read, and tools (arXiv HTML, alphaXiv, NotebookLM, Connected Papers, Zotero) shorten the time for each pass. AI lowers the barrier to understanding; judging correctness always stays with the human.

ai deep-dive

Midscene.js: Betting on Pure Vision for Cross-Platform UI Automation

An MIT-licensed open-source UI automation framework from ByteDance (~13k GitHub stars). UI actions rely solely on feeding screenshots to vision-language models (Qwen3-VL / Doubao / Gemini-3 / UI-TARS), with no DOM parsing. A single JS API works across Web / Android / iOS / desktop, and starting from v1.0, the DOM action mode was removed entirely. The trade-off: each step is slower and more token-expensive.

tech deep-dive

Antigravity CLI: How Google Folded Gemini CLI Into a Unified Terminal Agent Harness

Antigravity CLI is a terminal agent Google announced at I/O on May 19, 2026. Written in Go (versus Gemini CLI's Node.js), its binary is called agy, and it shares the same agent harness as the desktop Antigravity 2.0. It is also Gemini CLI's successor — the personal-tier Gemini CLI service ends on June 18, 2026.

ai deep-dive

How Claude Reads and Writes PDF / DOCX / PPTX: Deconstructing the Three-Layer Architecture of Skills + Sandbox

Claude has no docx_tool or pdf_tool -- it relies on bash + file tools, plus SKILL.md instructions and pre-installed libraries like pdfplumber / python-pptx inside the container, assembling file handling capabilities from three layers.

ai deep-dive

Open Design: The Open-Source Claude Design Alternative Forked in 11 Days

Anthropic shipped Claude Design on 2026-04-17. On 2026-04-28, nexu-io/open-design went public -- same artifact-first loop, Apache-2.0, runs on the 16 coding-agent CLIs you already have. Two weeks from 0.1 to 0.7, 40k+ stars. A paradigm shift that flattens AI design tools from vertical SaaS into a skill bundle.

ai deep-dive

system_prompts_leaks Deep Dive: What Problem Does a 40k-Star AI System Prompt Archive Solve

asgeirtj/system_prompts_leaks collects the raw system prompts of 40+ AI assistants, from GPT-5.5 and Claude Opus 4.7 to Gemini 3.1 Pro, with 40.3k stars, 461 commits, and an MIT license. The value isn't in obtaining secrets -- it's in turning vendors' implicit policies into comparable engineering material. What you should study is the design decisions, not the text itself.

ai deep-dive

Dissecting Anthropic's Founder's Playbook: Four Stages, Three Moats, and One Cowork Compliance Pitfall

Anthropic's 35-page startup handbook released 2026-05-14 reorganizes Idea/MVP/Launch/Scale around agentic AI. The most valuable takeaways are 'the easier it is to build, the more important validation becomes' and treating CLAUDE.md as the first MVP artifact. The part to discount: the Launch chapter puts compliance workstreams on Cowork -- but Anthropic's own docs say Cowork doesn't write audit logs.

tech debug

LLM Agent Tool Descriptions Determine Tool Selection: Three Bug Fixes

Rewriting tool descriptions from soft suggestions to hard rules (whitelist + consequence explanation) eliminated the LLM's incorrect tool selection; adding skip_signal=True fixed vector store double-indexing.

ai

Using AI Agents to Operate Video Generation Tools: A HyperFrames, HeyGen, and Runway Integration Guide

AI agents can operate video generation tools through three approaches — Skills, MCP Connectors, and direct APIs. Choosing the right integration method matters more than choosing the right tool.

ai deep-dive

Code Mode: Moving Tool Definitions from Context into Code

Stop stuffing all your tool descriptions into context at session start. Let the model write code, have the runtime execute it, and let tool definitions enter context only at the import line — Anthropic's GDrive→Salesforce example dropped from ~150K tokens to 2K, and Cloudflare's 2,500-endpoint schema shrank from 1.17M to 1K.

ai deep-dive

The FDE War: Why OpenAI and Anthropic Are Both Copying Palantir's Playbook

MIT research says 95% of enterprise AI pilots yield zero return. OpenAI and Anthropic announced multi-billion-dollar joint ventures in the same week, wholesale adopting the Forward Deployed Engineer model that Palantir has used for over a decade to bring AI into the enterprise battlefield.

ai deep-dive

How Others Use LLMs to Write: Trade-off Notes from Karpathy's LLM-wiki to Multi-Agent Pipelines

A survey of 11 public LLM writing pipelines, distilled into three dominant patterns: multi-agent (researcher -> writer -> critic), Karpathy LLM-wiki (raw + wiki + LLM writes, humans don't), and quality guardrails (technical verifier + never fabricate + brief gate). The Princeton GEO paper (KDD 2024) quantifies the impact: inline citations +28%, adding statistics +33%, quoting source text +41%, keyword stuffing -9%.

ai

OpenAI's Codex Secure Deployment Strategy: Sandboxing, Auto-review, and Enterprise Governance

In May 2026, OpenAI published its internal Codex deployment practices: sandboxes define technical boundaries, approval policies determine when to pause, Auto-review delegates approval decisions to a sub-agent instead of a human, and Managed configuration lets enterprise admins enforce policies top-down. The core philosophy: zero friction for low-risk actions, mandatory review for high-risk ones.

ai

9Router: A Local 3-Tier Fallback Router That Routes Claude Code / Cursor / Cline to 40+ Providers

Spin up a local OpenAI-compatible endpoint at localhost:20128 that automatically routes requests from Claude Code / Cursor / Cline / Codex / Copilot through a Subscription → Cheap → Free 3-tier fallback to 40+ providers. Built-in RTK compresses tool_result (saving 20–40% input tokens), Caveman mode compresses output, OAuth auto-refresh, multi-account round-robin — install with npm install -g 9router and two commands.

ai

Claude, Codex, and Gemini Are All in the Browser Now: Comparing Three AI Agent Approaches in Chrome

Anthropic builds an extension, OpenAI builds its own browser, Google welds AI directly into Chrome — three completely different approaches. Here's a comparison of the current landscape, key differences, and a selection guide.

ai

15 Walls for Building Your Own Auto-Dev Agent: Concrete Lessons from Stripe Minions

Stripe Minions says 'The walls matter more than the model,' but the case studies from four Silicon Valley companies never explained how to actually build those walls. This post breaks down the 15 walls we implemented in the daodao auto-dev agent: what each wall prevents, where the files live, and what the tradeoffs are. Tier 1 is mandatory, Tier 2 strengthens governance, Tier 3 is serious governance.

ai

What Is an Auto-Dev Agent? An Intro to daodao's Automated Development System

A PM checks a task card in Notion → the system syncs it to a GitHub issue → writes a plan → writes code → opens a PR for human review. This post explains what the system does, what it doesn't do, and why it's feasible now — written for people who don't write code.

ai

Step-by-Step: Build a Notion → PR Auto-Dev Agent — A Reproducible Version of the daodao Pipeline

Build a Notion task → GitHub issue → spec PR → code PR auto-dev agent from scratch. Using the daodao case as a template, this guide walks through every step — what to do, what to verify, and how to handle problems. Notion DB schema → bin/ scaffold → two Claude Code routines → cloud env vars → staging tests.

ai

Claude for Financial Services: Dissecting Anthropic's Multi-Agent Reference Implementation

Anthropic open-sourced 12 financial-industry Agents and 11 MCP connectors. The real takeaway isn't the Agents themselves but the layered design of 'one prompt, two runtimes' and 'pure-file extensibility.'

ai

From Plan to PR: Building daodao's Auto-Dev Agent in Practice

5 rounds of consensus to write the plan, then team mode with 5 workers running 12 tasks in parallel — with plenty of pitfalls along the way. Writing it down for my future self and anyone else trying the same thing.

ai deep-dive

DeepSeek-OCR: The 10x Compression Experiment That Turns Long Context into Images

DeepSeek-OCR's paper is titled Contexts Optical Compression -- OCR is just the means; what it actually validates is that 'rendering text as images and feeding them to a VLM' achieves 10x compression at 97% accuracy. This is a qualitative shift for long-context LLM and RAG token costs.

ai

2026 LLM Inference Provider Free Tiers & Pricing: 40+ Services Ranked by Tier

For side projects, toy demos, and RAG prototypes, nobody wants to swipe a credit card on day one. This is a verified roundup of 40+ LLM inference providers still operating as of 2026/05, tiered by whether free resources auto-replenish or are one-time grants. Each entry notes credit-card requirements, supported models, paid starting prices, and catches. Chinese-origin providers including Zhipu GLM (permanently free), Doubao (2M tokens/day), Kimi, DashScope, and the Ollama local option are all included.

Claude Code /loop: Turning AI into a Background Worker with Native Scheduling (v2.1.72+)

/loop is Claude Code's native cron feature — set schedules in plain English and let Claude monitor, auto-fix PRs, and run recurring tasks in the background. Session-scoped and expires after 7 days; for cross-session scheduling, use Routines or Desktop scheduled tasks.

Claude Code Routines: Complete Guide to Cloud Automation — Setup, Triggers, and Real-World Examples

Routines is Claude Code's cloud automation system (formerly Cloud Scheduled Tasks). Beyond cron scheduling, you can trigger runs via API endpoint or GitHub events — scan issues, review PRs, run checks, open PRs — all while your computer is off.

ai deep-dive

Claude Skills: Package Domain Knowledge into a Folder, Teach Once and It Remembers

A Skill is a folder with a SKILL.md. Three-layer progressive disclosure lets Claude load details only when needed, eliminating the need to re-explain preferences every conversation.

ai

Local Deep Research Walkthrough: A Privacy-First Deep Research Agent

Local Deep Research is a privacy-first deep research agent built on LangChain + LangGraph, integrating 20+ search engines and 30+ research strategies. Its flagship langgraph_agent_strategy takes the LLM-autonomous tool-calling approach, offering a fundamentally different paradigm from fixed-pipeline RAG graphs.

PageIndex: RAG Without Vectors — Turning Long Documents Into a Book With a Table of Contents

PageIndex skips chunking, embedding, and vector storage entirely. Instead it relies on LLM reasoning over a tree-structured table of contents the LLM itself wrote, achieving 98.7% on FinanceBench (GPT-4o reading directly scores only 31%). It solves a different problem than vector RAG — finding the right section in a well-structured long document.

tech deep-dive

Accessing Your Home Mac From Anywhere: Cloudflare Tunnel and the Alternatives in 2026

Two answers stand out for remotely accessing your home Mac in 2026: Cloudflare Tunnel if you need browser-based access with no client install, and Tailscale if you just want something simple for personal use. This post compares both, covers ZeroTier, Pangolin, NetBird, and other alternatives, and explains why Cloudflare's remotely-managed tunnel makes setup significantly easier in 2026.

ai

Search MCP Tools for AI Agents: What to Do When WebFetch / WebSearch Gets Blocked

When using AI agents like Claude Code or Cursor, built-in WebFetch / WebSearch often gets blocked by Cloudflare, geo-restrictions, or rate limits. Connecting a search MCP server is the most direct fix. This post compares the options actually available in 2026.

ai

Groq Console: The Developer Platform for Running Open-Source Models on LPU Inference

Groq Console is the developer portal for Groq's in-house LPU chip, offering an OpenAI-compatible API, Playground, and free tier credits. Its selling point is running open-source models like Llama, Qwen, and DeepSeek at the fastest tokens/second on the market.

tech

Warp: From Modern Terminal to Agentic Development Environment

Warp evolved from a Rust-powered modern terminal into an AI Agent-integrated development environment (ADE), open-sourced under AGPL in April 2026, with over 700,000 developer users.

ai deep-dive

goose: Open-Source, Cross-Platform, LLM-Agnostic Local AI Agent

goose is an open-source AI Agent maintained by the Linux Foundation's AAIF, supporting 15+ LLM providers and 70+ MCP extensions, built with Rust as a Desktop App + CLI + API. It positions itself as a vendor-neutral, self-hostable alternative to Claude Code.

ai guide

Gemma on Cloudflare Workers AI: A Pragmatic Choice for Traditional Chinese Applications

For running LLMs on Cloudflare Workers AI, gemma-3-12b-it follows Traditional Chinese instructions noticeably better than llama-3.1-8b-instruct. With Gemma 4 arriving in 2026, you get Vision, Function calling, and 256K context -- upgrade as needed.

ai project

Qwen (Tongyi Qianwen): Alibaba's Open-Source LLM Family, from 72B to 397B — A Complete Evolution Overview

Qwen (Tongyi Qianwen) is Alibaba's open-source LLM family, known for its Apache 2.0 license, 201-language coverage, and rapid iteration. The latest Qwen3.6 (2026/04) focuses on Agentic Coding — the 27B Dense version achieves 77.2% on SWE-bench and 59.3% on Terminal-Bench 2.0, on par with Claude Opus. A new Thinking Preservation feature lets agents retain reasoning context across turns.

ai

Knowledge Management with LLMs: From Karpathy's llm-wiki to the Open-Source Ecosystem

Karpathy proposed the llm-wiki pattern in 2026, having LLMs proactively maintain a markdown wiki instead of running RAG from scratch every time. Over 100 open-source implementations now exist, ranging from local CLI tools to serverless Telegram bots.

ai

OpenAI Workspace Agents: From Custom GPTs to a Team Automation Platform

On 2026/4/22 OpenAI launched Workspace Agents — powered by Codex, capable of long-running cloud execution, and integrating with Slack/Salesforce/Google Drive. They are the enterprise successor to Custom GPTs.

ai guide RAG Systems in Practice

Building a Legal Contract RAG in 36 Hours: Weaviate Query Agent + ColQwen Architecture Breakdown

Using Weaviate Query Agent + ColQwen multi-vector model, a single prompt built a production-grade legal contract search system in 36 hours -- this post breaks down its architecture logic, technology choices, and what you actually need to watch out for.

marketing guide

AKIRAXCLAW's Content Model: 5 Posts a Day, a Three-Tier Funnel, and Agent-Assisted Publishing

Akira runs a Threads → Blog → Docs three-tier funnel with agent-assisted publishing, building a sustainable knowledge monetization model in the Chinese-language AI content market.

ai guide

Where AI Code Review Stands Now: Lessons from Cloudflare's Multi-Agent System

Cloudflare ran a Multi-Agent Code Review system internally for 30 days — 131K reviews, median 3 minutes. This post breaks down their architecture and compares it with solutions from Anthropic, GitHub, CodeRabbit, Greptile, and others.

ai guide

Inside the Codex Agent Loop: How OpenAI Keeps AI Agents Iterating

A detailed look at OpenAI's Codex agent loop design: how prompts are constructed, how multi-turn conversations are managed, how prompt caching prevents cost explosions, and how context window auto-compaction works.

ai guide

Codex App Server: How OpenAI Turned an Agent Harness into a Universal Protocol

OpenAI wrapped the Codex harness as a JSON-RPC over stdio App Server, enabling VS Code, JetBrains, Web, and desktop apps to share a single agent loop. Three core primitives: Item, Turn, and Thread.

ai guide AI Agents in Practice

OpenAI Wrote 1 Million Lines of Code with Codex: Harness Engineering in Practice

An OpenAI internal team spent 5 months with 3 people and 0 lines of hand-written code, delivering a complete product using Codex. This article distills their core lessons on AGENTS.md design, repo-local knowledge bases, architecture enforcement, and entropy management.

marketing project

AEO / GEO Tool Landscape: Input, Traffic, and Output Layers — From isitagentready to aeo-radar to Profound

AEO/GEO tools aren't a single category — they span three distinct layers: the input layer (is your website ready for AI to read), the traffic layer (how much are AI bots actually crawling), and the output layer (how is your brand mentioned in AI answers). This post maps out all three layers, from open-source self-hosted options to commercial SaaS.

tech project

DeerFlow: ByteDance's Open-Source Super Agent Harness for Long-Running Research Tasks

DeerFlow is ByteDance's open-source Super Agent Harness built on Python 3.12 + LangGraph. It orchestrates long-running tasks through sandboxes, long-term memory, sub-agents, skills, and a messaging gateway. It hit #1 on GitHub Trending in February 2026, now surpassing 63,000 stars, with support for Telegram/Slack/Feishu, Claude Code integration, and multiple search backends.

travel guide

2026 Travel Inconvenience Insurance Guide: New Rules, Coverage Comparison, and Where to Buy

2026/4/1 new rules: max 2 policies per trip (different insurers), flat-rate payout cap lowered to NT$6,000. Covers six key areas including flight delays and lost luggage, with a breakdown of where to buy.

ai guide AI Agent in Practice

Agentic Engineering: Making AI Agents Collaborate Like a Real Engineering Team

Agentic Engineering isn't about making AI write code faster — it's about making software move through the entire delivery pipeline faster, by using multi-agent collaboration to compress cross-team coordination friction.

ai guide AI Agent in Practice

The Memory Problem in Agentic Engineering: Types, Implementation, and Ownership

Agent memory isn't a plugin — it's part of the harness itself. Pick the right memory type, estimate data volume, then decide on the technology. And finally, figure out whether you actually own that memory.

ai guide

Multi-Engine Code Review with Codex + Gemini + Claude: Principles, Patterns, and Implementation

AI models rationalize their own code when reviewing it. Using three different CLIs for independent review effectively catches blind spots. This post covers the design philosophy and practical workflow patterns behind the approach.

tech guide

How Does the YouTube to NotebookLM Extension Work? Reverse Engineering and Cross-Tab Architecture Dissected

NotebookLM has no official API. This extension works by combining three techniques: reverse-engineered Google batchexecute RPC calls, DOM scraping, and cross-tab message passing.

tech debug

Local AI Backend API Always Returns Empty Data: Cookie Domain Isolation

The main backend runs on a remote HTTPS server, so the auth_token cookie is scoped to that domain. The browser never sends it to the local AI backend, causing the API to treat every request as unauthenticated.

ai guide

Integrating AI Agents into Your Development Workflow: A Five-Phase SDLC Breakdown

Agentic AI is not just autocomplete — it is an AI system capable of autonomously executing multi-step tasks. This article breaks down the five phases of the SDLC, explaining where to plug in agents at each phase, how to progress from CLI tools to full-pipeline automation, and the most valuable external resources to track right now.

ai guide

A Book Written by AI Itself, Teaching You How to Build Software with AI

Encyclopedia of Agentic Coding Patterns catalogues 190 patterns to help you make the right software decisions in the age of AI-written code — and the book itself is autonomously written and maintained by an AI agent.

ai guide

GitHub Copilot Coding Agent: Assign an Issue to AI and Let It Open the PR

GitHub Copilot Coding Agent lets you assign an Issue to Copilot, which then automatically creates a branch, writes code, runs CI, and opens a PR — all inside a cloud sandbox. The key to success is setting up AGENTS.md; without it, the agent tends to go off track. Best suited for well-defined medium-sized tasks; requires Pro+ (1,500 premium requests/month) or Enterprise plan.

ai guide

knowledge-pipeline: A Six-Layer Pipeline for RAG Quality Control

A six-layer deterministic pipeline that handles everything from URL ingestion to vector embedding automatically, filtering out garbage before it enters your RAG system through an eight-dimension scoring system.

ai guide

MarkItDown: Convert Any File to Markdown Before Feeding It to an LLM

A lightweight open-source tool from Microsoft that converts PDF, Office, images, audio, and more into Markdown — purpose-built for LLM pipelines.

ai guide

MCP vs CLI vs API: The Real Boundaries of Agent Tool Interfaces

MCP is not going away, but its effective scope is narrower than most people think. For local development, CLI and raw API almost always beat MCP. MCP's truly irreplaceable niche is narrow: a local tool layer shared across agents.

marketing guide

Is Your JSON-LD Invisible to AI Search Engines? A Pipeline Breakdown and AEO/GEO Strategy

Different AI engines process web pages in vastly different ways. Some only read the body; others rely on pre-built indexes. JSON-LD and schema markup are not universally effective — body content quality and structure are the only cross-platform foundations that hold.

product project

quidproquo Blog Improvement Roadmap: Content, Technical Debt, RAG Design, and Harness Infrastructure

Using my own 30+ RAG/Agent posts to audit the blog itself, I identified a prioritized improvement list spanning content quality, site tech, RAG design fixes, harness infrastructure, and AI agent applications — no phases, just priorities.

ai guide

Lessons from the Trenches: What AI Native Teams Must Get Right

Not everyone should use a coding agent to modify code directly. AI Native teams need interface specs, test-first development, monorepo, security guardrails, human-in-the-loop, and token budget controls. Building an agent platform layer on top of coding agents and clearly redefining developer roles is the right path forward.

ai guide

Autoreason: Teaching LLMs When to Stop Self-Refining

Autoreason replaces the traditional critique-and-revise loop with a competitive multi-version evaluation mechanism (A/B/AB + blind Borda count), solving three structural problems in LLM self-refinement: prompt bias, scope creep, and lack of restraint.
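The Borda count named in this summary can be sketched in a few lines. This is only an illustration of the general voting rule, not Autoreason's actual code: the function names `borda_scores` and `borda_winner`, the sample ballots, and the alphabetical tie-break are all assumptions.

```python
from collections import defaultdict


def borda_scores(ballots):
    """Sum Borda points: with n candidates, first place earns n-1 points, last earns 0."""
    scores = defaultdict(int)
    for ballot in ballots:
        n = len(ballot)
        for position, candidate in enumerate(ballot):
            scores[candidate] += n - 1 - position
    return dict(scores)


def borda_winner(ballots):
    """Return the top-scoring candidate; ties are broken alphabetically (an assumption)."""
    scores = borda_scores(ballots)
    return min(scores, key=lambda c: (-scores[c], c))


# Hypothetical ballots: each judge ranks versions A, B, and the merged AB
# without knowing which version produced which text (the "blind" part).
ballots = [
    ["AB", "A", "B"],
    ["A", "AB", "B"],
    ["AB", "B", "A"],
]
print(borda_winner(ballots))
```

Because every judge submits a full ranking, a version that is consistently second can still beat one that is loved by some judges and disliked by others.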

ai project

Vercel Open Agents: Moving the Coding Agent from Your Laptop to the Cloud

An open-source coding agent reference implementation from Vercel Labs. A three-layer architecture separates the web UI, agent workflow, and sandbox VM — designed as a starting point for teams that want to self-host their own Claude Code or Cursor Background Agent.

tech guide

The Full Picture of Cloudflare Workers AI Binding: It's More Than Just run()

env.AI is not just run(). It also exposes toMarkdown (document-to-Markdown conversion), autorag (managed RAG), gateway (external provider proxy), and models (metadata lookup). Understanding these four method groups is what unlocks Cloudflare as a full AI platform inside Workers.

ai guide

Claude Octopus: The Consensus Plugin That Hooks 8 Models Into Claude Code Simultaneously

Claude Octopus is a Claude Code plugin that simultaneously calls Codex, Gemini, Copilot, Qwen, Ollama, Perplexity, OpenRouter, and Claude to review the same code, using a 75% consensus threshold to catch single-model blind spots. It ships with 32 personas, 48 /octo:* slash commands, 51 skills, and a Dark Factory fully autonomous spec-to-code pipeline.
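A consensus threshold like the 75% mentioned here can be sketched as a simple vote over findings. This is a minimal sketch under stated assumptions: `consensus_findings`, the reviewer names, and the finding labels are hypothetical, and treating exactly 75% as passing is my assumption rather than documented plugin behavior.

```python
def consensus_findings(reviews, threshold=0.75):
    """Keep only the findings reported by at least `threshold` of the reviewers."""
    if not reviews:
        return set()
    counts = {}
    for findings in reviews.values():
        # set() ensures a reviewer repeating a finding only counts once
        for finding in set(findings):
            counts[finding] = counts.get(finding, 0) + 1
    needed = threshold * len(reviews)
    return {finding for finding, n in counts.items() if n >= needed}


# Hypothetical review output from four models
reviews = {
    "model-a": ["sql-injection", "naming"],
    "model-b": ["sql-injection"],
    "model-c": ["sql-injection", "naming"],
    "model-d": ["unused-import"],
}
print(consensus_findings(reviews))
```

Raising the threshold trades recall for precision: fewer findings survive, but each one was flagged independently by most reviewers.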

ai guide

LLM Council: Karpathy's Weekend Multi-Model Parliament — Three Stages of LLM Peer Review

LLM Council is a local Web App Andrej Karpathy built over a weekend. It sends one question to multiple LLMs simultaneously, has them anonymously peer-review each other, and then a Chairman model synthesizes a final answer. It is positioned as a small tool for comparing models while studying, 99% vibe coded with no plans for long-term maintenance, but the architecture itself is a minimal ensemble LLM implementation worth studying.

tech guide

Better Agent Terminal: Consolidate Multiple Project Terminals and Claude Code Agents into One Window

Better Agent Terminal (BAT) is an Electron desktop app that unifies multiple project workspaces, terminals, and Claude Code Agents into a single window — solving the everyday pain of exploding iTerm tabs and the lack of a proper GUI container for agents. MIT License, available on macOS, Windows, and Linux.

ai guide

Claude Managed Agents: Letting Anthropic Handle the Agent Shell and Sandbox

Claude Managed Agents is a beta service launched by Anthropic on 2026/04/08 that provides an agent harness plus cloud container sandbox, billed per token plus $0.08/session-hour. It suits long-running async tasks and is worth exploring if you don't want to build your own agent loop and sandbox.

ai guide

Agent Skills: A Skill Framework That Makes AI Agents Work Like Senior Engineers

Agent Skills is Addy Osmani's open-source collection of 19 production-grade engineering skills that drive AI agents to follow senior engineering discipline through /spec → /plan → /build → /test → /review → /ship commands, instead of cutting corners.

ai guide

Graphify: Turn Code and Documents into a Queryable Knowledge Graph

Graphify uses tree-sitter AST to extract code structure, then applies LLM semantic analysis to documents and images, compressing an entire project into a queryable knowledge graph. It claims to use 71.5x fewer tokens per query than reading raw files.

ai project

Claw Code: An Open-Source CLI Agent That Rewrites Claude Code in Rust

Claw Code is a from-scratch Rust rewrite of the Claude Code CLI, featuring 48K lines of code, 40 tools, and MIT licensing. Most remarkably, the entire project was built by multiple AI agents collaborating over just 5 days, surpassing 170K GitHub stars within a week of launch.

ai guide

clawhip: An Event Notification Router That Keeps Multi-Agent Development Under Control

clawhip is a Rust daemon that routes AI coding agent events (commits, PRs, session status) to Discord / Slack, solving the observability problem of not knowing who is doing what when multiple agents run in parallel.

ai guide

Hermes Agent: Nous Research's Self-Improving AI Agent

Hermes Agent is an open-source self-improving AI agent by Nous Research, featuring persistent memory, skill learning, 40+ tools, multi-platform gateways, and support for 200+ model providers. It also serves as the official successor to OpenClaw.

ai guide

notebooklm-py: An Unofficial Python API for Google NotebookLM

notebooklm-py reverse-engineers Google's batchexecute RPC protocol, letting you programmatically control NotebookLM via Python / CLI / AI Agent — including audio, video, slides, quiz generation and more.

ai guide

oh-my-claudecode: An Enhancement Layer That Turns Claude Code into a Multi-Agent Collaboration Platform

oh-my-claudecode (OMC) adds 8 collaboration modes, 19 specialized agents, and cross-model orchestration (Claude + Codex + Gemini) on top of Claude Code, transforming a single-user CLI tool into a multi-agent development platform. Features include Deep Interview for requirement clarification, Smart Model Routing that saves 30-50% on tokens, and automatic rate limit recovery.

ai guide

oh-my-codex: A Structured Workflow Enhancement Layer on Top of OpenAI Codex CLI

oh-my-codex (OMX) doesn't replace Codex CLI — it adds a structured workflow layer on top of it. From requirements clarification and plan generation to multi-agent parallel execution, four core Skills transform scattered prompt conversations into a trackable development process.

ai guide

oh-my-openagent: A Multi-Model Agent Team Framework That Replaces Single-LLM Coding

oh-my-openagent (OmO) transforms OpenCode from a single-LLM tool into a multi-model agent team — Opus as the workhorse, GPT-5.2 as the architect, Gemini for frontend, Sonnet for documentation lookup — all triggered to run in parallel with a single ultrawork keyword. With 48K stars, it is the earliest project in the UltraWorkers ecosystem to establish the multi-agent coding pattern.

ai project

OpenHarness: A Fully Open-Source Agent Harness Framework

An open-source Agent Harness framework from HKUDS (HKU Data Science Lab) that implements tool calling, skill loading, memory, permissions, and multi-agent collaboration as complete infrastructure, supporting Anthropic / OpenAI / GitHub Copilot API formats.

tech guide

Solving Duplicate Config Files for Codex and Claude Code with a Symlink

Claude Code only reads CLAUDE.md; Codex only reads AGENTS.md. Teams using both end up maintaining two identical files. Fix: make CLAUDE.md a symlink pointing to AGENTS.md — one source of truth.
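The symlink fix can be scripted so it is repeatable across repos. The sketch below is an assumption-laden illustration: the helper name `link_claude_to_agents` is hypothetical, and refusing to overwrite a regular CLAUDE.md is my own safety choice. On Windows, creating symlinks may require extra privileges.

```python
import os
from pathlib import Path


def link_claude_to_agents(repo_dir):
    """Make CLAUDE.md a relative symlink to AGENTS.md inside repo_dir."""
    repo = Path(repo_dir)
    agents = repo / "AGENTS.md"
    claude = repo / "CLAUDE.md"

    if not agents.exists():
        raise FileNotFoundError(f"{agents} must exist before linking")
    if claude.is_symlink():
        claude.unlink()  # replace a stale link
    elif claude.exists():
        # Do not silently delete a hand-written file; merge it into AGENTS.md first.
        raise FileExistsError(f"{claude} is a regular file")

    # A relative target keeps the link valid if the repo is moved or cloned elsewhere.
    os.symlink("AGENTS.md", claude)
    return claude
```

Git stores the link itself rather than a copy of the file, so the single source of truth stays AGENTS.md.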

ai guide

How to Use Claude Code Agent Teams? Design Patterns from 6,400+ Agents on GitHub

There are already 6,400+ .claude/agents/*.md files on GitHub. We dissected 4 representative projects — ChemistryTimes (content production pipeline), claude-sub-agent (document-driven development pipeline), agentic (Temporal.io DAG parallel execution), and vs-copilot-multi-agent (hook-enforced memory persistence) — plus ruflo's enterprise-grade swarm architecture, distilling 6 design patterns and 5 practical trends.

ai guide AI Agent in Practice

From Stripe to Meta: How Silicon Valley's Top Companies Replace Keyboards with AI Agents

Top Silicon Valley companies are independently building internal AI coding agents that automate everything from a Slack message to a merged PR. This article deep-dives into architectures from Stripe, Ramp, Coinbase, and Spotify, then expands to cover Google, Meta, Amazon, Uber, Goldman Sachs, Walmart, and more.

ai guide

Three Modes of LLM Knowledge Bases: Knowledge Vault, Experience Vault, and Blog

Andrej Karpathy proposed a framework for compiling personal knowledge wikis with LLMs — collect raw data, have the LLM compile it into .md wiki pages, run Q&A against the wiki, and file outputs back. This post compares three practical approaches: Karpathy's knowledge vault model, the community's experience vault model, and quidproquo's blog model.

ai guide

AI Agent Caching Goes Beyond One Layer: From Claude Code's 18 Cache Types to Multi-Layer ReAct Agent Design

After dissecting Claude Code's 18+ caching mechanisms, I found that you can't touch provider-level prompt cache, but embedding cache, tool result cache, and entity cache are not only within your reach — they deliver even better results. Includes a complete AgentCache interface design and per-tool TTL strategy.

ai guide

AI Agent Tool Descriptions Shouldn't Be Static: Dynamic prompt() Design Learned from Claude Code

Every one of Claude Code's 45 tools uses a prompt() method that dynamically adjusts based on user type, feature flags, and system capabilities. Applying this pattern to a ReAct Agent, tool descriptions are dynamically generated along three dimensions: orchestrator model capability, locale, and available tools. Small models automatically get few-shot examples; large models save tokens.

ai guide

Claude Code Complete Breakdown: The Deep Reasoning King of Terminal Agents

From $20/mo Pro to $200/mo Max 20x, Claude Code's Opus 4.6 delivers the strongest reasoning depth in the industry, and its Max plan's unlimited pricing saves heavy users over 90% compared to API costs.

ai guide

Cursor CLI Complete Analysis: The All-Rounder Extending IDE Agent to the Terminal

Cursor CLI brings the IDE Agent into the terminal. It supports interactive TUI and headless modes, three working modes (Plan, Ask, Agent), Cloud Handoff, and CI/CD integration, at $20-200/mo.

ai guide

Gemini CLI Complete Analysis: The Terminal Agent with the Most Generous Free Tier in the Industry

Gemini CLI will be discontinued on 2026/06/18, with Antigravity CLI as the official successor. Before shutdown: free 60 req/min, 1,000 req/day, including Gemini 2.5 Pro and 1M token context window. Skills, Hooks, and Subagents can all be migrated.

ai guide

Kiro (AWS) Complete Analysis: The Spec-Driven Agentic IDE

Kiro's free plan includes 50 credits. Auto mode intelligently mixes models to save costs. Spec-Driven development upgrades vibe coding into traceable, structured workflows. Agent Hooks enable local CI/CD automation.

ai guide

OpenAI Codex Complete Plan Analysis: Agent Integration in the ChatGPT Ecosystem

Codex is tied to ChatGPT subscriptions ($20-200/mo). GPT-5.4 + mini automatic routing is the highlight, and the CLI supports dual billing via Plan mode and API Key mode.

ai project

OpenCode Full Analysis: An Open-Source Terminal Agent Supporting 75+ Model Providers

OpenCode is a free, open-source CLI agent written in Go with 95K+ GitHub stars. It supports 75+ model providers including local Ollama, allows authentication via Copilot/ChatGPT accounts, and lets you switch models mid-session without losing context.

ai guide

Agent CLI Subscription Plans Compared: Building a Flexible Multi-Model Routing Strategy

Comparing six major Agent CLI subscription plans in 2026 (Claude Code, Cursor CLI, Codex, Kiro, Gemini CLI, OpenCode), and exploring multi-model routing patterns — routing simple tasks to cheaper models and complex tasks to flagship models, with real-world savings of 40-85%.

ai guide

2026 Personal AI Hardware Buying Guide: DGX Spark, Mac Studio, MSI AI Edge Compared

Comparing the NVIDIA DGX Spark, Apple Mac Studio M4 Ultra, ASUS Ascent GX10, MSI AI Edge, and more — helping you find the right local inference hardware.

ai guide

Multi-Model Routing Open-Source Tools & Implementation: Getting the Right Model for the Right Job

With multi-model routing, roughly 70% of tasks (the simple ones) go to cheap models and only 10-15% (the complex ones) use flagship models, saving 40-85% on inference costs in practice. This article covers the architecture and implementation of five major open-source tools.
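The routing idea can be sketched as a complexity estimate followed by a tier lookup. Everything here is illustrative: the model names, the keyword list, and the thresholds are assumptions, and this toy heuristic stands in for whatever classifier a real router uses.

```python
# Hypothetical keywords that hint at a harder task
HARD_KEYWORDS = ("refactor", "architecture", "prove", "debug")


def estimate_complexity(prompt):
    """Toy heuristic (assumption): longer prompts and certain keywords imply harder tasks."""
    score = min(len(prompt.split()) / 200, 1.0)
    if any(keyword in prompt.lower() for keyword in HARD_KEYWORDS):
        score += 0.5
    return min(score, 1.0)


def route(prompt):
    """Map a prompt to a model tier; tier names and cut-offs are placeholders."""
    complexity = estimate_complexity(prompt)
    if complexity < 0.3:
        return "cheap-model"
    if complexity < 0.7:
        return "mid-model"
    return "flagship-model"


print(route("Translate 'hello' to French"))
print(route("Refactor the billing architecture"))
```

The savings come from the cheap tier absorbing most traffic, so the cut-offs are the main tuning knob.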

product project

Digital Ecosystem Research: Dissecting Platform Integration Strategies from LINE and Shopify to Taiwan MarTech

A breakdown of the three-layer digital ecosystem structure: LINE's super-app, Shopify App Store flywheel, and Taiwan MarTech integration strategies. The core mechanism is using APIs and data flows to create mutual dependency among participants, collectively reinforcing the moat.

tech guide

Where Should AI Agent Global Skills Live? The Division of Labor Between .claude, Codex Skills, and AGENTS.md

Skill paths are almost always runtime-specific. AGENTS.md is the reliable way to share rules across agents. Put personal reusable capabilities in each agent's supported global directory; put project workflows inside the repo.

tech guide

code-review-graph: Using a Knowledge Graph to Cut AI Code Review Token Usage by 8x

code-review-graph uses Tree-sitter to parse your codebase and build a persistent knowledge graph, tracks the blast radius of changes, and feeds only truly relevant context to the AI — claiming an average 8.2x reduction in token usage.

tech guide

GitBook: A Documentation Platform That Turns Docs into a Product

GitBook is a Git-based documentation platform with Markdown editing, version control, and multi-user collaboration. Ideal for technical docs, API references, and internal knowledge bases. The free plan is sufficient for individuals and small teams.

tech guide

NVIDIA DGX Spark: A Desktop AI Supercomputer That Fits a Petaflop on Your Desk

The NVIDIA DGX Spark is powered by the GB10 Grace Blackwell Superchip with 128 GB of unified memory and delivers 1 petaFLOP of FP4 compute, starting at around $3,999 USD. It lets developers run 200B-parameter models locally and fine-tune 70B models, making it the most accessible NVIDIA AI development platform available today.

tech guide

Documentation Platform Guide: GitBook, Docusaurus, Mintlify, and Seven Other Options

A breakdown of nine major documentation platforms — their positioning, pros, cons, and ideal use cases. Decision logic: open-source projects → Docusaurus/VitePress, API docs → Mintlify/ReadMe, internal enterprise → Confluence, fastest to launch → GitBook.

ai guide

The Complete Guide to Agent CLIs: Design Logic, Tool Comparison, and Best Practices

Agent CLIs are not smarter autocomplete tools. They are AI agents that can read your codebase, execute multi-step tasks, and operate in real environments. Claude Code, Codex CLI, Gemini CLI, OpenCode, Aider, Pi, Kiro, Amp, Cursor CLI... the tools keep multiplying, but they all share a common set of design principles, and understanding these principles is how you actually get good at using them.

ai guide

15 Agent Frameworks Worth Watching in 2026

Sorted by GitHub Stars, a survey of 15 mainstream AI Agent frameworks in 2026 — their positioning, key features, and ideal use cases. Not a ranking — it's a map.

ai guide

One Sentence to an IG Carousel — From 3 Hours Manual Work to a Fully Automated Pipeline

Use Claude Code as an orchestrator to chain Playwright screenshots, catbox.moe image hosting, Meta Graph API publishing, and Telegram notifications — generate and publish an IG carousel from a single sentence.

ai guide

llama.cpp — From Pure C++ to an LLM Inference Engine on Consumer Hardware

llama.cpp is the most widely used local LLM inference engine, implemented in pure C/C++. It supports CPU, Metal, CUDA, Vulkan, and other backends, and uses the GGUF quantization format to run multi-billion-parameter models on consumer hardware.

ai guide

TurboQuant+ — Two-Stage Quantization to Compress KV Cache to 2-bit, Running 100B Models on a MacBook

TurboQuant+ is an open-source implementation of a Google Research ICLR 2026 paper that uses PolarQuant + QJL two-stage quantization to compress the KV cache by 3.8-6.4x, enabling consumer hardware to run larger models with longer contexts.

ai guide

Small Models That Run on Phones: Choices and Constraints in 2026

The main on-device LLMs in 2026 are Gemma 3n, Qwen 3.5 Small, Llama 3.2, Phi-4-mini, Ministral 3, and SmolLM3. Sub-3B quantized models can hit 30-50 tokens/sec on phones with 8GB RAM, but RAM, thermal throttling, and context window remain hard constraints.

ai project

2026 Q1 Open-Source LLM Landscape: From Frontier Models to On-Device, a Complete Survey

2026 Q1 saw a full-blown open-source model explosion: on the LLM front, GLM-5, Kimi K2.5, and Qwen3.5 caught up with closed-source models; Embedding and Reranker are dominated by Qwen3 and BGE; speech has Voxtral TTS and Whisper V3; image has FLUX.2; and video has Wan 2.2 rivaling Sora. This is the complete navigation map.

tech guide

Claude Code: A Complete Guide to Anthropic's Terminal AI Coding Agent

Claude Code is Anthropic's agentic coding tool that runs in the terminal, IDEs, Slack, GitHub, and on the web. Its core extension system has six layers: CLAUDE.md (persistent context), Skills (on-demand workflows), Hooks (deterministic automation), Subagents (isolated delegation), MCP (external tool connections), and Agent Teams (multi-agent collaboration).

tech project

Codex CLI: A Complete Guide to OpenAI's Open-Source Terminal Coding Agent

Codex CLI is OpenAI's open-source terminal coding agent built in Rust. It supports MCP, subagents, image input, and code review. Paired with the codex-1 (o3-optimized) or GPT-5-Codex model, it can read, write, and execute code directly on your local machine.

tech project

Gemini CLI: A Complete Guide to Google's Open-Source Terminal AI Agent

Gemini CLI is Google's open-source terminal AI agent (Apache 2.0). ⚠️ Announced end-of-service on 2026/06/18 — official migration path is Antigravity CLI. Free accounts get 60 requests/minute and 1,000 requests/day; Skills, Hooks, and Subagents all carry over.

tech project

OpenCode: A Complete Guide to the Open-Source AI Terminal Coding Agent

OpenCode is an open-source AI coding agent built in Go (95K+ GitHub stars) with a built-in TUI, support for 75+ LLMs, LSP integration, Vim-style editing, and SQLite session management. Free, no subscription required — works with local or cloud models.

tech project

Pi Coding Agent: A Minimalist Open-Source Terminal Coding Harness

Pi is a minimalist coding agent built in TypeScript by Mario Zechner, featuring just 4 core tools (read, write, edit, bash) and a 300-word system prompt. It's extensible via Extensions, Skills, and Prompt Templates, runs on the Bun runtime, and ships with built-in Ollama support via `ollama launch pi`.

ai guide

AI-Ready Content: The Complete Guide to Making Your Website an AI-Readable Data Source

In 2025-2026, websites need to be readable not just by humans but by AI. From llms.txt and Schema Markup to GEO and RAG ingestion pipelines, this post maps out the complete technical landscape for turning your website into an AI-consumable data source.

ai guide AI Agent in Practice

Advanced Harness Engineering Patterns: Tool Registry, Guard System, and Checkpoint-Resume

A Harness is more than just an LLM wrapper. Tool Registry manages dynamic tool loading and selection, Guard System establishes a four-layer defense network, and Checkpoint-Resume enables long-running tasks to survive interruptions. These three patterns form the critical infrastructure of production-grade Agent systems.

ai guide

Skill vs Subagent: Comparing Two Agent Collaboration Modes in Claude Code

A Skill is a prompt template you invoke manually. A Subagent is an independent agent that Claude routes to automatically. They look similar, but differ completely in trigger mechanism, tool isolation, and context management.

ai guide

Ticketing Is Dead — Review Is the New Planning

When AI agents can turn intent into a PR in minutes, the bottleneck in software engineering flips from 'planning what to do' to 'evaluating whether the output is correct.' Artifacts of the ticketing era — sprints, story points, backlog grooming — are collapsing to zero, replaced by review as the core practice.

Claude Code Spinner Verbs: The Complete List of 185 Status Verbs Extracted from Source Code

When processing requests, Claude Code randomly displays one of 185 built-in verbs (like Thinking, Brewing, Clauding), then picks one of 8 completion verbs with elapsed time. You can customize these via spinnerVerbs in settings.json, using either replace or append mode. All data in this post was verified directly against the cli.js source code.

tech guide

gstack — Garry Tan's 20 Skills That Turn Claude Code into a Virtual Engineering Team

gstack is Garry Tan's open-source Claude Code skills toolkit. Its 20 specialized skills transform a solo developer into an entire engineering team — automating everything from product planning and design review to code review, QA, and deployment.

ai guide AI Agent in Practice

Anthropic's Harness Design: Making AI Agents Work Like Engineers

The same model produces dramatically different results under different harness designs. Anthropic uses a dual-agent architecture, cross-session state files, and a GAN-inspired generator-evaluator loop to let Claude autonomously complete hours-long software development tasks.

ai guide

Google's Eight Multi-Agent Design Patterns

Google outlined eight multi-agent design patterns: from the simplest Sequential Pipeline to the composable Composite Pattern. More complexity isn't always better — picking the right pattern matters more than stacking agents.

ai guide AI Agent in Practice

From Prompt to Harness: The Three Evolutions of AI Engineering

AI engineering has gone through three phases: Prompt Engineering (write better instructions) → Context Engineering (feed the right information) → Harness Engineering (design the entire working environment). Each evolution doesn't replace the previous one — it operates at a higher level of abstraction.

ai guide

OpenClaw Agent Loop: Execution Cycle, Streaming & Queue

A single agent execution: receive message → assemble context → model inference → tool execution → stream response → persist. Each session runs serially, with 5 queue modes supported.

ai guide

OpenClaw Agent Runtime: Workspace, System Prompt, and Bootstrap

Every OpenClaw agent has its own 'home' (Workspace), with personality and behavior defined by bootstrap files like AGENTS.md and SOUL.md. The System Prompt is dynamically assembled each time.

ai guide

OpenClaw Access Control: Authentication, Secrets, and OAuth

API Key is the most stable option; OAuth uses PKCE + token sink pattern; SecretRef supports env/file/exec sources; Trusted Proxy delegates authentication to a reverse proxy.

ai guide

OpenClaw Automation (Part 1): Cron, Heartbeat, and Webhook

Heartbeat for periodic checks (30-minute batches), Cron for precise scheduling (with isolated sessions and model overrides), Webhook for receiving external event triggers.

ai guide

OpenClaw Automation (Part 2): Standing Orders — Permanent Directives

Standing Orders grant an agent permanent authorization to execute defined programs — with explicit scope, triggers, approval gates, and escalation rules, paired with Cron for time-based control.

ai guide

OpenClaw Enterprise Channels: Slack, Teams, Google Chat & Matrix

Slack has the most complete enterprise features (native streaming, slash commands). Teams requires Azure Bot setup. Matrix supports end-to-end encryption (E2EE).

ai guide

OpenClaw Primary Channels: WhatsApp, Telegram, Discord

WhatsApp uses QR pairing + Baileys, Telegram is the fastest to set up with a Bot Token, and Discord supports guild/thread/button interactive components.

ai guide

OpenClaw Other Channels: Signal, iMessage, LINE, IRC, Nostr, and More

Signal uses signal-cli for privacy, iMessage is best via BlueBubbles, LINE uses webhooks, IRC/Nostr/Twitch each have their own character.

ai guide

OpenClaw Channels Overview: Pairing, Groups, and Routing

OpenClaw supports 24+ channels running simultaneously, using Pairing to control who can chat, Group Policy to control group behavior, and Routing to decide which agent receives messages.

ai guide

OpenClaw Gateway Part 1: Configuration System and Hot Reload

openclaw.json uses JSON5 format with strict schema validation, supporting hybrid hot reload — safe changes apply instantly while critical changes trigger automatic restarts.

ai guide

OpenClaw Gateway (Part 2): Remote Access, Tailscale, and Multi-Gateway

Gateway binds to loopback by default. Use SSH tunnel or Tailscale Serve/Funnel for remote access; multiple Gateways can distribute load.

ai guide

OpenClaw Installation Guide (Part 2): Cloud Platforms, K8s & VPS Deployment

OpenClaw supports deployment to 9 cloud platforms, K8s, and Ansible automated provisioning — you can run a 24/7 Gateway for as little as $5/month.

ai guide

OpenClaw Installation Guide (Part 1): npm, Docker, Nix & Local Deployment

OpenClaw offers 6 local installation methods: installer script, npm, Docker, Podman, Nix, and Bun, plus Raspberry Pi deployment and building from source.

ai guide

OpenClaw Model Advanced: Failover, Prompt Caching, and Token Billing

OpenClaw has built-in two-stage fault tolerance with Auth rotation + Model Fallback, plus Prompt Caching for cost savings and comprehensive Token tracking.

ai guide

OpenClaw's Model Requirements and Provider Ecosystem

OpenClaw supports 35+ model providers. The minimum requirement is that the model supports tool use + streaming. It has built-in auth rotation and model failover mechanisms.

ai guide

OpenClaw Additional Providers: DeepSeek, Groq, Ollama, OpenRouter, Bedrock...

Beyond the big three (Anthropic/OpenAI/Google), OpenClaw supports 30+ providers — from DeepSeek to local Ollama and everything in between.

ai guide

OpenClaw Multi-Agent and Delegate Architecture

OpenClaw supports running multiple isolated agents within a single Gateway, routing messages via bindings, and enabling AI to act on your behalf through its Delegate architecture.

ai guide

OpenClaw Nodes Deep Dive: Mobile Devices and Remote Hosts

Nodes are peripheral devices for the Gateway: iOS/Android provide camera/location/notifications, macOS provides Canvas/system.run, and Node Host enables remote exec on other machines.

ai guide

OpenClaw Documentation Guide: 200+ Docs — Where Do You Start?

OpenClaw has 200+ docs. This article helps you see the big picture, understand what each section covers, and decide where to start based on your role.

ai deep-dive

OpenClaw Reference: Pi Integration & Configuration Reference

Pi is OpenClaw's embedded coding agent runtime; OpenClaw is Pi's Gateway shell. This configuration reference covers 16 top-level sections and 335 documents.

ai guide

OpenClaw Desktop Platforms: macOS, Linux, and Windows

OpenClaw has a menu bar app on macOS, runs as a systemd service on Linux, and recommends WSL2 on Windows. Here are the differences and considerations across all three platforms.

ai guide

OpenClaw Mobile Platforms: iOS and Android

OpenClaw's iOS and Android apps are not Gateways — they are Nodes, turning your phone's camera, screen, location, and voice into sensory extensions for AI agents.

ai guide

OpenClaw Plugin System: Architecture and Development Guide

Plugins are built with TypeScript ESM and support 12 capability registrations (channels, models, tools, TTS, images, etc.), published to ClawHub or npm.

ai guide

OpenClaw Sandbox Mechanism: Docker, SSH, and OpenShell

OpenClaw's sandbox has three layers of control: Sandbox determines where code runs (Docker/SSH/OpenShell), Tool Policy determines which tools are available, and Elevated is the host escape hatch for exec.

ai guide

OpenClaw Session, Memory, and Compaction

OpenClaw sessions support 4 DM isolation levels, Memory is stored as Markdown files, and Compaction automatically summarizes and compresses when context is nearly full.

ai guide

OpenClaw Threat Model: MITRE ATLAS Security Analysis and Formal Verification

OpenClaw uses the MITRE ATLAS framework to analyze AI system threats, identifying three Critical risks (prompt injection, malicious skills, credential theft), and employs TLA+ formal verification for security properties.

ai guide

OpenClaw Tools (Part 1): Browser Control and Web Search

OpenClaw's browser uses managed profiles for isolation, supports remote CDP (Browserless/Browserbase), and Deep Research combines search and browsing for multi-step research.

ai guide

OpenClaw Tools (Part 3): Exec Tool, Thinking Levels, and Slash Commands

Exec supports foreground/background/PTY execution with three security levels (deny/allowlist/full). Thinking has 7 levels (off to adaptive). Slash Commands come in two types: commands and directives.

ai guide

OpenClaw Tools (Part 2): Skills System and Sub-Agents

Skills are AgentSkills-compatible SKILL.md folders with a 6-tier loading priority. ClawHub is the public marketplace. Sub-agents can nest up to 5 levels deep.

ai guide

OpenClaw Tools (Part 4): TTS, PDF, Lobster, and MCP

TTS supports three providers — ElevenLabs, Microsoft, and OpenAI. PDF has native and extraction modes. Lobster is a deterministic workflow runtime. MCP enables external tool integration.

ai debug

OpenClaw Operations: Troubleshooting and Diagnostics

openclaw doctor is the all-in-one diagnostic tool, openclaw sandbox explain troubleshoots sandbox issues, and openclaw channels status --probe checks channel connectivity.

ai guide

OpenClaw UI: Control UI, TUI, and Web Chat

Control UI is a browser dashboard (http://127.0.0.1:18789), TUI is an interactive terminal interface, and Web Chat is a real-time chat over WebSocket.

ai guide

Phil Schmid: Why Agent Harness Is the Most Important Thing in 2026

The model is the CPU, the harness is the operating system, and the agent is the application. No matter how powerful a model is, without a good harness it's just a demo. Phil Schmid argues that harness is the most critical infrastructure in AI engineering for 2026.

tech guide

Complete Guide to Bypassing Cloudflare Anti-Bot for AI Agents: From Debugging to Building an MCP Server

Standard Playwright gets blocked by Cloudflare. Both playwright-extra + stealth and nodriver can bypass it. The final step is wrapping the solution into an MCP server so AI agents can use it automatically.

tech guide

"Recommend the next route" and "Recommend something similar" are not the same thing — Intent Disambiguation in RAG Recommendation Systems

In a climbing RAG system, 'recommend the next route' (progression) and 'recommend a similar route' (similarity) were conflated by a single hasSimilarRouteIntent() function, causing recommendation quality to collapse. The fix is a two-stage intent classification with a Regex Fast Path + LLM Fallback.
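
The two-stage idea can be sketched roughly as follows. This is a minimal illustration, not the post's actual implementation: the regex patterns, the `classify_intent` name, and the `llm_fallback` callable are all assumptions made for the example.

```python
import re
from typing import Callable

# Hypothetical patterns; a real system would tune these against logged queries.
PROGRESSION = re.compile(r"\b(next|progress|step up|harder)\b", re.IGNORECASE)
SIMILARITY = re.compile(r"\b(similar|like|same style)\b", re.IGNORECASE)


def classify_intent(query: str, llm_fallback: Callable[[str], str]) -> str:
    """Return 'progression' or 'similarity', deferring to the fallback when unsure."""
    wants_next = bool(PROGRESSION.search(query))
    wants_similar = bool(SIMILARITY.search(query))
    # Fast path: exactly one pattern matched, so the intent is unambiguous.
    if wants_next != wants_similar:
        return "progression" if wants_next else "similarity"
    # Both or neither matched: hand the query to the slower fallback classifier.
    return llm_fallback(query)
```

Keeping the fallback as a plain callable means the regex path can be tested without any model call, and ambiguous queries such as "I'd like the next route" (which matches both patterns here) are the only ones that pay the LLM cost.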

tech guide

RAG Multi-Entity Queries: When the User Lists Five Routes and the System Only Sees the First

The RAG system's extractRouteReference() used a for...return pattern that grabbed only the first match — so when a user provided five completed routes, only one was used. The fix evolves through three layers: rule-based multi-entity extraction, user profile aggregation, and embedding centroid.
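
The first layer of that fix, rule-based multi-entity extraction, might look something like this. It is a sketch under assumptions: the route catalogue and both function names are hypothetical, and only "Beauty in the Mirror" appears elsewhere on this page.

```python
import re
from typing import List, Optional

# Hypothetical route catalogue, for illustration only.
KNOWN_ROUTES = ["Beauty in the Mirror", "Wind Dancer", "Iron Gate"]


def extract_first_route(text: str) -> Optional[str]:
    """The buggy shape: returns as soon as one route matches."""
    for name in KNOWN_ROUTES:
        if re.search(re.escape(name), text, re.IGNORECASE):
            return name
    return None


def extract_all_routes(text: str) -> List[str]:
    """Collect every known route mentioned in the text, in catalogue order."""
    return [
        name
        for name in KNOWN_ROUTES
        if re.search(re.escape(name), text, re.IGNORECASE)
    ]
```

The only structural change is replacing the early `return` with accumulation, which is what lets later layers such as profile aggregation see the full list.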

tech deep-dive

When Vector Search Matches by Name Instead of Grade: Attribute Conflation in RAG Systems

Query: 'I just sent Beauty in the Mirror 5.11b — recommend routes of similar difficulty.' The results came back full of routes with similar-sounding names, not similar grades. Root cause: dense embeddings compress multiple attributes into a single vector, and the rarity of the route name drowns out the grade signal. The fix: three layers of defense — metadata pre-filtering, query rewriting, and score fusion.

ai guide

LangGraph: Managing Agent Workflows with Graph Structures

LangGraph models LLM workflows as directed graphs, solving the pain points of multi-turn iteration, conditional branching, and parallel execution that are difficult to handle with linear pipelines.

product project

Product Builder: As AI Enables Anyone to Build from 0 to 1, Product Roles Are Being Reorganized

AI enables a single person to run the full loop from problem discovery to design to build. Product Builders influence outcomes not through authority over a team, but by directly shipping usable products. From LinkedIn and Walmart to startups, this role is being established everywhere.

tech guide

Biome: Replacing ESLint + Prettier with Rust

Biome does the work of ESLint + Prettier in a single tool, running 10–20x faster with far less configuration. DaoDao uses it across an entire monorepo — lint and format in one pass.

tech guide

AEO Guide: Answer Engine Optimization — Getting AI Search Engines to Cite Your Content

AEO (Answer Engine Optimization) is a content strategy aimed at AI search engines like Perplexity, ChatGPT Search, and Google AI Overview. The core idea is to make your content the easiest source for AI to cite — not just another link in the results page.

tech guide

A Complete Guide to Blog SEO — From Meta Tags to Structured Data

SEO is more than keywords. Structured data (JSON-LD), Open Graph, hreflang, and robots.txt are the technical optimizations that actually help search engines understand your content. This guide walks through a complete implementation using an Astro blog as the example.

tech guide

BullMQ: The Most Mature Redis-Backed Job Queue for Node.js

BullMQ is the most mature job queue in the Node.js ecosystem, backed by Redis, with support for priorities, retries, scheduling, and delayed jobs. DaoDao uses it to handle notification delivery and practice auto-completion scheduling.

tech guide

Celery: The Standard Distributed Task Queue for Python

Celery is Python's go-to distributed task queue, using Redis or RabbitMQ as a broker to offload long-running work to the background. DaoDao's AI service uses it to handle async tasks like LLM feedback generation.

Claude Code Global Skills Not Found in New Sessions? Understanding Skill Discovery and How to Debug It

Global skills live in ~/.claude/skills/, but they go missing in new sessions or the Desktop App? The problem usually isn't a missing file — it's that the skill descriptions aren't being loaded into context. This post clarifies the CLI vs Desktop App differences, the role of settings.json, and the most reliable fix.

tech guide

ClickHouse: When PostgreSQL Analytics Queries Start Slowing Down, You Need OLAP

ClickHouse is a column-oriented OLAP database that scans hundreds of millions of rows in seconds. DaoDao uses it to record user behavior events for the AI recommendation engine's feature engineering, letting PostgreSQL focus on transactional data.

tech guide

Cloudflare D1: SQLite Relational Database at the Edge

D1 is Cloudflare's serverless SQLite database that binds directly to Workers, supports full SQL (JOINs, transactions), and handles automatic backups. It's well-suited for small-to-medium relational data needs — NobodyClimb uses it as its primary database.

tech guide

Cloudflare KV: A Global Edge Key-Value Store

KV is Cloudflare's globally distributed key-value store. Reads are served from the nearest edge node with extremely low latency. It's ideal for caching, feature flags, and ephemeral data — but writes are eventually consistent.

tech guide

Cloudflare R2: An S3 Alternative with Zero Egress Fees

R2 is Cloudflare's object storage service — S3-compatible API, zero egress fees, and native Workers binding. Stop worrying about bandwidth bills for media-heavy applications.

tech guide

Cloudflare Workers: Not Lambda, Not Containers — It's V8 Isolates

Cloudflare Workers uses V8 Isolates instead of containers — no cold starts, global edge deployment, and direct access to D1, R2, KV, and AI via Bindings. Great for APIs, SSR, and lightweight backends; not suited for long-running tasks.

tech guide

Docker in Practice: Containerizing from Development to Deployment

Docker lets you bundle your application together with its environment, eliminating the 'works on my machine' problem. Combined with multi-stage builds and Compose, it's an essential tool for modern backend deployment.

tech guide

Expo + React Native: What It's Actually Like to Ship One Codebase for iOS and Android

Expo turns React Native development from 'environment setup hell' into something where you can just start writing logic. Expo Router brings file-based routing that dramatically lowers the barrier for web developers making the switch. Both DaoDao and NobodyClimb use it to ship across iOS and Android.

tech guide

Express.js: The Default Answer for Node.js Backends, and Why It Still Makes Sense

Express is the most mature Web framework for Node.js, with a rich middleware ecosystem and abundant learning resources. Paired with TypeScript and a clear layered architecture, it remains a justifiable choice in 2026.

tech guide

FastAPI: The Go-To Framework for Python AI Services

FastAPI is a modern Python web framework built on type hints — it auto-generates OpenAPI docs, supports native async, and delivers performance close to Node.js. It's the top choice for AI/ML services and the most worthwhile framework to learn in the Python backend ecosystem.

tech guide

GitHub Actions: A CI/CD Primer and Monorepo Strategy

GitHub Actions is the lowest-friction CI/CD tool available today, ideal for small-to-medium projects. The key to monorepos is using path filters so only affected apps trigger a build.

tech guide

Hono: The Lightweight Web Framework Built for Edge Runtimes

Hono is a web framework designed specifically for edge runtimes like Cloudflare Workers, Deno, and Bun. It's an order of magnitude lighter than Express, natively supports Web Standard APIs, and is the go-to choice for edge environments.

tech guide

Next.js 15 + App Router: What Server Components and use cache Actually Do

Next.js 15 + React 19's App Router shifts rendering responsibility from the client to the server. use cache ties caching logic directly to data functions instead of scattering it across fetch options. Both DaoDao and NobodyClimb chose this stack for very practical reasons.

tech guide

@opennextjs/cloudflare: Running Next.js on Cloudflare Workers

@opennextjs/cloudflare enables Next.js 15 App Router deployments on Cloudflare Workers — dynamic SSR runs in a Worker, static assets are served from Cloudflare Assets. Zero server management, but with clear feature limitations.

tech guide

PM2: The Practical Choice for Node.js Process Management

PM2 keeps your Node.js app running on a server — auto-restarts on crash, supports cluster mode to max out CPU cores, and handles log management. Nearly every Node.js app deployed on a VM or VPS needs it.

tech guide

Prisma ORM: Type-Safe Database Access for TypeScript Projects

Prisma's schema-first design gives you versioned migrations, full TypeScript types on every query, and intuitive relation handling. The tradeoff is a learning curve and the inherent limits of any ORM abstraction — but for most TypeScript projects, it's a worthwhile deal.

tech guide

React Hook Form + Zod: The Best Combo for Form Handling

React Hook Form handles form performance, Zod defines the validation schema — together they eliminate nearly all form boilerplate. Share a single Zod schema across a monorepo and you get one source of truth for both frontend and backend validation.

tech guide

Redis Essentials: Caching, Sessions, and Pub/Sub in One Go

Redis is an in-memory key-value store that's blazingly fast. DaoDao uses it to handle three responsibilities at once — API caching, session storage, and BullMQ job queues — all from a single Redis instance.

tech guide

shadcn/ui: Not a Package — It's Copy-Pasted Component Source Code

shadcn/ui is not an npm package — it copies component source code directly into your project, giving you full ownership. DaoDao uses it to build packages/ui, a shared component library used across three Next.js apps.

tech guide

TailwindCSS: Utility-First Is a CSS Management Strategy, Not Just a Style Preference

TailwindCSS's core value is solving CSS's global namespace pollution and dead code problems. Utility classes keep styles co-located with components, and unused classes are automatically purged at build time — production CSS bundles typically come in at just a few dozen KB. Both DaoDao and NobodyClimb use it for web styling.

tech guide

Tamagui: A React Native UI Framework — Why NobodyClimb Chose It Over NativeWind

Tamagui is a UI framework built for React Native with a complete design token system, theme support, and compile-time optimization that moves style computation to build time. NobodyClimb chose it over NativeWind primarily because its cross-platform token system is more robust.

tech guide

TanStack Query: The Standard Solution for Server State

Managing API data with useState + useEffect means reinventing the wheel — and doing it worse. TanStack Query handles caching, background updates, and loading/error states so you can focus on UI logic.

tech guide

Turborepo + pnpm Workspaces: The Standard Approach to Monorepos

Turborepo solves monorepo build speed problems; pnpm workspaces solves dependency sharing. Together they are the best choice for JS/TS monorepos today.

tech guide

Zod: Runtime Type Validation for TypeScript

TypeScript types only exist at compile time — they vanish at runtime. Zod lets you validate external data at runtime while inferring TypeScript types from the same schema. One definition, two jobs done.

tech guide

Zustand: The Lightest Global State Management for React

No Provider, no reducer — global state in just a few lines. NobodyClimb uses it for auth and UI state, paired with TanStack Query for server state.

A One-Person Full-Stack Team: AI-Driven Development Workflow from OpenSpec to Auto-Deploy

Use OpenSpec to break requirements into engineering tasks, Claude Code to implement them, hooks to auto-format and protect, local review before committing, three AI reviewers running in parallel on PR, and auto-deploy after merge. This entire workflow lets one person maintain quality across six sub-projects.

Claude Code Hooks: A Complete Guide to Event-Driven AI Control

Hooks are Claude Code's event system. They trigger shell commands, HTTP requests, or LLM evaluations automatically before/after tool execution, when a prompt is submitted, or when a task ends. Use them to block dangerous operations, run automated reviews, inject context, or write audit logs.

Claude Code Skills: A Complete Guide to Turning Repetitive Workflows into Single Commands

A Skill is an SOP written for AI. Define the steps in a Markdown file and Claude follows them. No coding required, no frameworks to learn — just write down what an experienced person would do.

Turning Debug Sessions into GitHub Issues with a Claude Code Skill: Designing /file-bug-issue

Stuck mid-debug and can't fix it right now? Use /file-bug-issue to package the error analysis, reproduction steps, and attempted fixes from your conversation into a well-structured GitHub issue. Pair it with a Remote Agent to let AI automatically take over the fix.

Let AI Pick Up Issues, Write Code, and Open PRs: Hands-Off Development with Claude Code Remote Agent

Using Claude Code's Scheduled Remote Agent, automatically scan GitHub issues every 2 hours, implement features, open PRs, and address review feedback — no human intervention required. Humans only write issues and click merge. Pair it with the custom /publish-tasks skill to push OpenSpec engineering tasks directly to GitHub issues.

ai project

GLM-5: Zhipu AI's 744B Open-Source Model Trained Entirely on Huawei Chips

GLM-5 is a 744B MoE open-source model released by Zhipu AI (Z.ai) in February 2026, trained entirely on Huawei Ascend chips and released under the MIT license. It currently ranks as the top open-source model, surpassing Claude and GPT-5 on benchmarks like Humanity's Last Exam, while its API pricing is 1/5 to 1/8 of theirs.

ai project

Kimi: How Moonshot AI's Long-Context Model Challenges GPT and Claude

Kimi is a large language model from Chinese AI startup Moonshot AI, known for its ultra-long context window, open-source strategy, and highly competitive pricing. From 200K context in 2023 to K2.5 Agent Swarm in 2026, Kimi has become a force that the global AI market cannot ignore.

ai guide

Langfuse Complete Guide: LLM Application Observability from Scratch

Langfuse is currently the most mature open-source LLM Observability platform. This post covers four core capabilities — Tracing, Prompt Management, Evaluation, and Datasets — showing you how to use them in real projects.

Claude Code's Three-Layer Quality Defense: Hooks, Skills, and Instruction Files

Hooks are automated safety nets (blocking bad commits), Skills are interactive workflows (running checks + auto-fixing), and instruction files (CLAUDE.md / AGENTS.md) are behavioral guidelines. Each layer operates independently, but together they enable an AI agent to automatically run lint, typecheck, and build checks before every commit.

tech guide

How to Classify Code Review Comments? From Conventional Comments to AI Review Tool Taxonomies

Three main classification systems dominate: Conventional Comments (label-based), Google's severity prefixes (Nit/Optional/FYI), and SonarQube's four quadrants (Bug/Vulnerability/Code Smell/Hotspot). AI review tools have each developed their own taxonomies, but the core dimensions consistently converge on four areas: correctness, security, performance, and maintainability.

ai guide AI Agent in Practice

Context Engineering: Why Your AI Agent's Problem Is Information, Not the Model

Context Engineering is the core concept that replaced Prompt Engineering in 2025: the focus shifted from 'how to ask' to 'what information to provide.' Delivering the right information at the right time into the context window is more effective than upgrading to a stronger model. This post covers the definition, four key strategies, practical techniques, and common failure modes.

tech guide

From Mock to Real AI: Integrating Cloudflare Workers AI into action-maker

Upgraded action-maker from hardcoded mock data to live Cloudflare Workers AI generation. The architecture splits into Worker (AI only), Server (data storage), and Frontend (orchestration). Hit two gotchas along the way: Qwen3's thinking block and the Workers AI response format.

ai guide

MCP (Model Context Protocol): The Standardized Protocol for AI Agent Tool Invocation

Every AI tool has its own calling format, making integration costly. MCP (Model Context Protocol) is an open standard proposed by Anthropic that unifies the communication protocol between AI Agents and external tools/data sources, enabling tools to be reused across Agents.

ai guide

Claude Certified Architect Foundations Exam Complete Guide

A complete study guide for Claude's official architect certification: five exam domains, six scenario types, common anti-patterns, and hands-on preparation strategies.

tech guide

False positives in Node.js image vulnerability scans? Separate app packages from npm built-ins first

When reviewing vulnerability scan results for a Node.js Docker image, you can't just look at package names. First distinguish between project dependencies and the packages bundled with npm inside the base image — otherwise you'll fix the wrong thing.

tech guide

What Is Vulnerability Scanning? A Quick Intro to Docker and Package Scanning with Trivy

Vulnerability scanning isn't just about generating reports — it helps you discover known risks in your system before they become incidents. This post uses Trivy as a hands-on example to explain what scanners actually look for, how to read the results, and how to get started.

tech guide

Turning a Scraper Script into an MCP Server for Claude to Use Directly

Wrap a local Python script into an MCP Server using FastMCP so Claude Code can call it directly — no more manually running pipelines.

tech debug

MCP Tool Returns 1M Characters: The Token Explosion in search_local_jobs

The MCP tool was returning a description field that caused 1,033 job listings to exceed the token limit. The fix: exclude description by default and add pagination.
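
A rough sketch of that fix, assuming the jobs are already loaded as a list of dicts. The parameter names (`page`, `page_size`, `include_description`) and the response shape are assumptions for illustration, not the tool's real signature.

```python
from typing import Any, Dict, List


def search_local_jobs(
    jobs: List[Dict[str, Any]],
    page: int = 1,
    page_size: int = 20,
    include_description: bool = False,
) -> Dict[str, Any]:
    """Return one page of jobs, omitting the heavy description field by default."""
    start = (page - 1) * page_size
    items = jobs[start:start + page_size]
    if not include_description:
        # Drop the large free-text field so the response stays within token limits.
        items = [
            {key: value for key, value in job.items() if key != "description"}
            for job in items
        ]
    return {"total": len(jobs), "page": page, "page_size": page_size, "items": items}
```

Returning `total` alongside the page lets the caller decide whether to request further pages instead of receiving everything at once.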

ai guide

Agent Memory Systems: From RAG to Read-Write Memory Evolution

RAG is read-only. Agent Memory lets AI not only read but also write and persist information. Three memory types, Procedural (behavior patterns), Episodic (temporal events), and Semantic (factual knowledge), together form a complete cognitive memory system.

ai deep-dive

Complete Guide to AI Agent Architecture Patterns: From Three Pillars to Multi-Agent Systematic Navigation

An AI agent is not a single technology; it is an entire architectural system. This article is a systematic navigation guide: starting from the Agent Three Pillars (Context/Cognition/Action), through the three-stage evolution of AI engineering (Prompt → Context → Harness), to eight Multi-Agent design patterns and production-grade Harness infrastructure. Each topic links to a dedicated deep-dive article.

ai guide

The Three Core Pillars of AI Agents: Context, Cognition, Action

An AI agent is not a black box — it is built from three layers: what it knows (Context), how it thinks (Cognition), and what it can do (Action). Understanding these three layers is the key to grasping why agents are sometimes brilliant and sometimes go off the rails, and how to design a truly effective agent system.

tech guide

docker restart Does Not Re-apply Volumes — Debugging a Bind Mount Failure

docker restart does not recreate the container, so changes to volumes in docker-compose.yml only take effect after recreating it with docker-compose down followed by docker-compose up.

ai guide RAG Systems in Practice

Multi-Agent RAG: Distributed Retrieval Architecture with Specialized Agent Collaboration

A single RAG Agent handling all queries hits knowledge boundaries and performance bottlenecks. Multi-Agent RAG dispatches retrieval tasks to multiple specialized Agents, each with its own knowledge base and retrieval strategy, coordinated by a central Orchestrator that merges results.

Claude Code Permission Modes Explained: Five Modes from Default to Auto

Claude Code has five permission modes: default (confirm each step), acceptEdits (auto-accept edits), plan (read-only planning), auto (background AI classifier review), and bypassPermissions (YOLO, skip everything). Switch with Shift+Tab or configure via settings.json. Auto mode is the sweet spot — no step-by-step confirmations, but with safety guardrails.

tech guide

nginx 502: Debugging Cross-Compose Container DNS Resolution

Service names aren't resolvable across Compose projects — you need to add a network alias so nginx can find the container.

tech guide

Installing and Verifying Superpowers for GitHub Copilot CLI: Implementation, Diagnostics, and Validation

A hands-on log of installing Superpowers (packaged by DwainTR) for Copilot CLI on a local machine — including the diagnostic process when skills didn't appear after installation, the fix, and practical tips.

tech guide

Docker DNS Resolution: container_name vs network alias

Cross-project DNS resolution requires container_name or a network alias — and only aliases support horizontal scaling.

ai guide

LongRAG: Rethinking RAG Chunking Strategy with Long-Context Models

Traditional RAG splits documents into small chunks for retrieval, but this causes information fragmentation. LongRAG leverages 100K+ token long-context models to retrieve larger document segments (entire sections or even whole documents), reducing fragmentation while maintaining retrieval efficiency.

ai guide

Speculative RAG: Small Models Draft in Parallel, Large Model Verifies at Once

Speculative RAG uses small specialist models to generate multiple answer drafts from different document subsets in parallel, then a large model verifies and selects the best answer in one pass. Accuracy improves by up to 12.97% and latency drops by up to 50.83%.

tech guide

nginx Restarted Fine, but Cloudflare Keeps Returning 502 — Even Though the Origin Is Healthy

A brief error during nginx restart caused Cloudflare to mark the origin as unhealthy and stop forwarding requests, returning 502 on its own. The key clues: localhost hits to the origin return 200, and nginx access logs are completely empty. Just wait for Cloudflare to automatically re-check the origin — it recovers on its own.

tech guide

Managing Multi-Service Reverse Proxy with nginx conf.d: A Daodao Case Study

A monolithic nginx.conf becomes unwieldy as services grow. Splitting it into per-service files under conf.d/ via include is the standard solution.

tech guide

nginx First Request Always 502, All Subsequent Requests Fine

When nginx uses the `set $variable` pattern for dynamic upstreams, the DNS cache expires every 30 seconds — the first request after expiry hits a 502 because no IP is available. Upgrading to nginx 1.27.3 and switching to an upstream block with the resolve parameter fixes this: DNS updates happen asynchronously in the background.

tech guide

Downloading Files from a VPS Using SSH Config Aliases

Once SSH config is set up, scp works directly with aliases, so there is no need to type out the full IP every time.

ai guide

The Complete Ollama Guide: Run LLMs Locally with One Command

Ollama wraps llama.cpp in a Docker-style CLI + REST API, letting you run LLMs locally with a single command. This post covers core concepts, installation, API, hardware requirements, Modelfile customization, and what this tool is — and isn't — good for.

ai guide RAG Systems in Practice

The Complete Guide to RAG System Patterns: A Ten-Generation Evolution from Naive to Multi-Agent with Practical Navigation

RAG has evolved far beyond simple 'search + generate' into a technology ecosystem spanning ten generations. This article is a systematic navigation guide: from Naive RAG to Multi-Agent RAG across ten generations, covering retrieval strategies, chunking, embedding, reranking, evaluation frameworks, observability, and cost optimization. Each topic has a dedicated deep-dive article.

ai guide

vLLM — From PagedAttention to a Production-Grade LLM Inference Engine

vLLM uses PagedAttention to eliminate KV cache memory waste, combining continuous batching and prefix caching to become the most widely adopted open-source LLM inference engine today.

tech guide

Ghostty vs cmux: A Guide to Choosing Your Modern Terminal

Ghostty is a fast, native, general-purpose terminal emulator. cmux is a terminal built on top of Ghostty, specifically designed for AI coding agents. They're not competitors — they operate at different layers.

ai guide

Complete Chatbot Development Guide: State Management, Memory Strategies, and Tech Stack Selection

Building a chatbot is more than just calling an API. Conversation state management, memory mechanisms, streaming, guardrails, observability, and tech stack selection — every layer affects the user experience.

ai guide

Prompt Engineering in Practice: Iteration Methodology, Common Mistakes, and Few-shot Optimization

Good prompts aren't written in one go — they're iterated into existence. Start with the simplest prompt, test with real cases, classify error types, and make targeted fixes. This article covers the three-part System Prompt structure, reasoning framework selection, few-shot optimization, token budget management, and six common mistakes.

tech guide

Cloudflare Free Plan Maintenance Page: Custom Error Pages Unavailable, Use a Worker Instead

Cloudflare Custom Error Pages require a paid plan. On the Free Plan, use a Worker with inline HTML to intercept 5xx responses instead.

tech guide

Managing Personal and Work GitHub Accounts with Git Conditional Includes

Use includeIf + SSH Host aliases to let Git automatically switch accounts based on directory path — no more manual switching.

tech debug

Astro + Cloudflare Workers: Native Modules Break the Build Even on Prerendered Routes

Even when a route has prerender = true, Cloudflare Workers' Rollup bundler still attempts to bundle native modules, causing the build to fail. The fix is to move any native module work into a postbuild script.

tech debug

Astro Scoped CSS Not Applied to MDX-Rendered Content

Astro scoped CSS appends a scope hash to each selector, but elements rendered by <Content /> don't receive that hash — causing all prose styles to silently break.

ai guide

Agentic RAG: Letting the LLM Decide When to Search Again

For complex multi-hop questions, a single RAG search isn't enough. Agentic RAG lets the LLM evaluate whether retrieved results are sufficient — if not, it rewrites the query and searches again, forming a ReAct loop.

ai guide

BGE-M3: Why This Embedding Model Works Well for Traditional Chinese RAG

Your choice of embedding model directly determines RAG search quality. BGE-M3's multilingual training, 1024-dimensional vectors, and matching Reranker make it a practical pick for Traditional Chinese RAG.

ai guide

Chunking Strategies: How You Split Text Determines Whether RAG Can Find the Answer

Chunks too large and retrieval loses precision; too small and you lose context. Chunking is the most underrated part of RAG — pick the wrong strategy and no amount of downstream optimization will save you.

ai guide

ColBERT: The Third Way in Vector Search

Bi-Encoders are too coarse, Cross-Encoders are too slow — ColBERT's Late Interaction finds the sweet spot: token-level comparison between query and document, but with document vectors that can be precomputed.

ai guide

Contextual Retrieval: Giving Every Chunk Its "What This Is About" Context

When you split a document into chunks, each chunk loses its place in the original document. Contextual Retrieval solves the isolated-chunk problem by injecting a document-level summary into every chunk at index time.

ai guide

CRAG: Automatically Relaxing Filters When Retrieval Comes Up Empty

Filters too strict and getting zero results? CRAG automatically relaxes them and retries — far better than letting the LLM hallucinate an answer from general knowledge.
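
The relax-and-retry behaviour described here could be sketched as below. This is an assumption-laden illustration: `search_with_relaxation`, the `search` callable, and the `relax_order` list are hypothetical names, and the order in which filters are dropped is a design decision the caller has to make.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def search_with_relaxation(
    search: Callable[[Dict[str, Any]], List[Any]],
    filters: Dict[str, Any],
    relax_order: Sequence[str],
) -> Tuple[List[Any], Dict[str, Any]]:
    """Run search; while it returns nothing, drop filters one at a time and retry."""
    active = dict(filters)  # copy so the caller's filters are left untouched
    results = search(active)
    for key in relax_order:
        if results:
            break
        # Remove the least important remaining filter and search again.
        active.pop(key, None)
        results = search(active)
    return results, active
```

Returning the filters that were still active makes it possible to tell the user which constraints were relaxed rather than silently widening the search.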

ai guide

Cross-Encoder Reranking: Surfacing the Most Relevant Documents

Vector search similarity scores don't equal relevance. Cross-Encoders use pairwise comparison to reorder results and push the truly relevant documents to the top.

ai guide

GraphRAG: Structuring Knowledge as a Graph for Relationship-Based Reasoning

Vector search finds similarity; graph search traverses relationships. When a question requires reasoning across multiple entities — crag → route → sender → grade distribution — GraphRAG outperforms standard RAG.

ai guide RAG Systems in Practice

Hybrid Search: Using BM25 + Vector Search to Cover Each Other's Blind Spots

Vector search handles semantics; BM25 handles keywords. Combining them with RRF is what lets you handle both fuzzy queries and exact terms at the same time.

ai guide

HyDE: Boosting Vector Search Recall with Hypothetical Answers

Have an LLM generate an 'ideal answer' first, then embed that hypothetical document for search — it outperforms searching with the raw query.

ai guide

RAG Personalization: Learning User Preferences from Conversations

After each conversation, asynchronously extract likely user preferences and skill level, then automatically personalize search parameters on the next query — no manual setup required.

ai guide

MMR + Popularity Weighting: Recommendations That Are Both Relevant and Diverse

Ranking purely by relevance leaves you with five documents all describing the same route. MMR strikes a balance between relevance and diversity, and layering in popularity weighting makes results even more useful.
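
As a rough illustration of the MMR half of this idea, here is a minimal sketch. It assumes documents and the query are already embedded as plain lists of floats; the names `mmr`, `lam`, and `top_k` are hypothetical and not taken from the post, and popularity weighting is left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, top_k=3, lam=0.5):
    """Greedy Maximal Marginal Relevance; returns indices into doc_vecs.

    lam is an assumed tuning knob: 1.0 means pure relevance,
    lower values penalize similarity to already-selected documents.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < top_k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam=1.0` the function degenerates to a plain relevance ranking, which is a convenient way to compare the two behaviours on the same data.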

ai deep-dive

Modular RAG Pipeline: Designing RAG as a Composable DAG

RAG doesn't have to be a rigid three-step process. It's a set of steps that can be dynamically enabled, skipped, or reordered. Pipeline as Code lets the system adapt its behavior without redeployment.

ai guide

Multi-Query Expansion: Search One Question from Multiple Angles

A single vector search on a complex query often misses relevant documents. Let the LLM rewrite the query into 3-5 sub-queries, run them in parallel, and recall improves significantly.

ai guide

Multimodal RAG: Bringing Images into the Knowledge Base

Climbing routes carry a ton of visual information (topos, wall photos) that text-only RAG misses entirely. Multimodal RAG makes images searchable and understandable.

ai deep-dive

Three Generations of RAG: From Naive to Modular

Naive RAG works but has real problems. Advanced RAG patches those problems. Modular RAG rearchitects the whole system to be composable and configurable. Understanding all three generations is the key to understanding why modern RAG systems look the way they do.

ai guide

Plan-and-Execute: A RAG Pattern That Plans Before It Acts

For complex queries, have the LLM map out what information is needed and in how many steps — then execute that plan. More systematic than thinking on the fly.

ai guide

Query Classification: Teaching Your RAG System How to Answer Each Question

Not every question needs full RAG. Classify queries with an LLM first, then route to the right execution path — saving cost and improving accuracy.

ai guide

RAG A/B Testing: A Scientific Approach to Comparing Pipeline Configurations

"Adding a Cross-Encoder feels better" is not a scientific evaluation. A/B testing tells you whether a change actually works, how much it helps, and which query types benefit.

ai guide

RAG Cold Start: Building a Useful System When You Have No Data

A RAG system needs data to answer questions, but data only accumulates as the system gets used. Cold-start strategy is what bridges the gap from empty to useful.

ai guide

RAG Cost Optimization: Minimizing the Cost of Every Query

RAG system costs come from LLM tokens, Embedding APIs, and vector search. Every stage has room for cost reduction, but you need to verify that optimizations don't sacrifice too much quality.

ai guide

RAG Evaluation Frameworks: How to Use RAGAS, DeepEval, and TruLens

RAG system quality is hard to evaluate by intuition alone. RAGAS, DeepEval, and TruLens provide systematic metric frameworks that pinpoint exactly which component is failing.

ai debug RAG Systems in Practice

RAG Common Failure Modes: 10 Problems and Their Solutions

When a RAG system breaks, 90% of the time it's one of these 10 failure modes. Identify which one first, then apply the matching fix — far more effective than optimizing blindly.

ai guide

RAG Guardrails: Adding a Defense Layer to Inputs and Outputs

The attacks RAG systems face go beyond the technical level — Prompt Injection and Jailbreak are real threats. Both inputs and outputs need independent protection layers.

ai guide

RAG Observability Tool Landscape: Choices in 2026

Rolling your own traces works, but open-source tools save you a lot of effort. Langfuse, Phoenix, and LangSmith each have their niche; the right choice depends on your trade-offs around self-hosting, open source, and integration complexity.

ai guide

RAG Observability: 17-Step Tracing to Turn the Black Box Transparent

The hardest part of a RAG system isn't building it — it's figuring out why a particular answer went wrong. Pipeline Tracing records every step's decisions and data so debugging has a clear trail to follow.

ai guide

RAG Prompt Engineering: How to Design System Prompts and Context

Search found the right documents, but the LLM's answers are still poor — often the problem lies in prompt design. System prompt structure, context formatting, and instruction placement all affect output quality.

ai guide

RAG Streaming: Using SSE to Display LLM Responses as They Generate

LLM generation takes 3-5 seconds, and waiting for the full response before displaying it makes for a terrible experience. SSE pushes tokens as they're generated, reducing time-to-first-character from 5 seconds to under 1 second.

ai guide

RAG Quota System: Controlling LLM Costs with Dual Limits

Limiting request count alone is not enough — a single long query can consume ten times the tokens of a normal one. Dual quotas (request count + token count) are what truly control costs.
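
A dual limit of this kind might look roughly like the sketch below. The class name `DualQuota`, its fields, and the in-memory counters are all assumptions for illustration; a real system would persist usage per user and reset it per period.

```python
from dataclasses import dataclass


@dataclass
class DualQuota:
    """Hypothetical per-user quota tracking both requests and tokens."""
    max_requests: int
    max_tokens: int
    used_requests: int = 0
    used_tokens: int = 0

    def try_consume(self, tokens: int) -> bool:
        """Record one request of `tokens` tokens if both limits allow it."""
        if self.used_requests + 1 > self.max_requests:
            return False  # request-count limit reached
        if self.used_tokens + tokens > self.max_tokens:
            return False  # token budget would be exceeded
        self.used_requests += 1
        self.used_tokens += tokens
        return True
```

Checking both limits before mutating either counter keeps a rejected request from partially consuming quota.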

ai deep-dive

RAG vs Fine-tuning: It's Not Either/Or

RAG and Fine-tuning solve different problems. RAG gives the model new knowledge; Fine-tuning changes the model's behavior and style. In most cases you use both rather than picking one.

ai guide

RRF: How to Merge Multi-Source Results in RAG Systems

BM25, vector search, HyDE, and Multi-Query each produce separate result sets. How do you merge them sensibly? RRF uses ranks instead of scores, sidestepping the fundamental problem that scores from different systems are incomparable.
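
The rank-based merge can be sketched in a few lines. This is a minimal illustration, not the post's implementation: the function name is hypothetical, each input is assumed to be a list of document IDs ordered best-first, and `k=60` is the commonly used smoothing constant rather than a value confirmed by the post.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first rankings of document IDs with RRF.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so only ranks matter and raw retriever scores are never compared.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, fusing a keyword ranking `["a", "b", "c"]` with a vector ranking `["b", "c", "d"]` puts `b` first, because it ranks highly in both lists.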

ai guide

Self-Reflection + LLM-as-Judge: Having AI Evaluate Its Own Answers

Use another LLM to evaluate answer accuracy and quality — if the score is too low, regenerate, and automatically add appropriate disclaimers.

ai guide

Semantic Caching: Run the RAG Pipeline Only Once for Semantically Similar Queries

Caching doesn't have to match exact query strings: semantically similar questions can hit the cache too, skipping the entire RAG pipeline execution.
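
One way to picture this is a cache keyed by query embeddings instead of strings. The sketch below is an assumption-laden illustration: `SemanticCache`, the `threshold` value, and the linear scan are all hypothetical, and query embeddings are assumed to be computed elsewhere and passed in as lists of floats.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical in-memory cache that matches on embedding similarity."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold  # assumed cutoff; needs tuning per model
        self.entries = []  # list of (query embedding, cached answer)

    def get(self, query_vec):
        """Return the answer of the most similar cached query, or None."""
        best_answer, best_sim = None, self.threshold
        for vec, answer in self.entries:
            sim = cosine_similarity(query_vec, vec)
            if sim >= best_sim:
                best_answer, best_sim = answer, sim
        return best_answer

    def put(self, query_vec, answer):
        """Store an answer under its query embedding."""
        self.entries.append((list(query_vec), answer))
```

A threshold set too low risks returning an answer to a different question, so it is worth validating against real query pairs before relying on it.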

ai guide

SPLADE: Smarter Sparse Vector Search Beyond BM25

BM25 only recognizes words that appear in the query. SPLADE infers related terms and adds them to the search, gaining partial semantic capability while preserving the precision of keyword search.

ai guide

Text-to-SQL Router: Precise Queries That Skip RAG

Questions like 'how many routes did I complete this year' will never be answered well by RAG semantic search — querying the database directly is far more accurate. Let the LLM identify intent, extract parameters, and execute predefined SQL templates.

ai guide

Vector Database Selection: How to Choose Between Pinecone, Weaviate, Qdrant, and Vectorize

Vector database selection is constrained by your deployment platform more than LLM selection is. Determine your platform and scale requirements first, then evaluate features; don't just look at benchmarks.

education guide

Why Your Learning Goals Always Fizzle Out — And How DaoDao Wants to Fix It

The core reason self-directed learning fails isn't lack of motivation — it's the absence of a co-learning environment. DaoDao turns 'wanting to learn' into 'actually learning' through themed practices, inspiration feeds, group challenges, and learner connections, while turning your growth journey into tangible proof of competence.

product project

The Next Frontier in Online Learning: Why Completion Rate Is the Real Problem

MOOC completion rates hover at just 5–15%, and the problem isn't course quality — it's the execution gap. DaoDao positions itself as a 'Learning OS,' using public commitments, community interaction, and AI recommendations to make learning visible and sustainable.

product project

From 'Want to Learn' to 'Actually Learning': The Product Design Thinking Behind DaoDao

DaoDao is not a content platform; it's a learning connector. Using anti-perfectionism design, community co-learning, and zero-decision recommendations, it helps learners bridge the execution gap, from vague ideas to actionable plans.

product project

Why Does a Climbing Community Need AI? NobodyClimb's Experiment and What We Learned

NobodyClimb uses RAG to tackle scattered climbing route information, ties quota limits to community engagement, and leverages Cloudflare Workers AI to bring inference costs close to zero.

product project

NobodyClimb: Why the Climbing Community Needs Its Own Platform

The climbing community doesn't lack the will to share — it lacks a place to connect and preserve its culture.

tech debug

The Correct Way to Bind a Custom Domain in Cloudflare Workers

In wrangler.jsonc, set custom_domain: true on the route and use only the hostname as the pattern, with no /* wildcard.

tech deep-dive

DaoDao Tech Architecture: Monorepo, Multi-Language Backend, and AI Recommendation System

Next.js + Expo frontend, Node.js + Python dual backend, PostgreSQL + Redis core — plus a social notification system and LLM recommendation engine. Here's how DaoDao builds a learning community platform with a modern tech stack.

tech deep-dive

NobodyClimb: Building a Climbing Community Platform Entirely on Cloudflare

A climbing community platform where the web app, mobile app, and AI Q&A all run on Cloudflare — no dedicated servers.

tech deep-dive

NobodyClimb AI Architecture: Building a 20-Node RAG Pipeline on Cloudflare Workers

A dynamically composable RAG pipeline built on Cloudflare Workers AI (gemma-3-12b-it + bge-m3): 14 base steps + 6 LangGraph-specific nodes, with three strategy graphs (Baseline / Agentic / Plan-Execute) selected at runtime.

tech guide

What You Need to Know Before Switching Astro Blog Templates

Switching templates means replacing the entire project foundation. Figure out what you actually need first, then choose among AstroPaper, Cactus, and AstroWind.