quidproquo
Tech, climbing, surfing, coffee, and everything else.
A personal blog by xiaoxu — covering software engineering, AI agents, climbing, surfing, coffee, and whatever else I'm obsessing over. Written mostly in Traditional Chinese; posts tagged en are in English.
After finishing a debug session, just say 'write this up as a post' — Claude Code extracts content from the conversation, applies a template, generates frontmatter, and commits it to the repo. No extra writing required.
To consolidate scattered notes and showcase diverse interests, I built a personal blog using Astro + Cloudflare Workers D1, paired with a Claude post skill for zero-friction writing.
Astro + the full Cloudflare suite — static-first, edge-computed, zero maintenance cost
Saw a trading post about going from NT$150k to NT$2.4M in half a year. Didn't understand a word of it — warrants, stock futures, maintenance ratio — so I looked them all up.
Loop Engineering is the practice of designing systems that automatically prompt AI agents, rather than prompting them manually. Boris Cherny runs hundreds of agents, Addy Osmani coined the term, and Blake Crosley identified verification cost as the real bottleneck — this article covers primary sources, the five building blocks, applicability boundaries, and criticisms.
Climbing books don't exist in isolation — they represent competing schools of thought, philosophical differences, and knowledge gaps. This post maps the relationships between 60+ books to help you pick the right one to read next.
@playwright/mcp uses an accessibility tree instead of screenshots, cutting token cost by 10–50x — the best default for AI agents doing web automation. Puppeteer MCP fits screenshot-heavy tasks. Direct CDP via MCP is for low-level tooling or domains that Playwright/Puppeteer don't expose.
Chrome DevTools MCP wraps Chrome DevTools Protocol (CDP) as an MCP server, giving AI agents direct access to 40+ CDP Domains including Profiler, HeapProfiler, and Security that Playwright and Puppeteer MCP don't expose — at the cost of having to implement MCP tool definitions and auto-wait logic yourself.
@playwright/mcp defaults to an accessibility tree (browser_snapshot) instead of screenshots, cutting token consumption by 90%+. Combined with Playwright's native auto-wait, it's the best starting point for AI agents doing web automation.
server-puppeteer is the Puppeteer wrapper in the official MCP servers monorepo — seven lean tools built around screenshots and evaluate. Token cost is significantly higher than @playwright/mcp per interaction, but it fits well when the screenshot itself is the deliverable or custom JS execution is the core need.
A Threads post by @jj.investnote laid out a 2x leveraged ETF system built on three books: 60% 2x ETF + 40% cash, Beta=1.2 target, ±10% rebalancing trigger, and a crash protocol.
From the CLI tool kin3o to the CVPR 2026 paper OmniLottie — a survey of open-source approaches for converting text and images into Lottie animations, with performance benchmarks and selection guidance.
AI agents running tests are non-reproducible; hand-written Playwright is hard to maintain. Four tools that emerged in 2024-2025 each tackle this dilemma with very different design philosophies.
MUSE-Autoskill (2026) introduces a five-stage skill lifecycle framework. Self-created skills achieve 60.35% (+7.16%) on SkillsBench overall, and an impressive 87.94% on tasks where skill generation succeeds — surpassing the human-authored skill ceiling. This post synthesizes six arXiv papers to map the full landscape of skill evolution research.
Even with temperature=0, LLM outputs can still fluctuate by up to 15% in practice. To rigorously compare agent changes, you need a frozen golden set, at least 3 runs per query averaged out, LLM-as-judge blind evaluation (pairwise preference flip rate reaches 35%), and paired statistical tests -- not just running each version once and going by feel.
The industry has converged on using OpenTelemetry GenAI semantic conventions to turn every LLM call and tool call into a span. Detecting the three major failure modes then splits into three tracks: faithfulness + semantic entropy for hallucinations, framework-level symbolic guardrails for tool misuse, and max steps + action hash deduplication for infinite loops — all wired into a Final / Trajectory / Single-step three-layer evaluation framework.
Agent decision-making under resource constraints is bounded rationality reborn: Rational Metareasoning uses VOC rewards to save 20-37% of tokens, BATS proves that adding budget without budget awareness is futile, FrugalGPT cascades cut costs by up to 98%, and Speculative Actions reduce latency by 20%. The three constraints ultimately converge into a single Pareto curve, and the overarching trend is moving from humans tuning knobs to models making resource-rational decisions on their own.
Three seemingly distinct agent security problems — tool output injection, trust boundaries, malicious agents — share the same root cause: LLMs flatten instructions and data into a single token stream, making them architecturally unable to distinguish between the two. Understand this through-line and you can trace every attack from EchoLeak (CVE-2025-32711, zero-click) to the Morris II AI worm, and see why 'making the model behave' doesn't work — only architectural constraints (six design patterns, CaMeL) do.
Traditional RAG is a fixed pipeline of 'retrieve then answer.' Agentic RAG splits retrieval into three decision layers: when to retrieve (FLARE uses token probabilities; Adaptive-RAG uses a complexity classifier), what to retrieve (HyDE / RAG-Fusion / decomposition / Step-back), and how to fuse (RRF k=60 then cross-encoder rerank then compression -- Anthropic measured a -67% failure rate reduction). Key counter-intuitive insight: unnecessary retrieval hurts quality -- 'deciding not to retrieve' is a first-class capability.
Automatic prompt optimization (APO) has evolved from APE/OPRO to GEPA: replacing sparse rewards with linguistic reflection, winning over GRPO by ~6pp with 4-35x fewer rollouts. Meanwhile, tool descriptions are the overlooked prompt -- small wording changes can shift tool selection rates by 10x, and Anthropic's experiments show Claude self-rewriting tool descriptions outperforms human experts. These two lines are converging: eval-driven automatic optimization is eating hand-tuned prompts.
An autonomous research agent = four controllable stages: planning (decompose into sub-questions), retrieval loop (search -> read -> reflect on gaps -> search again), evidence arbitration (>=2 independent sources, typed conflict handling), and verifiable output (sentence-level citations + independent verification pass). Two approaches: training-based uses RL to learn end-to-end when to search (Search-R1 +41%); orchestration-based uses orchestrator-worker division of labor (Anthropic internal eval +90.2%, at ~15x token cost).
Inferring another's beliefs/goals/intentions from observed behavior is called Machine Theory of Mind. Three lineages: symbolic BDI, Bayesian inverse planning, and deep learning ToMnet. The biggest controversy in the LLM era is that GPT-4 still trails humans by >10 points on ToMBench — are high scores genuine reasoning or statistical shortcuts?
At 99% accuracy per step over 100 steps, the error-free completion rate drops to just 36% -- error compounding is a structural problem, not something prompt tuning can fix. Distributed systems' supervisor trees, bulkheads, circuit breakers, sagas, and durable execution can be mapped almost one-to-one into agent orchestration. But LLMs introduce a failure class that traditional systems never had -- semantic errors that don't crash -- which require Inspector agents (recovering 96.4%) and redundancy voting (MAKER: one million steps with zero errors) to address.
Cosine similarity and relevance systematically diverge across an entire class of scenarios: negation (most IR models score at or below random on NevIR), exact identifiers, numeric thresholds, and logical combinations (SoTA models achieve recall@100 < 20 on LIMIT) -- some of these hit the theoretical ceiling of the single-vector paradigm, and switching to a larger model will not help. Recommended remedy order: hybrid BM25 -> reranker (Anthropic measured -67%) -> upstream metadata routing -> domain fine-tuning / multi-vector.
As tools scale up, selection accuracy doesn't degrade gracefully — it collapses: 4 to 51 tools drops from 43% to 2%, 10 to 100+ drops from 78% to 13.62%. The root fix is to stop stuffing everything in at once — Anthropic's Tool Search Tool uses defer loading plus retrieval to cut 85% of tokens, pushing Opus 4.5 accuracy from 79.5% to 88.1%. Description quality has conditional payoff: negligible in simple scenarios, but correctness jumps from 44% to 50% in multi-tool chaining.
Traditional Chinese RAG retrieval failures are a three-layer stack: embedding granularity defects (BGE/GTE from 0.1B to 7B all mis-rank on simple queries like 'fried chicken'), Simplified Chinese / English corpus dominance causing local vocabulary drift ('premium', 'exclusion clause' alignment is unreliable), and MTEB Chinese benchmarks being Simplified Chinese making model selection signals misleading. The fix is architectural: OpenCC normalization -> hybrid + jieba segmentation -> reranker -> local fine-tuning last -- and the prerequisite for all of it is building a Traditional Chinese eval set first.
arXiv does not perform peer review, and roughly 2% of submissions are rejected. Quality judgment relies on external signals: top venue acceptance > institution + open-source reproduction > citation quality. Includes a 20-item practical checklist and a 2026 toolbox (PWC has shut down).
A Go read-only scanner open-sourced by Perplexity in May 2026 (v0.1.1, zero non-stdlib dependencies). It inventories npm/PyPI/Go/RubyGems/Composer/MCP/editor and browser extensions into NDJSON, matches against a custom exposure catalog, and answers the question 'which machines in my fleet are currently affected' the moment a supply chain incident hits. It deliberately never invokes any package manager and is not an EDR.
Making 'chunk and embed every uploaded file automatically' the default behavior means making a decision for the LLM that it could have made itself. From Self-RAG (2310.11511) and Adaptive-RAG (2403.14403) to AgenticOCR (2602.24134), the academic trajectory is pushing three layers of decision-making -- whether to retrieve, whether to parse, and how to chunk -- from the ingestion pipeline back to the agent at conversation time.
The hard part of LLM agents is not building function calling, skills, code interpreter, and document tools individually -- it is assembling them into a system that selects the right tool, writes code when needed, decomposes tasks, verifies results, and resists prompt injection. This post organizes the key papers into six engineering decisions: function calling reliability, tool/skill selection, code-as-action, multi-step planning, skill systems, and safety plus document generation.
When mobile Chrome keeps redirecting back to the login page after sign-in, the culprit isn't always OAuth or broken frontend state. In this case, the root cause was that the HTTP entry point for app-dev.daodao.so wasn't issuing a 301 redirect to HTTPS, so /auth/me requests sent with an http origin didn't include the auth_token cookie.
A2UI is an agent generative UI protocol open-sourced by Google on 2025-12-15: agents send declarative JSON describing UI intent, and clients render it natively using their own component catalog whitelist, layered on top of A2A. It launched at format v0.8 and iterated to v0.9 within three months.
browse.sh, launched by Browserbase in May 2026, is two things: a browser skill catalog and the Browse CLI. The core thesis: the bottleneck for browser agents isn't reasoning — it's amnesia. By storing learned site-specific workflows as plain-text SKILL.md files, Autobrowse cut Craigslist task costs from ~$0.22 to ~$0.12 by their own metrics. Note: this has nothing to do with the 2018 Browsh text-mode browser.
CodeGraph uses tree-sitter to extract a codebase into a local SQLite/FTS5 knowledge graph, letting AI coding agents query the graph instead of scanning files. The official end-to-end benchmark (7 repos, median of 4 runs) averages 35% cost savings and 70% fewer tool calls -- but only if the agent actually walks the graph. Delegating exploration to a file-reading subagent that ignores CodeGraph turns it into pure overhead.
Reading papers is two problems stacked together: methodology (Keshav's three-pass method, 5-10 min / 1 hour / 4-5 hours) determines how to read, and tools (arXiv HTML, alphaXiv, NotebookLM, Connected Papers, Zotero) shorten the time for each pass. AI lowers the barrier to understanding; judging correctness always stays with the human.
An MIT-licensed open-source UI automation framework from ByteDance (~13k GitHub stars). UI actions rely solely on feeding screenshots to vision-language models (Qwen3-VL / Doubao / Gemini-3 / UI-TARS), with no DOM parsing. A single JS API works across Web / Android / iOS / desktop, and starting from v1.0, the DOM action mode was removed entirely. The trade-off: each step is slower and more token-expensive.
Antigravity CLI is a terminal agent Google announced at I/O on May 19, 2026. Written in Go (versus Gemini CLI's Node.js), its binary is called agy, and it shares the same agent harness as the desktop Antigravity 2.0. It is also Gemini CLI's successor — the personal-tier Gemini CLI service ends on June 18, 2026.
Claude has no docx_tool or pdf_tool -- it relies on bash + file tools, plus SKILL.md instructions and pre-installed libraries like pdfplumber / python-pptx inside the container, assembling file handling capabilities from three layers.
Anthropic shipped Claude Design on 2026-04-17. On 4-28, nexu-io/open-design went public -- same artifact-first loop, Apache-2.0, runs on the 16 coding-agent CLIs you already have. Two weeks from 0.1 to 0.7, 40k+ stars. A paradigm shift that flattens AI design tools from vertical SaaS into a skill bundle.
asgeirtj/system_prompts_leaks collects the raw system prompts of 40+ AI assistants, from GPT-5.5 and Claude Opus 4.7 to Gemini 3.1 Pro, with 40.3k stars, 461 commits, and an MIT license. The value isn't in obtaining secrets -- it's in turning vendors' implicit policies into comparable engineering material. What you should study is the design decisions, not the text itself.
Anthropic's 35-page startup handbook released 2026-05-14 reorganizes Idea/MVP/Launch/Scale around agentic AI. The most valuable takeaways are 'the easier it is to build, the more important validation becomes' and treating CLAUDE.md as the first MVP artifact. The part to discount: the Launch chapter puts compliance workstreams on Cowork -- but Anthropic's own docs say Cowork doesn't write audit logs.
Rewriting tool descriptions from soft suggestions to hard rules (whitelist + consequence explanation) eliminated the LLM's incorrect tool selection; adding skip_signal=True fixed vector store double-indexing.
AI agents can operate video generation tools through three approaches — Skills, MCP Connectors, and direct APIs. Choosing the right integration method matters more than choosing the right tool.
Stop stuffing all your tool descriptions into context at session start. Let the model write code, have the runtime execute it, and let tool definitions enter context only at the import line — Anthropic's GDrive→Salesforce example dropped from ~150K tokens to 2K, and Cloudflare's 2,500-endpoint schema shrank from 1.17M to 1K.
MIT research says 95% of enterprise AI pilots yield zero return. OpenAI and Anthropic announced multi-billion-dollar joint ventures in the same week, wholesale adopting the Forward Deployed Engineer model that Palantir has used for over a decade to bring AI into the enterprise battlefield.
A survey of 11 public LLM writing pipelines, distilled into three dominant patterns: multi-agent (researcher -> writer -> critic), Karpathy LLM-wiki (raw + wiki + LLM writes, humans don't), and quality guardrails (technical verifier + never fabricate + brief gate). The Princeton GEO paper (KDD 2024) quantifies the impact: inline citations +28%, adding statistics +33%, quoting source text +41%, keyword stuffing -9%.
In May 2026, OpenAI published its internal Codex deployment practices: sandboxes define technical boundaries, approval policies determine when to pause, Auto-review delegates approval decisions to a sub-agent instead of a human, and Managed configuration lets enterprise admins enforce policies top-down. The core philosophy: zero friction for low-risk actions, mandatory review for high-risk ones.
Spin up a local OpenAI-compatible endpoint at localhost:20128 that automatically routes requests from Claude Code / Cursor / Cline / Codex / Copilot through a Subscription → Cheap → Free 3-tier fallback to 40+ providers. Built-in RTK compresses tool_result (saving 20–40% input tokens), Caveman mode compresses output, OAuth auto-refresh, multi-account round-robin — install with npm install -g 9router and two commands.
Anthropic builds an extension, OpenAI builds its own browser, Google welds AI directly into Chrome — three completely different approaches. Here's a comparison of the current landscape, key differences, and a selection guide.
Stripe Minions says 'The walls matter more than the model,' but the case studies from four Silicon Valley companies never explained how to actually build those walls. This post breaks down the 15 walls we implemented in the daodao auto-dev agent: what each wall prevents, where the files live, and what the tradeoffs are. Tier 1 is mandatory, Tier 2 strengthens governance, Tier 3 is serious governance.
A PM checks a task card in Notion → the system syncs it to a GitHub issue → writes a plan → writes code → opens a PR for human review. This post explains what the system does, what it doesn't do, and why it's feasible now — written for people who don't write code.
Build a Notion task → GitHub issue → spec PR → code PR auto-dev agent from scratch. Using the daodao case as a template, this guide walks through every step — what to do, what to verify, and how to handle problems. Notion DB schema → bin/ scaffold → two Claude Code routines → cloud env vars → staging tests.
Anthropic open-sourced 12 financial-industry Agents and 11 MCP connectors. The real takeaway isn't the Agents themselves but the layered design of 'one prompt, two runtimes' and 'pure-file extensibility.'
5 rounds of consensus to write the plan, then team mode with 5 workers running 12 tasks in parallel — with plenty of pitfalls along the way. Writing it down for my future self and anyone else trying the same thing.
DeepSeek-OCR's paper is titled Contexts Optical Compression -- OCR is just the means; what it actually validates is that 'rendering text as images and feeding them to a VLM' achieves 10x compression at 97% accuracy. This is a qualitative shift for long-context LLM and RAG token costs.
For side projects, toy demos, and RAG prototypes, nobody wants to swipe a credit card on day one. This is a verified roundup of 40+ LLM inference providers still operating as of 2026/05, tiered by whether free resources auto-replenish or are one-time grants. Each entry notes credit-card requirements, supported models, paid starting prices, and catches. Chinese-origin providers including Zhipu GLM (permanently free), Doubao (2M tokens/day), Kimi, DashScope, and the Ollama local option are all included.
/loop is Claude Code's native cron feature — set schedules in plain English and let Claude monitor, auto-fix PRs, and run recurring tasks in the background. Session-scoped and expires after 7 days; for cross-session scheduling, use Routines or Desktop scheduled tasks.
Routines is Claude Code's cloud automation system (formerly Cloud Scheduled Tasks). Beyond cron scheduling, you can trigger runs via API endpoint or GitHub events — scan issues, review PRs, run checks, open PRs — all while your computer is off.
A Skill is a folder with a SKILL.md. Three-layer progressive disclosure lets Claude load details only when needed, eliminating the need to re-explain preferences every conversation.
Local Deep Research is a privacy-first deep research agent built on LangChain + LangGraph, integrating 20+ search engines and 30+ research strategies. Its flagship langgraph_agent_strategy takes the LLM-autonomous tool-calling approach, offering a fundamentally different paradigm from fixed-pipeline RAG graphs.
PageIndex skips chunking, embedding, and vector storage entirely. Instead it relies on LLM reasoning over a tree-structured table of contents the LLM itself wrote, achieving 98.7% on FinanceBench (GPT-4o reading directly scores only 31%). It solves a different problem than vector RAG — finding the right section in a well-structured long document.
Two answers stand out for remotely accessing your home Mac in 2026: Cloudflare Tunnel if you need browser-based access with no client install, and Tailscale if you just want something simple for personal use. This post compares both, covers ZeroTier, Pangolin, NetBird, and other alternatives, and explains why Cloudflare's remotely-managed tunnel makes setup significantly easier in 2026.
When using AI agents like Claude Code or Cursor, built-in WebFetch / WebSearch often gets blocked by Cloudflare, geo-restrictions, or rate limits. Connecting a search MCP server is the most direct fix. This post compares the options actually available in 2026.
Groq Console is the developer portal for Groq's in-house LPU chip, offering an OpenAI-compatible API, Playground, and free tier credits. Its selling point is running open-source models like Llama, Qwen, and DeepSeek at the fastest tokens/second on the market.
Warp evolved from a Rust-powered modern terminal into an AI Agent-integrated development environment (ADE), open-sourced under AGPL in April 2026, with over 700,000 developer users.
goose is an open-source AI Agent maintained by the Linux Foundation's AAIF, supporting 15+ LLM providers and 70+ MCP extensions, built with Rust as a Desktop App + CLI + API. It positions itself as a vendor-neutral, self-hostable alternative to Claude Code.
For running LLMs on Cloudflare Workers AI, gemma-3-12b-it follows Traditional Chinese instructions noticeably better than llama-3.1-8b-instruct. With Gemma 4 arriving in 2026, you get Vision, Function calling, and 256K context -- upgrade as needed.
Qwen (Tongyi Qianwen) is Alibaba's open-source LLM family, known for its Apache 2.0 license, 201-language coverage, and rapid iteration. The latest Qwen3.6 (2026/04) focuses on Agentic Coding — the 27B Dense version achieves 77.2% on SWE-bench and 59.3% on Terminal-Bench 2.0, on par with Claude Opus. A new Thinking Preservation feature lets agents retain reasoning context across turns.
Karpathy proposed the llm-wiki pattern in 2026, having LLMs proactively maintain a markdown wiki instead of running RAG from scratch every time. Over 100 open-source implementations now exist, ranging from local CLI tools to serverless Telegram bots.
On 2026/4/22 OpenAI launched Workspace Agents — powered by Codex, capable of long-running cloud execution, and integrating with Slack/Salesforce/Google Drive. They are the enterprise successor to Custom GPTs.
Using Weaviate Query Agent + ColQwen multi-vector model, a single prompt built a production-grade legal contract search system in 36 hours -- this post breaks down its architecture logic, technology choices, and what you actually need to watch out for.
Akira runs a Threads → Blog → Docs three-tier funnel with agent-assisted publishing, building a sustainable knowledge monetization model in the Chinese-language AI content market.
Cloudflare ran a Multi-Agent Code Review system internally for 30 days — 131K reviews, median 3 minutes. This post breaks down their architecture and compares it with solutions from Anthropic, GitHub, CodeRabbit, Greptile, and others.
A detailed look at OpenAI's Codex agent loop design: how prompts are constructed, how multi-turn conversations are managed, how prompt caching prevents cost explosions, and how context window auto-compaction works.
OpenAI wrapped the Codex harness as a JSON-RPC over stdio App Server, enabling VS Code, JetBrains, Web, and desktop apps to share a single agent loop. Three core primitives: Item, Turn, and Thread.
An OpenAI internal team spent 5 months with 3 people and 0 lines of hand-written code, delivering a complete product using Codex. This article distills their core lessons on AGENTS.md design, repo-local knowledge bases, architecture enforcement, and entropy management.
AEO/GEO tools aren't a single category — they span three distinct layers: the input layer (is your website ready for AI to read), the traffic layer (how much are AI bots actually crawling), and the output layer (how is your brand mentioned in AI answers). This post maps out all three layers, from open-source self-hosted options to commercial SaaS.
DeerFlow is ByteDance's open-source Super Agent Harness built on Python 3.12 + LangGraph. It orchestrates long-running tasks through sandboxes, long-term memory, sub-agents, skills, and a messaging gateway. It hit #1 on GitHub Trending in February 2026, now surpassing 63,000 stars, with support for Telegram/Slack/Feishu, Claude Code integration, and multiple search backends.
2026/4/1 new rules: max 2 policies per trip (different insurers), flat-rate payout cap lowered to NT$6,000. Covers six key areas including flight delays and lost luggage, with a breakdown of where to buy.
Agentic Engineering isn't about making AI write code faster — it's about making software move through the entire delivery pipeline faster, by using multi-agent collaboration to compress cross-team coordination friction.
Agent memory isn't a plugin — it's part of the harness itself. Pick the right memory type, estimate data volume, then decide on the technology. And finally, figure out whether you actually own that memory.
AI models rationalize their own code when reviewing it. Using three different CLIs for independent review effectively catches blind spots -- this post covers the design philosophy and practical workflow patterns behind the approach.
NotebookLM has no official API. This extension works by combining three techniques: reverse-engineered Google batchexecute RPC calls, DOM scraping, and cross-tab message passing.
The main backend runs on a remote HTTPS server, so the auth_token cookie is scoped to that domain. The browser never sends it to the local AI backend, causing the API to treat every request as unauthenticated.
Agentic AI is not just autocomplete — it is an AI system capable of autonomously executing multi-step tasks. This article breaks down the five phases of the SDLC, explaining where to plug in agents at each phase, how to progress from CLI tools to full-pipeline automation, and the most valuable external resources to track right now.
Encyclopedia of Agentic Coding Patterns catalogues 190 patterns to help you make the right software decisions in the age of AI-written code — and the book itself is autonomously written and maintained by an AI agent.
GitHub Copilot Coding Agent lets you assign an Issue to Copilot, which then automatically creates a branch, writes code, runs CI, and opens a PR — all inside a cloud sandbox. The key to success is setting up AGENTS.md; without it, the agent tends to go off track. Best suited for well-defined medium-sized tasks; requires Pro+ (1,500 premium requests/month) or Enterprise plan.
A six-layer deterministic pipeline that handles everything from URL ingestion to vector embedding automatically, filtering out garbage before it enters your RAG system through an eight-dimension scoring system.
A lightweight open-source tool from Microsoft that converts PDF, Office, images, audio, and more into Markdown — purpose-built for LLM pipelines.
MCP is not going away, but its effective scope is narrower than most people think. For local development, CLI and raw API almost always beat MCP. MCP's truly irreplaceable niche is the narrow gap of 'cross-agent shared local tool layer.'
Different AI engines process web pages in vastly different ways. Some only read the body; others rely on pre-built indexes. JSON-LD and schema markup are not universally effective — body content quality and structure are the only cross-platform foundations that hold.
Using my own 30+ RAG/Agent posts to audit the blog itself, I identified a prioritized improvement list spanning content quality, site tech, RAG design fixes, harness infrastructure, and AI agent applications — no phases, just priorities.
Not everyone should use a coding agent to modify code directly. AI Native teams need interface specs, test-first development, monorepo, security guardrails, human-in-the-loop, and token budget controls. Building an agent platform layer on top of coding agents and clearly redefining developer roles is the right path forward.
Autoreason replaces the traditional critique-and-revise loop with a competitive multi-version evaluation mechanism (A/B/AB + blind Borda count), solving three structural problems in LLM self-refinement: prompt bias, scope creep, and lack of restraint.
An open-source coding agent reference implementation from Vercel Labs. A three-layer architecture separates the web UI, agent workflow, and sandbox VM — designed as a starting point for teams that want to self-host their own Claude Code or Cursor Background Agent.
env.AI is not just run(). It also exposes toMarkdown (document-to-Markdown conversion), autorag (managed RAG), gateway (external provider proxy), and models (metadata lookup). Understanding these four method groups is what unlocks Cloudflare as a full AI platform inside Workers.
Claude Octopus is a Claude Code plugin that simultaneously calls Codex, Gemini, Copilot, Qwen, Ollama, Perplexity, OpenRouter, and Claude to review the same code, using a 75% consensus threshold to catch single-model blind spots. It ships with 32 personas, 48 /octo:* slash commands, 51 skills, and a Dark Factory fully autonomous spec-to-code pipeline.
LLM Council is a local Web App Andrej Karpathy built over a weekend. It sends one question to multiple LLMs simultaneously, has them anonymously peer-review each other, and then a Chairman model synthesizes a final answer. Positioned as a small tool for comparing models while studying — 99% vibe coded with no plans for long-term maintenance — but the architecture itself is a minimal ensemble LLM implementation worth studying.
Better Agent Terminal (BAT) is an Electron desktop app that unifies multiple project workspaces, terminals, and Claude Code Agents into a single window — solving the everyday pain of exploding iTerm tabs and the lack of a proper GUI container for agents. MIT License, available on macOS, Windows, and Linux.
Claude Managed Agents is a beta service launched by Anthropic on 2026/04/08 that provides an agent harness plus cloud container sandbox, billed per token plus $0.08/session-hour. It suits long-running async tasks and is worth exploring if you don't want to build your own agent loop and sandbox.
Agent Skills is Addy Osmani's open-source collection of 19 production-grade engineering skills that drive AI agents to follow senior engineering discipline through /spec → /plan → /build → /test → /review → /ship commands, instead of cutting corners.
Graphify uses tree-sitter AST to extract code structure, then applies LLM semantic analysis to documents and images, compressing an entire project into a queryable knowledge graph. It claims to save 71.5x tokens per query compared to reading raw files.
Claw Code is a from-scratch Rust rewrite of the Claude Code CLI, featuring 48K lines of code, 40 tools, and MIT licensing. Most remarkably, the entire project was built by multiple AI agents collaborating over just 5 days, surpassing 170K GitHub stars within a week of launch.
clawhip is a Rust daemon that routes AI coding agent events (commits, PRs, session status) to Discord / Slack, solving the observability problem of not knowing who is doing what when multiple agents run in parallel.
Hermes Agent is an open-source self-improving AI agent by Nous Research, featuring persistent memory, skill learning, 40+ tools, multi-platform gateways, support for 200+ model providers, and serving as the official successor to OpenClaw.
notebooklm-py reverse-engineers Google's batchexecute RPC protocol, letting you programmatically control NotebookLM via Python / CLI / AI Agent — including audio, video, slides, quiz generation and more.
oh-my-claudecode (OMC) adds 8 collaboration modes, 19 specialized agents, and cross-model orchestration (Claude + Codex + Gemini) on top of Claude Code, transforming a single-user CLI tool into a multi-agent development platform. Features include Deep Interview for requirement clarification, Smart Model Routing that saves 30-50% on tokens, and automatic rate limit recovery.
oh-my-codex (OMX) doesn't replace Codex CLI — it adds a structured workflow layer on top of it. From requirements clarification and plan generation to multi-agent parallel execution, four core Skills transform scattered prompt conversations into a trackable development process.
oh-my-openagent (OmO) transforms OpenCode from a single-LLM tool into a multi-model agent team — Opus as the workhorse, GPT-5.2 as the architect, Gemini for frontend, Sonnet for documentation lookup — all triggered to run in parallel with a single ultrawork keyword. With 48K stars, it is the earliest project in the UltraWorkers ecosystem to establish the multi-agent coding pattern.
An open-source Agent Harness framework from HKUDS (HKU Data Science Lab) that implements tool calling, skill loading, memory, permissions, and multi-agent collaboration as complete infrastructure, supporting Anthropic / OpenAI / GitHub Copilot API formats.
Claude Code only reads CLAUDE.md; Codex only reads AGENTS.md. Teams using both end up maintaining two identical files. Fix: make CLAUDE.md a symlink pointing to AGENTS.md — one source of truth.
There are already 6,400+ .claude/agents/*.md files on GitHub. We dissected 4 representative projects — ChemistryTimes (content production pipeline), claude-sub-agent (document-driven development pipeline), agentic (Temporal.io DAG parallel execution), and vs-copilot-multi-agent (hook-enforced memory persistence) — plus ruflo's enterprise-grade swarm architecture, distilling 6 design patterns and 5 practical trends.
Top Silicon Valley companies are independently building internal AI coding agents that automate everything from a Slack message to a merged PR. This article deep-dives into architectures from Stripe, Ramp, Coinbase, and Spotify, then expands to cover Google, Meta, Amazon, Uber, Goldman Sachs, Walmart, and more.
Andrej Karpathy proposed a framework for compiling personal knowledge wikis with LLMs — collect raw data, have the LLM compile it into .md wiki pages, run Q&A against the wiki, and file outputs back. This post compares three practical approaches: Karpathy's knowledge vault model, the community's experience vault model, and quidproquo's blog model.
After dissecting Claude Code's 18+ caching mechanisms, I found that you can't touch provider-level prompt cache, but embedding cache, tool result cache, and entity cache are not only within your reach — they deliver even better results. Includes a complete AgentCache interface design and per-tool TTL strategy.
Every one of Claude Code's 45 tools uses a prompt() method that dynamically adjusts based on user type, feature flags, and system capabilities. Applying this pattern to a ReAct Agent, tool descriptions are dynamically generated along three dimensions: orchestrator model capability, locale, and available tools. Small models automatically get few-shot examples; large models save tokens.
From $20/mo Pro to $200/mo Max 20x, Claude Code's Opus 4.6 delivers the strongest reasoning depth in the industry, and its Max plan's unlimited pricing saves heavy users over 90% compared to API costs.
Cursor CLI brings the IDE Agent into the terminal, supporting interactive TUI and headless modes, Plan/Ask/Agent three modes, Cloud Handoff, CI/CD integration, $20-200/mo.
Gemini CLI will be discontinued on 2026/06/18, with Antigravity CLI as the official successor. Before shutdown: free 60 req/min, 1,000 req/day, including Gemini 2.5 Pro and 1M token context window. Skills, Hooks, and Subagents can all be migrated.
Kiro's free plan includes 50 credits. Auto mode intelligently mixes models to save costs. Spec-Driven development upgrades vibe coding into traceable, structured workflows. Agent Hooks enable local CI/CD automation.
Codex is tied to ChatGPT subscriptions ($20-200/mo). GPT-5.4 + mini automatic routing is the highlight, and the CLI supports dual billing via Plan mode and API Key mode.
OpenCode is a free, open-source CLI agent written in Go with 95K+ GitHub stars. It supports 75+ model providers including local Ollama, allows authentication via Copilot/ChatGPT accounts, and lets you switch models mid-session without losing context.
Comparing six major Agent CLI subscription plans in 2026 (Claude Code, Cursor CLI, Codex, Kiro, Gemini CLI, OpenCode), and exploring multi-model routing patterns — routing simple tasks to cheaper models and complex tasks to flagship models, with real-world savings of 40-85%.
Comparing the NVIDIA DGX Spark, Apple Mac Studio M4 Ultra, ASUS Ascent GX10, MSI AI Edge, and more — helping you find the right local inference hardware.
With multi-model routing, 70% of simple tasks are directed to cheap models, and only 10-15% of complex tasks use flagship models — saving 40-85% on inference costs in practice. This article covers the architecture and implementation of five major open-source tools.
A breakdown of the three-layer digital ecosystem structure: LINE's super-app, Shopify App Store flywheel, and Taiwan MarTech integration strategies. The core mechanism is using APIs and data flows to create mutual dependency among participants, collectively reinforcing the moat.
Skill paths are almost always runtime-specific. AGENTS.md is the reliable way to share rules across agents. Put personal reusable capabilities in each agent's supported global directory; put project workflows inside the repo.
code-review-graph uses Tree-sitter to parse your codebase and build a persistent knowledge graph, tracks the blast radius of changes, and feeds only truly relevant context to the AI — claiming an average 8.2x reduction in token usage.
GitBook is a Git-based documentation platform with Markdown editing, version control, and multi-user collaboration. Ideal for technical docs, API references, and internal knowledge bases. The free plan is sufficient for individuals and small teams.
The NVIDIA DGX Spark is powered by the GB10 Grace Blackwell Superchip, 128 GB of unified memory, and delivers 1 petaFLOP of FP4 compute — starting at around $3,999 USD. It lets developers run 200B-parameter models locally and fine-tune 70B models, making it the most accessible NVIDIA AI development platform available today.
A breakdown of nine major documentation platforms — their positioning, pros, cons, and ideal use cases. Decision logic: open-source projects → Docusaurus/VitePress, API docs → Mintlify/ReadMe, internal enterprise → Confluence, fastest to launch → GitBook.
Agent CLIs are not smarter autocomplete tools -- they are AI agents that can read your codebase, execute multi-step tasks, and operate in real environments. Claude Code, Codex CLI, Gemini CLI, OpenCode, Aider, Pi, Kiro, Amp, Cursor CLI... the tools keep multiplying, but they all share a common set of design principles -- understanding these principles is how you actually get good at using them.
Sorted by GitHub Stars, a survey of 15 mainstream AI Agent frameworks in 2026 — their positioning, key features, and ideal use cases. Not a ranking — it's a map.
Use Claude Code as an orchestrator to chain Playwright screenshots, catbox.moe image hosting, Meta Graph API publishing, and Telegram notifications — generate and publish an IG carousel from a single sentence.
llama.cpp is the most widely used local LLM inference engine, implemented in pure C/C++. It supports CPU, Metal, CUDA, Vulkan, and other backends, and uses the GGUF quantization format to run multi-billion-parameter models on consumer hardware.
TurboQuant+ is an open-source implementation of a Google Research ICLR 2026 paper that uses PolarQuant + QJL two-stage quantization to compress the KV cache by 3.8-6.4x, enabling consumer hardware to run larger models with longer contexts.
The main on-device LLMs in 2026 are Gemma 3n, Qwen 3.5 Small, Llama 3.2, Phi-4-mini, Ministral 3, and SmolLM3. Sub-3B quantized models can hit 30-50 tokens/sec on phones with 8GB RAM, but RAM, thermal throttling, and context window remain hard constraints.
2026 Q1 saw a full-blown open-source model explosion: on the LLM front, GLM-5, Kimi K2.5, and Qwen3.5 caught up with closed-source models; Embedding and Reranker are dominated by Qwen3 and BGE; speech has Voxtral TTS and Whisper V3; image has FLUX.2; and video has Wan 2.2 rivaling Sora. This is the complete navigation map.
Claude Code is Anthropic's agentic coding tool that runs in the terminal, IDEs, Slack, GitHub, and on the web. Its core extension system has six layers: CLAUDE.md (persistent context), Skills (on-demand workflows), Hooks (deterministic automation), Subagents (isolated delegation), MCP (external tool connections), and Agent Teams (multi-agent collaboration).
Codex CLI is OpenAI's open-source terminal coding agent built in Rust. It supports MCP, subagents, image input, and code review. Paired with the codex-1 (o3-optimized) or GPT-5-Codex model, it can read, write, and execute code directly on your local machine.
Gemini CLI is Google's open-source terminal AI agent (Apache 2.0). ⚠️ Announced end-of-service on 2026/06/18 — official migration path is Antigravity CLI. Free accounts get 60 requests/minute and 1,000 requests/day; Skills, Hooks, and Subagents all carry over.
OpenCode is an open-source AI coding agent built in Go (95K+ GitHub stars) with a built-in TUI, support for 75+ LLMs, LSP integration, Vim-style editing, and SQLite session management. Free, no subscription required — works with local or cloud models.
Pi is a minimalist coding agent built in TypeScript by Mario Zechner, featuring just 4 core tools (read, write, edit, bash) and a 300-word system prompt. It's extensible via Extensions, Skills, and Prompt Templates, runs on the Bun runtime, and ships with built-in Ollama support via `ollama launch pi`.
In 2025-2026, websites need to be readable not just by humans but by AI. From llms.txt and Schema Markup to GEO and RAG ingestion pipelines, this post maps out the complete technical landscape for turning your website into an AI-consumable data source.
A Harness is more than just an LLM wrapper. Tool Registry manages dynamic tool loading and selection, Guard System establishes a four-layer defense network, and Checkpoint-Resume enables long-running tasks to survive interruptions. These three patterns form the critical infrastructure of production-grade Agent systems.
A Skill is a prompt template you invoke manually. A Subagent is an independent agent that Claude routes to automatically. They look similar, but differ completely in trigger mechanism, tool isolation, and context management.
When AI agents can turn intent into a PR in minutes, the bottleneck in software engineering flips from 'planning what to do' to 'evaluating whether the output is correct.' Artifacts of the ticketing era — sprints, story points, backlog grooming — are collapsing to zero, replaced by review as the core practice.
When processing requests, Claude Code randomly displays one of 185 built-in verbs (like Thinking, Brewing, Clauding), then picks one of 8 completion verbs with elapsed time. You can customize these via spinnerVerbs in settings.json, using either replace or append mode. All data in this post is verified directly from cli.js source code.
gstack is Garry Tan's open-source Claude Code skills toolkit. Its 20 specialized skills transform a solo developer into an entire engineering team — automating everything from product planning and design review to code review, QA, and deployment.
The same model produces dramatically different results under different harness designs. Anthropic uses a dual-agent architecture, cross-session state files, and a GAN-inspired generator-evaluator loop to let Claude autonomously complete hours-long software development tasks.
Google outlined eight multi-agent design patterns: from the simplest Sequential Pipeline to the composable Composite Pattern. More complexity isn't always better — picking the right pattern matters more than stacking agents.
AI engineering has gone through three phases: Prompt Engineering (write better instructions) → Context Engineering (feed the right information) → Harness Engineering (design the entire working environment). Each evolution doesn't replace the previous one — it operates at a higher level of abstraction.
A single agent execution: receive message → assemble context → model inference → tool execution → stream response → persist. Each session runs serially, with 5 queue modes supported.
Every OpenClaw agent has its own 'home' (Workspace), with personality and behavior defined by bootstrap files like AGENTS.md and SOUL.md. The System Prompt is dynamically assembled each time.
API Key is the most stable option; OAuth uses PKCE + token sink pattern; SecretRef supports env/file/exec sources; Trusted Proxy delegates authentication to a reverse proxy.
Heartbeat for periodic checks (30-minute batches), Cron for precise scheduling (with isolated sessions and model overrides), Webhook for receiving external event triggers.
Standing Orders grant an agent permanent authorization to execute defined programs — with explicit scope, triggers, approval gates, and escalation rules, paired with Cron for time-based control.
Slack has the most complete enterprise features (native streaming, slash commands). Teams requires Azure Bot setup. Matrix supports E2EE encryption.
WhatsApp uses QR pairing + Baileys, Telegram is the fastest to set up with a Bot Token, and Discord supports guild/thread/button interactive components.
Signal uses signal-cli for privacy, iMessage is best via BlueBubbles, LINE uses webhooks, IRC/Nostr/Twitch each have their own character.
OpenClaw supports 24+ channels running simultaneously, using Pairing to control who can chat, Group Policy to control group behavior, and Routing to decide which agent receives messages.
openclaw.json uses JSON5 format with strict schema validation, supporting hybrid hot reload — safe changes apply instantly while critical changes trigger automatic restarts.
Gateway binds to loopback by default. Use SSH tunnel or Tailscale Serve/Funnel for remote access; multiple Gateways can distribute load.
OpenClaw supports deployment to 9 cloud platforms, K8s, and Ansible automated provisioning — you can run a 24/7 Gateway for as little as $5/month.
OpenClaw offers 6 local installation methods: installer script, npm, Docker, Podman, Nix, and Bun, plus Raspberry Pi deployment and building from source.
OpenClaw has built-in two-stage fault tolerance with Auth rotation + Model Fallback, plus Prompt Caching for cost savings and comprehensive Token tracking.
OpenClaw supports 35+ model providers. The minimum requirement is that the model supports tool use + streaming. It has built-in auth rotation and model failover mechanisms.
Beyond the big three (Anthropic/OpenAI/Google), OpenClaw supports 30+ providers — from DeepSeek to local Ollama and everything in between.
OpenClaw supports running multiple isolated agents within a single Gateway, routing messages via bindings, and enabling AI to act on your behalf through its Delegate architecture.
Nodes are peripheral devices for the Gateway -- iOS/Android provide camera/location/notifications, macOS provides Canvas/system.run, and Node Host enables remote exec on other machines.
OpenClaw has 200+ docs. This article helps you see the big picture, understand what each section covers, and decide where to start based on your role.
Pi is OpenClaw's embedded coding agent runtime; OpenClaw is Pi's Gateway shell. This configuration reference covers 16 top-level sections and 335 documents.
OpenClaw has a menu bar app on macOS, runs as a systemd service on Linux, and recommends WSL2 on Windows. Here are the differences and considerations across all three platforms.
OpenClaw's iOS and Android apps are not Gateways — they are Nodes, turning your phone's camera, screen, location, and voice into sensory extensions for AI agents.
Plugins are built with TypeScript ESM and support 12 capability registrations (channels, models, tools, TTS, images, etc.), published to ClawHub or npm.
OpenClaw's sandbox has three layers of control: Sandbox determines where code runs (Docker/SSH/OpenShell), Tool Policy determines which tools are available, and Elevated is the host escape hatch for exec.
OpenClaw sessions support 4 DM isolation levels, Memory is stored as Markdown files, and Compaction automatically summarizes and compresses when context is nearly full.
OpenClaw uses the MITRE ATLAS framework to analyze AI system threats, identifying three Critical risks (prompt injection, malicious skills, credential theft), and employs TLA+ formal verification for security properties.
OpenClaw's browser uses managed profiles for isolation, supports remote CDP (Browserless/Browserbase), and Deep Research combines search and browsing for multi-step research.
Exec supports foreground/background/PTY execution with three security levels (deny/allowlist/full). Thinking has 7 levels (off to adaptive). Slash Commands come in two types: commands and directives.
Skills are AgentSkills-compatible SKILL.md folders with a 6-tier loading priority. ClawHub is the public marketplace. Sub-agents can nest up to 5 levels deep.
TTS supports three providers — ElevenLabs, Microsoft, and OpenAI. PDF has native and extraction modes. Lobster is a deterministic workflow runtime. MCP enables external tool integration.
openclaw doctor is the all-in-one diagnostic tool, openclaw sandbox explain troubleshoots sandbox issues, and openclaw channels status --probe checks channel connectivity.
Control UI is a browser dashboard (http://127.0.0.1:18789), TUI is a terminal interactive interface, and Web Chat is a WebSocket real-time chat.
The model is the CPU, the harness is the operating system, and the agent is the application. No matter how powerful a model is, without a good harness it's just a demo. Phil Schmid argues that harness is the most critical infrastructure in AI engineering for 2026.
Standard Playwright gets blocked by Cloudflare. Both playwright-extra + stealth and nodriver can bypass it. The final step is wrapping the solution into an MCP server so AI agents can use it automatically.
In a climbing RAG system, 'recommend the next route' (progression) and 'recommend a similar route' (similarity) were conflated by a single hasSimilarRouteIntent() function, causing recommendation quality to collapse. The fix is a two-stage intent classification with a Regex Fast Path + LLM Fallback.
The RAG system's extractRouteReference() used a for...return pattern that grabbed only the first match — so when a user provided five completed routes, only one was used. The fix evolves through three layers: rule-based multi-entity extraction, user profile aggregation, and embedding centroid.
Query: 'I just sent Beauty in the Mirror 5.11b — recommend routes of similar difficulty.' The results came back full of routes with similar-sounding names, not similar grades. Root cause: dense embeddings compress multiple attributes into a single vector, and the rarity of the route name drowns out the grade signal. The fix: three layers of defense — metadata pre-filtering, query rewriting, and score fusion.
LangGraph models LLM workflows as directed graphs, solving the pain points of multi-turn iteration, conditional branching, and parallel execution that are difficult to handle with linear pipelines.
AI enables a single person to run the full loop from problem discovery to design to build. Product Builders influence outcomes not through authority over a team, but by directly shipping usable products. From LinkedIn and Walmart to startups, this role is being established everywhere.
Biome does the work of ESLint + Prettier in a single tool, running 10–20x faster with far less configuration. DaoDao uses it across an entire monorepo — lint and format in one pass.
AEO (Answer Engine Optimization) is a content strategy aimed at AI search engines like Perplexity, ChatGPT Search, and Google AI Overview. The core idea is to make your content the easiest source for AI to cite — not just another link in the results page.
SEO is more than keywords. Structured data (JSON-LD), Open Graph, hreflang, and robots.txt are the technical optimizations that actually help search engines understand your content. This guide walks through a complete implementation using an Astro blog as the example.
BullMQ is the most mature job queue in the Node.js ecosystem, backed by Redis, with support for priorities, retries, scheduling, and delayed jobs. DaoDao uses it to handle notification delivery and practice auto-completion scheduling.
Celery is Python's go-to distributed task queue, using Redis or RabbitMQ as a broker to offload long-running work to the background. DaoDao's AI service uses it to handle async tasks like LLM feedback generation.
Global skills live in ~/.claude/skills/, but they go missing in new sessions or the Desktop App? The problem usually isn't a missing file — it's that the skill descriptions aren't being loaded into context. This post clarifies the CLI vs Desktop App differences, the role of settings.json, and the most reliable fix.
ClickHouse is a column-oriented OLAP database that scans hundreds of millions of rows in seconds. DaoDao uses it to record user behavior events for the AI recommendation engine's feature engineering, letting PostgreSQL focus on transactional data.
D1 is Cloudflare's serverless SQLite database that binds directly to Workers, supports full SQL (JOINs, transactions), and handles automatic backups. It's well-suited for small-to-medium relational data needs — NobodyClimb uses it as its primary database.
KV is Cloudflare's globally distributed key-value store. Reads are served from the nearest edge node with extremely low latency. It's ideal for caching, feature flags, and ephemeral data — but writes are eventually consistent.
R2 is Cloudflare's object storage service — S3-compatible API, zero egress fees, and native Workers binding. Stop worrying about bandwidth bills for media-heavy applications.
Cloudflare Workers uses V8 Isolates instead of containers — no cold starts, global edge deployment, and direct access to D1, R2, KV, and AI via Bindings. Great for APIs, SSR, and lightweight backends; not suited for long-running tasks.
Docker lets you bundle your application together with its environment, eliminating the 'works on my machine' problem. Combined with multi-stage builds and Compose, it's an essential tool for modern backend deployment.
Expo turns React Native development from 'environment setup hell' into a state where you can just start writing logic. Expo Router brings file-based routing that dramatically lowers the barrier for web developers making the switch. Both DaoDao and NobodyClimb use it to ship across iOS and Android.
Express is the most mature Web framework for Node.js, with a rich middleware ecosystem and abundant learning resources. Paired with TypeScript and a clear layered architecture, it remains a justifiable choice in 2026.
FastAPI is a modern Python web framework built on type hints — it auto-generates OpenAPI docs, supports native async, and delivers performance close to Node.js. It's the top choice for AI/ML services and the most worthwhile framework to learn in the Python backend ecosystem.
GitHub Actions is the lowest-friction CI/CD tool available today, ideal for small-to-medium projects. The key to monorepos is using path filters so only affected apps trigger a build.
Hono is a web framework designed specifically for edge runtimes like Cloudflare Workers, Deno, and Bun. It's an order of magnitude lighter than Express, natively supports Web Standard APIs, and is the go-to choice for edge environments.
Next.js 15 + React 19's App Router shifts rendering responsibility from the client to the server. use cache ties caching logic directly to data functions instead of scattering it across fetch options. Both DaoDao and NobodyClimb chose this stack for very practical reasons.
@opennextjs/cloudflare enables Next.js 15 App Router deployments on Cloudflare Workers — dynamic SSR runs in a Worker, static assets are served from Cloudflare Assets. Zero server management, but with clear feature limitations.
PM2 keeps your Node.js app running on a server — auto-restarts on crash, supports cluster mode to max out CPU cores, and handles log management. Nearly every Node.js app deployed on a VM or VPS needs it.
Prisma's schema-first design gives you versioned migrations, full TypeScript types on every query, and intuitive relation handling. The tradeoff is a learning curve and the inherent limits of any ORM abstraction — but for most TypeScript projects, it's a worthwhile deal.
React Hook Form handles form performance, Zod defines the validation schema — together they eliminate nearly all form boilerplate. Share a single Zod schema across a monorepo and you get one source of truth for both frontend and backend validation.
Redis is an in-memory key-value store that's blazingly fast. DaoDao uses it to handle three responsibilities at once — API caching, session storage, and BullMQ job queues — all from a single Redis instance.
shadcn/ui is not an npm package — it copies component source code directly into your project, giving you full ownership. DaoDao uses it to build packages/ui, a shared component library used across three Next.js apps.
TailwindCSS's core value is solving CSS's global namespace pollution and dead code problems. Utility classes keep styles co-located with components, and unused classes are automatically purged at build time — production CSS bundles typically come in at just a few dozen KB. Both DaoDao and NobodyClimb use it for web styling.
Tamagui is a UI framework built for React Native with a complete design token system, theme support, and compile-time optimization that moves style computation to build time. NobodyClimb chose it over NativeWind primarily because its cross-platform token system is more robust.
Managing API data with useState + useEffect means reinventing the wheel — and doing it worse. TanStack Query handles caching, background updates, and loading/error states so you can focus on UI logic.
Turborepo solves monorepo build speed problems; pnpm workspaces solves dependency sharing. Together they are the best choice for JS/TS monorepos today.
TypeScript types only exist at compile time — they vanish at runtime. Zod lets you validate external data at runtime while inferring TypeScript types from the same schema. One definition, two jobs done.
No Provider, no reducer — global state in just a few lines. NobodyClimb uses it for auth and UI state, paired with TanStack Query for server state.
Use OpenSpec to break requirements into engineering tasks, Claude Code to implement them, hooks to auto-format and protect, local review before committing, three AI reviewers running in parallel on PR, and auto-deploy after merge. This entire workflow lets one person maintain quality across six sub-projects.
Hooks are Claude Code's event system. They trigger shell commands, HTTP requests, or LLM evaluations automatically before/after tool execution, when a prompt is submitted, or when a task ends. Use them to block dangerous operations, run automated reviews, inject context, or write audit logs.
A Skill is an SOP written for AI. Define the steps in a Markdown file and Claude follows them. No coding required, no frameworks to learn — just write down what an experienced person would do.
Stuck mid-debug and can't fix it right now? Use /file-bug-issue to package the error analysis, reproduction steps, and attempted fixes from your conversation into a well-structured GitHub issue. Pair it with a Remote Agent to let AI automatically take over the fix.
Using Claude Code's Scheduled Remote Agent, automatically scan GitHub issues every 2 hours, implement features, open PRs, and address review feedback — no human intervention required. Humans only write issues and click merge. Pair it with the custom /publish-tasks skill to push OpenSpec engineering tasks directly to GitHub issues.
GLM-5 is a 744B MoE open-source model released by Zhipu AI (Z.ai) in February 2026, trained entirely on Huawei Ascend chips and released under the MIT license. It currently ranks as the top open-source model, surpassing Claude and GPT-5 on benchmarks like Humanity's Last Exam, while its API pricing is 1/5 to 1/8 of theirs.
Kimi is a large language model from Chinese AI startup Moonshot AI, known for its ultra-long context window, open-source strategy, and highly competitive pricing. From 200K context in 2023 to K2.5 Agent Swarm in 2026, Kimi has become a force that the global AI market cannot ignore.
Langfuse is currently the most mature open-source LLM Observability platform. This post covers four core capabilities — Tracing, Prompt Management, Evaluation, and Datasets — showing you how to use them in real projects.
Hooks are automated safety nets (blocking bad commits), Skills are interactive workflows (running checks + auto-fixing), and instruction files (CLAUDE.md / AGENTS.md) are behavioral guidelines. Each layer operates independently, but together they enable an AI agent to automatically run lint, typecheck, and build checks before every commit.
Three main classification systems dominate: Conventional Comments (label-based), Google's severity prefixes (Nit/Optional/FYI), and SonarQube's four quadrants (Bug/Vulnerability/Code Smell/Hotspot). AI review tools have each developed their own taxonomies, but the core dimensions consistently converge on four areas: correctness, security, performance, and maintainability.
Context Engineering is the core concept that replaced Prompt Engineering in 2025: the focus shifted from 'how to ask' to 'what information to provide.' Delivering the right information at the right time into the context window is more effective than upgrading to a stronger model. This post covers the definition, four key strategies, practical techniques, and common failure modes.
Upgraded action-maker from hardcoded mock data to live Cloudflare Workers AI generation. The architecture splits into Worker (AI only), Server (data storage), and Frontend (orchestration). Hit two gotchas along the way: Qwen3's thinking block and the Workers AI response format.
Every AI tool has its own calling format, making integration costly. MCP (Model Context Protocol) is an open standard proposed by Anthropic that unifies the communication protocol between AI Agents and external tools/data sources, enabling tools to be reused across Agents.
A complete study guide for Claude's official architect certification: five exam domains, six scenario types, common anti-patterns, and hands-on preparation strategies.
When reviewing vulnerability scan results for a Node.js Docker image, you can't just look at package names. First distinguish between project dependencies and the packages bundled with npm inside the base image — otherwise you'll fix the wrong thing.
Vulnerability scanning isn't just about generating reports — it helps you discover known risks in your system before they become incidents. This post uses Trivy as a hands-on example to explain what scanners actually look for, how to read the results, and how to get started.
Wrap a local Python script into an MCP Server using FastMCP so Claude Code can call it directly — no more manually running pipelines.
The MCP tool was returning a description field that caused 1,033 job listings to exceed the token limit. The fix: exclude description by default and add pagination.
RAG is read-only. Agent Memory lets AI not only read but also write and persist information. Three memory types: Procedural (behavior patterns), Episodic (temporal events), and Semantic (factual knowledge) form a complete cognitive memory system.
AI Agent is not a single technology -- it is an entire architecture system. This article is a systematic navigation: starting from the Agent Three Pillars (Context/Cognition/Action), through the three-stage evolution of AI engineering (Prompt -> Context -> Harness), to eight Multi-Agent design patterns and production-grade Harness infrastructure. Each topic links to a dedicated deep-dive article.
An AI agent is not a black box — it is built from three layers: what it knows (Context), how it thinks (Cognition), and what it can do (Action). Understanding these three layers is the key to grasping why agents are sometimes brilliant and sometimes go off the rails, and how to design a truly effective agent system.
docker restart does not recreate the container, so changes to volumes in docker-compose.yml only take effect after running docker-compose down && up.
A single RAG Agent handling all queries hits knowledge boundaries and performance bottlenecks. Multi-Agent RAG dispatches retrieval tasks to multiple specialized Agents, each with its own knowledge base and retrieval strategy, coordinated by a central Orchestrator that merges results.
Claude Code has five permission modes: default (confirm each step), acceptEdits (auto-accept edits), plan (read-only planning), auto (background AI classifier review), and bypassPermissions (YOLO, skip everything). Switch with Shift+Tab or configure via settings.json. Auto mode is the sweet spot — no step-by-step confirmations, but with safety guardrails.
Service names aren't resolvable across Compose projects — you need to add a network alias so nginx can find the container.
A hands-on log of installing Superpowers (packaged by DwainTR) for Copilot CLI on a local machine — including the diagnostic process when skills didn't appear after installation, the fix, and practical tips.
Cross-project DNS resolution requires container_name or a network alias — and only aliases support horizontal scaling.
Traditional RAG splits documents into small chunks for retrieval, but this causes information fragmentation. LongRAG leverages 100K+ token long-context models to retrieve larger document segments (entire sections or even whole documents), reducing fragmentation while maintaining retrieval efficiency.
Speculative RAG uses small specialist models to generate multiple answer drafts from different document subsets in parallel, then a large model verifies and selects the best answer in one pass. Accuracy improves up to 12.97%, latency drops up to 50.83%.
A brief error during nginx restart caused Cloudflare to mark the origin as unhealthy and stop forwarding requests, returning 502 on its own. The key clues: localhost hits to the origin return 200, and nginx access logs are completely empty. Just wait for Cloudflare to automatically re-check the origin — it recovers on its own.
A monolithic nginx.conf becomes unwieldy as services grow. Splitting it into per-service files under conf.d/ via include is the standard solution.
When nginx uses the `set $variable` pattern for dynamic upstreams, the DNS cache expires every 30 seconds — the first request after expiry hits a 502 because no IP is available. Upgrading to nginx 1.27.3 and switching to an upstream block with the resolve parameter fixes this: DNS updates happen asynchronously in the background.
Once SSH config is set up, scp works directly with aliases — no need to type out the full IP every time
Ollama wraps llama.cpp in a Docker-style CLI + REST API, letting you run LLMs locally with a single command. This post covers core concepts, installation, API, hardware requirements, Modelfile customization, and what this tool is — and isn't — good for.
RAG has evolved far beyond simple 'search + generate' into a technology ecosystem spanning ten generations. This article is a systematic navigation guide: from Naive RAG to Multi-Agent RAG across ten generations, covering retrieval strategies, chunking, embedding, reranking, evaluation frameworks, observability, and cost optimization. Each topic has a dedicated deep-dive article.
vLLM uses PagedAttention to eliminate KV cache memory waste, combining continuous batching and prefix caching to become the most widely adopted open-source LLM inference engine today.
Ghostty is a fast, native, general-purpose terminal emulator. cmux is a terminal built on top of Ghostty, specifically designed for AI coding agents. They're not competitors — they operate at different layers.
Building a chatbot is more than just calling an API. Conversation state management, memory mechanisms, streaming, guardrails, observability, and tech stack selection — every layer affects the user experience.
Good prompts aren't written in one go — they're iterated into existence. Start with the simplest prompt, test with real cases, classify error types, and make targeted fixes. This article covers the three-part System Prompt structure, reasoning framework selection, few-shot optimization, token budget management, and six common mistakes.
Cloudflare Custom Error Pages require a paid plan. On the Free Plan, use a Worker with inline HTML to intercept 5xx responses instead.
Use includeIf + SSH Host aliases to let Git automatically switch accounts based on directory path — no more manual switching.
Even when a route has prerender = true, Cloudflare Workers' Rollup bundler still attempts to bundle native modules, causing the build to fail. The fix is to move any native module work into a postbuild script.
Astro scoped CSS appends a scope hash to each selector, but elements rendered by <Content /> don't receive that hash — causing all prose styles to silently break.
For complex multi-hop questions, a single RAG search isn't enough. Agentic RAG lets the LLM evaluate whether retrieved results are sufficient — if not, it rewrites the query and searches again, forming a ReAct loop.
Your choice of embedding model directly determines RAG search quality. BGE-M3's multilingual training, 1024-dimensional vectors, and matching Reranker make it a practical pick for Traditional Chinese RAG.
Chunks too large and retrieval loses precision; too small and you lose context. Chunking is the most underrated part of RAG — pick the wrong strategy and no amount of downstream optimization will save you.
Bi-Encoders are too coarse, Cross-Encoders are too slow — ColBERT's Late Interaction finds the sweet spot: token-level comparison between query and document, but with document vectors that can be precomputed.
When you split a document into chunks, each chunk loses its place in the original document. Contextual Retrieval solves the isolated-chunk problem by injecting a document-level summary into every chunk at index time.
Filters too strict and getting zero results? CRAG automatically relaxes them and retries — far better than letting the LLM hallucinate an answer from general knowledge.
Vector search similarity scores don't equal relevance. Cross-Encoders use pairwise comparison to reorder results and push the truly relevant documents to the top.
Vector search finds similarity; graph search traverses relationships. When a question requires reasoning across multiple entities — crag → route → sender → grade distribution — GraphRAG outperforms standard RAG.
Vector search handles semantics; BM25 handles keywords. Combining them with RRF is what lets you handle both fuzzy queries and exact terms at the same time.
Have an LLM generate an 'ideal answer' first, then embed that hypothetical document for search — it outperforms searching with the raw query.
After each conversation, asynchronously extract likely user preferences and skill level, then automatically personalize search parameters on the next query — no manual setup required.
Ranking purely by relevance leaves you with five documents all describing the same route. MMR strikes a balance between relevance and diversity, and layering in popularity weighting makes results even more useful.
RAG doesn't have to be a rigid three-step process. It's a set of steps that can be dynamically enabled, skipped, or reordered. Pipeline as Code lets the system adapt its behavior without redeployment.
A single vector search on a complex query often misses relevant documents. Let the LLM rewrite the query into 3-5 sub-queries, run them in parallel, and recall improves significantly.
Climbing routes carry a ton of visual information (topos, wall photos) that text-only RAG misses entirely. Multimodal RAG makes images searchable and understandable.
Naive RAG works but has real problems. Advanced RAG patches those problems. Modular RAG rearchitects the whole system to be composable and configurable. Understanding all three generations is the key to understanding why modern RAG systems look the way they do.
For complex queries, have the LLM map out what information is needed and in how many steps — then execute that plan. More systematic than thinking on the fly.
Not every question needs full RAG. Classify queries with an LLM first, then route to the right execution path — saving cost and improving accuracy.
"Adding a Cross-Encoder feels better" is not a scientific evaluation. A/B testing tells you whether a change actually works, how much it helps, and which query types benefit.
A RAG system needs data to answer questions, but data only accumulates as the system gets used. Cold-start strategy is what bridges the gap from empty to useful.
RAG system costs come from LLM tokens, Embedding APIs, and vector search. Every stage has room for cost reduction, but you need to verify that optimizations don't sacrifice too much quality.
RAG system quality is hard to evaluate by intuition alone. RAGAS, DeepEval, and TruLens provide systematic metric frameworks that pinpoint exactly which component is failing.
When a RAG system breaks, 90% of the time it's one of these 10 failure modes. Identify which one first, then apply the matching fix — far more effective than optimizing blindly.
The attacks RAG systems face go beyond the technical level — Prompt Injection and Jailbreak are real threats. Both inputs and outputs need independent protection layers.
Rolling your own traces is good enough, but open-source tools save you a lot of work. Langfuse, Phoenix, and LangSmith each have their niche — the right choice depends on your trade-offs around self-hosting, open source, and integration complexity.
The hardest part of a RAG system isn't building it — it's figuring out why a particular answer went wrong. Pipeline Tracing records every step's decisions and data so debugging has a clear trail to follow.
Search found the right documents, but the LLM's answers are still poor — often the problem lies in prompt design. System prompt structure, context formatting, and instruction placement all affect output quality.
LLM generation takes 3-5 seconds, and waiting for the full response before displaying it makes for a terrible experience. SSE pushes tokens as they're generated, reducing time-to-first-character from 5 seconds to under 1 second.
Limiting request count alone is not enough — a single long query can consume ten times the tokens of a normal one. Dual quotas (request count + token count) are what truly control costs.
RAG and Fine-tuning solve different problems. RAG gives the model new knowledge; Fine-tuning changes the model's behavior and style. In most cases you use both, not pick one.
BM25, vector search, HyDE, and Multi-Query each produce separate result sets -- how do you merge them sensibly? RRF uses ranks instead of scores, sidestepping the fundamental problem that scores from different systems are incomparable.
Use another LLM to evaluate answer accuracy and quality — if the score is too low, regenerate, and automatically add appropriate disclaimers.
Caching doesn't have to match exact query strings -- semantically similar questions can hit the cache too, skipping the entire RAG pipeline execution.
BM25 only recognizes words that appear in the query. SPLADE infers related terms and adds them to the search, gaining partial semantic capability while preserving the precision of keyword search.
Questions like 'how many routes did I complete this year' will never be answered well by RAG semantic search — querying the database directly is far more accurate. Let the LLM identify intent, extract parameters, and execute predefined SQL templates.
Vector database selection is more constrained by deployment platform than LLM selection. Determine your platform and scale requirements first, then evaluate features — don't just look at benchmarks.
The core reason self-directed learning fails isn't lack of motivation — it's the absence of a co-learning environment. DaoDao turns 'wanting to learn' into 'actually learning' through themed practices, inspiration feeds, group challenges, and learner connections, while turning your growth journey into tangible proof of competence.
MOOC completion rates hover at just 5–15%, and the problem isn't course quality — it's the execution gap. DaoDao positions itself as a 'Learning OS,' using public commitments, community interaction, and AI recommendations to make learning visible and sustainable.
DaoDao is not a content platform -- it's a learning connector. Using anti-perfectionism design, community co-learning, and zero-decision recommendations, it helps learners bridge the execution gap -- from vague ideas to actionable plans.
NobodyClimb uses RAG to tackle scattered climbing route information, ties quota limits to community engagement, and leverages Cloudflare Workers AI to bring inference costs close to zero.
The climbing community doesn't lack the will to share — it lacks a place to connect and preserve its culture.
In wrangler.jsonc, use custom_domain: true in routes with only the hostname as the pattern — no /* wildcard
Next.js + Expo frontend, Node.js + Python dual backend, PostgreSQL + Redis core — plus a social notification system and LLM recommendation engine. Here's how DaoDao builds a learning community platform with a modern tech stack.
A climbing community platform where the web app, mobile app, and AI Q&A all run on Cloudflare — no dedicated servers.
A dynamically composable RAG pipeline built on Cloudflare Workers AI (gemma-3-12b-it + bge-m3): 14 base steps + 6 LangGraph-specific nodes, with three strategy graphs (Baseline / Agentic / Plan-Execute) selected at runtime.
Switching templates means replacing the entire project foundation. Figure out what you actually need first, then choose between AstroPaper, Cactus, or AstroWind.