
#multi-agent

23 posts
ai deep-dive

The Single Crack in Agent Security: From Prompt Injection to Trust Boundaries to Multi-Agent Worms

Three seemingly distinct agent security problems — tool output injection, trust boundaries, malicious agents — share the same root cause: LLMs flatten instructions and data into a single token stream, making them architecturally unable to distinguish between the two. Understand this through-line and you can trace every attack from EchoLeak (CVE-2025-32711, zero-click) to the Morris II AI worm, and see why 'making the model behave' doesn't work — only architectural constraints (six design patterns, CaMeL) do.

ai deep-dive

How to Build a Deep Research Agent: Multi-Turn Search Planning, Conflict Resolution, and Verifiable Conclusions

An autonomous research agent = four controllable stages: planning (decompose into sub-questions), retrieval loop (search -> read -> reflect on gaps -> search again), evidence arbitration (>=2 independent sources, typed conflict handling), and verifiable output (sentence-level citations + independent verification pass). Two approaches: training-based uses RL to learn end-to-end when to search (Search-R1 +41%); orchestration-based uses orchestrator-worker division of labor (Anthropic internal eval +90.2%, at ~15x token cost).

ai deep-dive

Machine Theory of Mind: How Agents Infer Other Agents' Intentions, Knowledge, and Goals

Inferring another agent's beliefs, goals, and intentions from observed behavior is called Machine Theory of Mind. It has three lineages: symbolic BDI, Bayesian inverse planning, and the deep-learning ToMnet. The biggest controversy in the LLM era is whether high benchmark scores reflect genuine reasoning or statistical shortcuts, given that GPT-4 still trails humans by more than 10 points on ToMBench.

ai deep-dive

Multi-Agent Error Propagation and Recovery: Borrowing Thirty Years of Weapons from Distributed Systems

At 99% accuracy per step over 100 steps, the error-free completion rate drops to just 36% -- error compounding is a structural problem, not something prompt tuning can fix. Distributed systems' supervisor trees, bulkheads, circuit breakers, sagas, and durable execution can be mapped almost one-to-one into agent orchestration. But LLMs introduce a failure class that traditional systems never had -- semantic errors that don't crash -- which require Inspector agents (recovering 96.4%) and redundancy voting (MAKER: one million steps with zero errors) to address.
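The compounding figure in the summary above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes every step succeeds or fails independently with the same accuracy; the function name `error_free_rate` is illustrative and does not come from the post.

```python
def error_free_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability that all `steps` steps succeed.

    Assumption: steps are independent and share the same accuracy,
    so the joint success probability is a simple product.
    """
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # Show how quickly a 99% per-step accuracy erodes as the chain grows.
    for steps in (10, 50, 100):
        rate = error_free_rate(0.99, steps)
        print(f"{steps:>3} steps at 99% per step: {rate:.1%} error-free")
```

Under that independence assumption, 100 steps at 99% land between 36% and 37%, which is why the problem is described as structural: raising per-step accuracy helps, but chain length dominates.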

ai deep-dive

How Others Use LLMs to Write: Trade-off Notes from Karpathy's LLM-wiki to Multi-Agent Pipelines

A survey of 11 public LLM writing pipelines, distilled into three dominant patterns: multi-agent (researcher -> writer -> critic), Karpathy LLM-wiki (raw + wiki + LLM writes, humans don't), and quality guardrails (technical verifier + never fabricate + brief gate). The Princeton GEO paper (KDD 2024) quantifies the impact: inline citations +28%, adding statistics +33%, quoting source text +41%, keyword stuffing -9%.

ai

Claude for Financial Services: Dissecting Anthropic's Multi-Agent Reference Implementation

Anthropic open-sourced 12 financial-industry Agents and 11 MCP connectors. The real takeaway isn't the Agents themselves but the layered design of 'one prompt, two runtimes' and 'pure-file extensibility.'

ai

From Plan to PR: Building daodao's Auto-Dev Agent in Practice

5 rounds of consensus to write the plan, then team mode with 5 workers running 12 tasks in parallel — with plenty of pitfalls along the way. Writing it down for my future self and anyone else trying the same thing.

ai guide

Where AI Code Review Stands Now: Lessons from Cloudflare's Multi-Agent System

Cloudflare ran a Multi-Agent Code Review system internally for 30 days — 131K reviews, median 3 minutes. This post breaks down their architecture and compares it with solutions from Anthropic, GitHub, CodeRabbit, Greptile, and others.

ai guide AI Agent in Practice

Agentic Engineering: Making AI Agents Collaborate Like a Real Engineering Team

Agentic Engineering isn't about making AI write code faster — it's about making software move through the entire delivery pipeline faster, by using multi-agent collaboration to compress cross-team coordination friction.

ai guide AI Agent in Practice

The Memory Problem in Agentic Engineering: Types, Implementation, and Ownership

Agent memory isn't a plugin — it's part of the harness itself. Pick the right memory type, estimate data volume, then decide on the technology. And finally, figure out whether you actually own that memory.

ai project

Claw Code: An Open-Source CLI Agent That Rewrites Claude Code in Rust

Claw Code is a from-scratch Rust rewrite of the Claude Code CLI, with 48K lines of code, 40 tools, and an MIT license. Most remarkably, the entire project was built by multiple AI agents collaborating over just 5 days, and it surpassed 170K GitHub stars within a week of launch.

ai guide

clawhip: An Event Notification Router That Keeps Multi-Agent Development Under Control

clawhip is a Rust daemon that routes AI coding agent events (commits, PRs, session status) to Discord / Slack, solving the observability problem of not knowing who is doing what when multiple agents run in parallel.

ai guide

oh-my-claudecode: An Enhancement Layer That Turns Claude Code into a Multi-Agent Collaboration Platform

oh-my-claudecode (OMC) adds 8 collaboration modes, 19 specialized agents, and cross-model orchestration (Claude + Codex + Gemini) on top of Claude Code, transforming a single-user CLI tool into a multi-agent development platform. Features include Deep Interview for requirement clarification, Smart Model Routing that saves 30-50% on tokens, and automatic rate limit recovery.

ai guide

oh-my-codex: A Structured Workflow Enhancement Layer on Top of OpenAI Codex CLI

oh-my-codex (OMX) doesn't replace Codex CLI — it adds a structured workflow layer on top of it. From requirements clarification and plan generation to multi-agent parallel execution, four core Skills transform scattered prompt conversations into a trackable development process.

ai guide

oh-my-openagent: A Multi-Model Agent Team Framework That Replaces Single-LLM Coding

oh-my-openagent (OmO) transforms OpenCode from a single-LLM tool into a multi-model agent team — Opus as the workhorse, GPT-5.2 as the architect, Gemini for frontend, Sonnet for documentation lookup — all triggered to run in parallel with a single ultrawork keyword. With 48K stars, it is the earliest project in the UltraWorkers ecosystem to establish the multi-agent coding pattern.

ai project

OpenHarness: A Fully Open-Source Agent Harness Framework

An open-source Agent Harness framework from HKUDS (HKU Data Science Lab) that implements tool calling, skill loading, memory, permissions, and multi-agent collaboration as complete infrastructure, supporting Anthropic / OpenAI / GitHub Copilot API formats.

ai guide

How to Use Claude Code Agent Teams? Design Patterns from 6,400+ Agents on GitHub

There are already 6,400+ .claude/agents/*.md files on GitHub. We dissected 4 representative projects — ChemistryTimes (content production pipeline), claude-sub-agent (document-driven development pipeline), agentic (Temporal.io DAG parallel execution), and vs-copilot-multi-agent (hook-enforced memory persistence) — plus ruflo's enterprise-grade swarm architecture, distilling 6 design patterns and 5 practical trends.

ai guide

Skill vs Subagent: Comparing Two Agent Collaboration Modes in Claude Code

A Skill is a prompt template you invoke manually. A Subagent is an independent agent that Claude routes to automatically. They look similar, but differ completely in trigger mechanism, tool isolation, and context management.

ai guide AI Agent in Practice

Anthropic's Harness Design: Making AI Agents Work Like Engineers

The same model produces dramatically different results under different harness designs. Anthropic uses a dual-agent architecture, cross-session state files, and a GAN-inspired generator-evaluator loop to let Claude autonomously complete hours-long software development tasks.

ai guide

Google's Eight Multi-Agent Design Patterns

Google outlined eight multi-agent design patterns: from the simplest Sequential Pipeline to the composable Composite Pattern. More complexity isn't always better — picking the right pattern matters more than stacking agents.

ai guide

OpenClaw Multi-Agent and Delegate Architecture

OpenClaw supports running multiple isolated agents within a single Gateway, routing messages via bindings, and enabling AI to act on your behalf through its Delegate architecture.

ai deep-dive

Complete Guide to AI Agent Architecture Patterns: From Three Pillars to Multi-Agent Systematic Navigation

An AI Agent is not a single technology; it is an entire architectural system. This article is a systematic map of that system: it starts from the Agent Three Pillars (Context/Cognition/Action), moves through the three-stage evolution of AI engineering (Prompt -> Context -> Harness), and ends with eight Multi-Agent design patterns and production-grade Harness infrastructure. Each topic links to a dedicated deep-dive article.

ai guide RAG Systems in Practice

Multi-Agent RAG: Distributed Retrieval Architecture with Specialized Agent Collaboration

A single RAG Agent handling all queries hits knowledge boundaries and performance bottlenecks. Multi-Agent RAG dispatches retrieval tasks to multiple specialized Agents, each with its own knowledge base and retrieval strategy, coordinated by a central Orchestrator that merges results.
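The dispatch-and-merge idea in the summary above can be sketched as follows. This is a toy illustration under stated assumptions, not the article's implementation: `Hit`, `make_keyword_agent`, and `orchestrate` are hypothetical names, and keyword overlap stands in for whatever retrieval strategy each specialized agent would really use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hit:
    source: str   # name of the agent that produced the hit
    text: str     # retrieved passage
    score: float  # relevance in [0, 1]


# Hypothetical interface: a specialized agent maps a query to scored hits.
Retriever = Callable[[str], list[Hit]]


def make_keyword_agent(name: str, docs: list[str]) -> Retriever:
    """Build a toy agent that owns its own knowledge base (`docs`)."""
    def retrieve(query: str) -> list[Hit]:
        terms = set(query.lower().split())
        hits = []
        for doc in docs:
            overlap = len(terms & set(doc.lower().split()))
            if overlap:  # also guards against an empty query
                hits.append(Hit(name, doc, overlap / len(terms)))
        return hits
    return retrieve


def orchestrate(query: str, agents: dict[str, Retriever], top_k: int = 3) -> list[Hit]:
    """Fan the query out to every agent, then merge and rank the results."""
    merged: list[Hit] = []
    for agent in agents.values():
        merged.extend(agent(query))
    return sorted(merged, key=lambda h: h.score, reverse=True)[:top_k]


if __name__ == "__main__":
    agents = {
        "billing": make_keyword_agent("billing", ["refund policy for invoices"]),
        "infra": make_keyword_agent("infra", ["restart the cache server"]),
    }
    for hit in orchestrate("refund policy", agents):
        print(hit.source, hit.score, hit.text)
```

A real orchestrator would likely route each query to a subset of agents and reconcile scores that are not directly comparable across knowledge bases; the sketch sends every query to every agent to keep the merge step visible.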