As the number of tools grows, selection accuracy doesn't degrade gracefully; it collapses. Going from 4 to 51 tools drops accuracy from 43% to 2%, and going from 10 to 100+ tools drops it from 78% to 13.62%. The root fix is to stop loading every tool definition up front: Anthropic's Tool Search Tool combines deferred loading with retrieval to cut token usage by 85%, raising Opus 4.5 accuracy from 79.5% to 88.1%. Description quality pays off only conditionally. Its effect is negligible in simple scenarios, but in multi-tool chaining correctness rises from 44% to 50%.
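A minimal sketch of the deferred-loading idea, assuming a simple in-process registry: eager tools go into the initial context together with a single search tool, and deferred tools only surface when a query matches them. `Tool`, `ToolSearchRegistry`, and `tool_search` are hypothetical names, not Anthropic's API, and the word-overlap scoring is a crude stand-in for real retrieval such as BM25 or embeddings.

```python
import re
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    description: str
    defer_loading: bool = True  # deferred tools stay out of the initial context


def tokenize(text: str) -> set[str]:
    # Lowercase word set; underscores split names like "github_create_issue".
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToolSearchRegistry:
    """Hypothetical registry: eager tools are sent up front, deferred ones on demand."""

    def __init__(self, tools: list[Tool]):
        self.tools = {t.name: t for t in tools}

    def initial_context(self) -> list[str]:
        # What the model sees at session start: eager tools plus the search tool itself.
        eager = [t.name for t in self.tools.values() if not t.defer_loading]
        return eager + ["tool_search"]

    def search(self, query: str, k: int = 3) -> list[str]:
        # Score deferred tools by word overlap with the query; keep the top k.
        q = tokenize(query)
        scored = []
        for t in self.tools.values():
            if not t.defer_loading:
                continue
            score = len(q & tokenize(t.name + " " + t.description))
            if score > 0:
                scored.append((score, t.name))
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [name for _, name in scored[:k]]


tools = [
    Tool("read_file", "Read a file from the workspace", defer_loading=False),
    Tool("github_create_issue", "Create a new issue in a GitHub repository"),
    Tool("slack_send_message", "Send a message to a Slack channel"),
    Tool("jira_create_ticket", "Create a Jira ticket for a project"),
]
registry = ToolSearchRegistry(tools)
print(registry.initial_context())  # ['read_file', 'tool_search']
print(registry.search("open an issue on GitHub"))  # ['github_create_issue']
```

The savings come from `initial_context()` staying small no matter how many deferred tools are registered; only the few names returned by `search()` would be expanded into full definitions.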
CodeGraph uses tree-sitter to parse a codebase into a local knowledge graph stored in SQLite with FTS5 full-text search, so AI coding agents can query the graph instead of scanning files. The official end-to-end benchmark (7 repos, median of 4 runs) shows an average of 35% cost savings and 70% fewer tool calls, but only when the agent actually walks the graph. If exploration is delegated to a file-reading subagent that ignores CodeGraph, the graph becomes pure overhead.
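A minimal sketch of querying a code graph instead of scanning files, using Python's built-in `sqlite3` module. The schema (`nodes`, `edges`, `nodes_fts`) and the hand-inserted rows are hypothetical stand-ins for what a tree-sitter pass would extract; this is not CodeGraph's actual schema. It also assumes the local SQLite build includes FTS5, which most do.

```python
import sqlite3

# Hypothetical schema standing in for what a tree-sitter pass would extract.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, kind TEXT, file TEXT);
    CREATE TABLE edges (src INTEGER, dst INTEGER, kind TEXT);  -- src calls dst
    CREATE VIRTUAL TABLE nodes_fts USING fts5(name, file);     -- requires FTS5
    """
)

# camelCase names stay single tokens under FTS5's default unicode61 tokenizer.
nodes = [
    (1, "parseConfig", "function", "src/config.ts"),
    (2, "loadSettings", "function", "src/settings.ts"),
    (3, "main", "function", "src/app.ts"),
    (4, "runTests", "function", "test/config.test.ts"),
]
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", nodes)
conn.executemany(
    "INSERT INTO nodes_fts(rowid, name, file) VALUES (?, ?, ?)",
    [(i, name, f) for i, name, _, f in nodes],
)
conn.executemany("INSERT INTO edges VALUES (?, ?, 'calls')", [(2, 1), (3, 2), (4, 1)])


def find_symbol(term: str) -> list[tuple[int, str, str]]:
    # Indexed full-text lookup instead of grepping through files.
    return conn.execute(
        "SELECT rowid, name, file FROM nodes_fts WHERE nodes_fts MATCH ? ORDER BY rank",
        (term,),
    ).fetchall()


def transitive_callers(node_id: int, max_depth: int = 5) -> list[str]:
    # Walk "calls" edges backwards with a recursive CTE, up to max_depth hops.
    rows = conn.execute(
        """
        WITH RECURSIVE callers(id, depth) AS (
            SELECT src, 1 FROM edges WHERE dst = ? AND kind = 'calls'
            UNION
            SELECT e.src, c.depth + 1 FROM edges e JOIN callers c ON e.dst = c.id
            WHERE e.kind = 'calls' AND c.depth < ?
        )
        SELECT DISTINCT n.name FROM callers c JOIN nodes n ON n.id = c.id ORDER BY n.name
        """,
        (node_id, max_depth),
    ).fetchall()
    return [name for (name,) in rows]


[(symbol_id, _, _)] = find_symbol("parseConfig")
print(transitive_callers(symbol_id))  # ['loadSettings', 'main', 'runTests']
```

One indexed lookup plus one recursive query answers "who eventually calls `parseConfig`?", a question that would otherwise take several file reads and greps. That is where fewer tool calls would come from, provided the agent actually issues these queries.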
Stop stuffing all your tool descriptions into context at session start. Let the model write code, have the runtime execute it, and let a tool's definition enter context only when the code imports it. Anthropic's GDrive→Salesforce example dropped from ~150K tokens to 2K, and Cloudflare's 2,500-endpoint schema shrank from 1.17M tokens to 1K.
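A toy sketch of the code-execution pattern, assuming a runtime that exposes tools through an `import_tool` call: a tool's definition counts against context only when the model-written code imports it, and the large intermediate payload stays inside the runtime. `CodeModeRuntime`, `import_tool`, the catalog entries, and the fake implementations are all hypothetical, and character counts are only a rough proxy for tokens.

```python
import json
from typing import Any, Callable

# Hypothetical catalog standing in for many MCP tool schemas.
CATALOG: dict[str, dict[str, Any]] = {
    "gdrive.get_document": {"description": "Fetch a Google Drive document",
                            "params": {"document_id": "string"}},
    "salesforce.update_record": {"description": "Update a Salesforce record",
                                 "params": {"record_id": "string", "fields": "object"}},
}
# Pad the catalog with unused tools to mimic a large schema.
for i in range(500):
    CATALOG[f"misc.tool_{i}"] = {"description": f"Unused tool number {i}", "params": {"x": "string"}}

# Fake implementations; a real runtime would call the actual services.
IMPLS: dict[str, Callable[..., Any]] = {
    "gdrive.get_document": lambda document_id: "meeting transcript " * 10_000,
    "salesforce.update_record": lambda record_id, fields: {"id": record_id, "updated": sorted(fields)},
}


class CodeModeRuntime:
    """Hypothetical runtime: tool definitions enter context only when imported."""

    def __init__(self, catalog: dict[str, dict[str, Any]], impls: dict[str, Callable[..., Any]]):
        self.catalog = catalog
        self.impls = impls
        self.loaded: list[str] = []  # definitions the model actually had to read

    def import_tool(self, name: str) -> Callable[..., Any]:
        if name not in self.loaded:
            self.loaded.append(name)
        return self.impls[name]

    def context_size(self) -> int:
        # Characters of loaded definitions (rough token proxy).
        return sum(len(json.dumps(self.catalog[n])) for n in self.loaded)


def full_catalog_size(catalog: dict[str, dict[str, Any]]) -> int:
    # What loading every definition up front would cost.
    return sum(len(json.dumps(schema)) for schema in catalog.values())


def model_written_code(rt: CodeModeRuntime) -> str:
    get_document = rt.import_tool("gdrive.get_document")
    update_record = rt.import_tool("salesforce.update_record")
    transcript = get_document(document_id="doc-123")  # large payload stays in the runtime
    result = update_record(record_id="lead-42", fields={"notes": transcript})
    return f"updated {result['id']}"  # only this short string goes back to the model


rt = CodeModeRuntime(CATALOG, IMPLS)
print(model_written_code(rt))  # updated lead-42
print(rt.loaded)  # ['gdrive.get_document', 'salesforce.update_record']
print(rt.context_size() < full_catalog_size(CATALOG))  # True
```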
A Skill is a folder containing a SKILL.md file. Three-layer progressive disclosure lets Claude load details only when needed, so you no longer have to re-explain your preferences in every conversation.
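A minimal sketch of the three disclosure layers, assuming the layout described in Anthropic's Agent Skills documentation: YAML frontmatter with `name` and `description` at the top of SKILL.md (layer 1, always loaded), the rest of SKILL.md (layer 2, loaded when the skill is relevant), and further bundled files (layer 3, read on demand). `SkillLoader`, the folder-name-equals-skill-name convention, and the naive frontmatter parser (plain `key: value` lines only, not full YAML) are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def parse_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split '---' frontmatter from the body. Naive: handles only 'key: value' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:]).strip()


class SkillLoader:
    """Hypothetical loader; assumes each skill folder is named after the skill."""

    def __init__(self, skills_dir: Path):
        self.skills_dir = skills_dir

    def level1_metadata(self) -> list[dict[str, str]]:
        # Always in context: only name + description for every skill.
        out = []
        for skill_md in sorted(self.skills_dir.glob("*/SKILL.md")):
            meta, _ = parse_skill_md(skill_md.read_text())
            out.append({"name": meta.get("name", ""), "description": meta.get("description", "")})
        return out

    def level2_body(self, name: str) -> str:
        # Loaded only once the skill is judged relevant.
        _, body = parse_skill_md((self.skills_dir / name / "SKILL.md").read_text())
        return body

    def level3_file(self, name: str, relative_path: str) -> str:
        # Extra files referenced from the body, read only on demand.
        return (self.skills_dir / name / relative_path).read_text()


with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "blog-style"
    (skill / "reference").mkdir(parents=True)
    (skill / "SKILL.md").write_text(
        "---\nname: blog-style\ndescription: House style for blog posts\n---\n"
        "Use short paragraphs. See reference/tone.md for tone rules.\n"
    )
    (skill / "reference" / "tone.md").write_text("Avoid hype words.\n")

    loader = SkillLoader(Path(tmp))
    metadata = loader.level1_metadata()
    body = loader.level2_body("blog-style")
    tone = loader.level3_file("blog-style", "reference/tone.md")
    print(metadata)  # [{'name': 'blog-style', 'description': 'House style for blog posts'}]
```

Only `metadata` would sit in context permanently; `body` and `tone` are fetched when the task calls for them, which is what keeps many installed skills cheap.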
Agent memory isn't a plugin; it's part of the harness itself. Pick the right memory type, estimate the data volume, then choose the technology. Finally, figure out whether you actually own that memory.
I audited this blog against my own 30+ posts on RAG and agents and came up with a prioritized list of improvements spanning content quality, site tech, RAG design fixes, harness infrastructure, and AI agent applications. No phases, just priorities.
There are already 6,400+ .claude/agents/*.md files on GitHub. We dissected 4 representative projects: ChemistryTimes (a content production pipeline), claude-sub-agent (a document-driven development pipeline), agentic (parallel DAG execution on Temporal.io), and vs-copilot-multi-agent (hook-enforced memory persistence). Together with ruflo's enterprise-grade swarm architecture, they yield 6 design patterns and 5 practical trends.
Agent CLIs are not smarter autocomplete tools; they are AI agents that can read your codebase, execute multi-step tasks, and operate in real environments. Claude Code, Codex CLI, Gemini CLI, OpenCode, Aider, Pi, Kiro, Amp, Cursor CLI: the tools keep multiplying, but they all share a common set of design principles, and understanding those principles is how you actually get good at using them.
AI engineering has gone through three phases: Prompt Engineering (write better instructions) → Context Engineering (feed the right information) → Harness Engineering (design the entire working environment). Each phase builds on the previous one rather than replacing it, operating at a higher level of abstraction.
Context Engineering is the core concept that superseded Prompt Engineering as the central concern in 2025: the focus shifted from 'how to ask' to 'what information to provide.' Delivering the right information into the context window at the right time is often more effective than upgrading to a stronger model. This post covers the definition, four key strategies, practical techniques, and common failure modes.
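One way to make "the right information at the right time" concrete is selection under a budget. A minimal sketch, assuming word counts as a stand-in for tokens and word overlap as a stand-in for relevance; `assemble_context` is a hypothetical helper illustrating a single technique, not the four strategies the post covers.

```python
import re


def words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def assemble_context(query: str, chunks: list[str], budget: int) -> list[str]:
    """Greedily pack the most query-relevant chunks into a word budget (rough token proxy)."""
    q = set(words(query))
    ranked = sorted(
        (c for c in chunks if q & set(words(c))),  # drop chunks with zero overlap
        key=lambda c: len(q & set(words(c))),
        reverse=True,  # stable: ties keep their original order
    )
    selected, used = [], 0
    for chunk in ranked:
        cost = len(words(chunk))
        if used + cost <= budget:
            selected.append(chunk)
            used += cost
    return selected


chunks = [
    "Refund policy: refunds are issued within 14 days of purchase.",
    "Shipping policy: orders ship within 2 business days.",
    "Company history: founded in 2010 in a small garage.",
    "Refunds for digital goods require a support ticket.",
]
print(assemble_context("How do refunds work?", chunks, budget=15))
# ['Refund policy: refunds are issued within 14 days of purchase.']
```

Irrelevant chunks never reach the window, and the budget forces a choice among relevant ones; a real system would swap in a tokenizer and a proper retriever.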
An AI agent is not a single technology; it is a complete architectural system. This article is a systematic guide: it starts with the three pillars of an agent (Context/Cognition/Action), moves through the three-stage evolution of AI engineering (Prompt → Context → Harness), and continues to eight multi-agent design patterns and production-grade harness infrastructure. Each topic links to a dedicated deep-dive article.
An AI agent is not a black box; it is built from three layers: what it knows (Context), how it thinks (Cognition), and what it can do (Action). Understanding these layers is the key to seeing why agents are sometimes brilliant and sometimes go off the rails, and to designing an agent system that actually works.
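A minimal sketch mapping the three layers onto an agent loop: `context` is what the agent knows, `decide` is how it thinks, and `tools` is what it can do. `Agent` and `scripted_decide` are hypothetical; in a real system `decide` would be an LLM call, while here it is a fixed script so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    tools: dict[str, Callable[[str], str]]            # Action: what it can do
    decide: Callable[[list[str]], tuple[str, str]]    # Cognition: how it thinks
    context: list[str] = field(default_factory=list)  # Context: what it knows

    def run(self, task: str, max_steps: int = 5) -> str:
        self.context.append(f"task: {task}")
        for _ in range(max_steps):
            action, arg = self.decide(self.context)
            if action == "finish":
                return arg
            observation = self.tools[action](arg)
            # Every observation is written back into context for the next decision.
            self.context.append(f"{action}({arg}) -> {observation}")
        return "stopped: step limit reached"


def scripted_decide(context: list[str]) -> tuple[str, str]:
    # Stand-in for a model: read the README once, then answer from what is in context.
    if not any(line.startswith("read_file") for line in context):
        return "read_file", "README.md"
    return "finish", context[-1].split("-> ", 1)[1]


files = {"README.md": "Project uses Python 3.12"}
agent = Agent(tools={"read_file": lambda path: files.get(path, "not found")}, decide=scripted_decide)
answer = agent.run("Which Python version does the project use?")
print(answer)  # Project uses Python 3.12
```

Because `decide` sees nothing but `context`, a missing or misleading entry there feeds straight into a bad action, which is one way to read why agents sometimes go off the rails.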