#tool-use

10 posts

ai deep-dive Jun 4, 2026

Agent Observability: From OTel Traces to Catching Hallucinations, Tool Misuse, and Infinite Loops

The industry has converged on using OpenTelemetry GenAI semantic conventions to turn every LLM call and tool call into a span. Detecting the three major failure modes then splits into three tracks: faithfulness + semantic entropy for hallucinations, framework-level symbolic guardrails for tool misuse, and max steps + action hash deduplication for infinite loops — all wired into a Final / Trajectory / Single-step three-layer evaluation framework.

#observability #ai-agent #tool-use #llm #opentelemetry

ai deep-dive Jun 4, 2026

Stop Hand-Tuning Prompts: From GEPA to Tool Descriptions, Automating Agent Behavior Optimization

Automatic prompt optimization (APO) has evolved from APE/OPRO to GEPA: replacing sparse rewards with linguistic reflection, winning over GRPO by ~6pp with 4-35x fewer rollouts. Meanwhile, tool descriptions are the overlooked prompt -- small wording changes can shift tool selection rates by 10x, and Anthropic's experiments show Claude self-rewriting tool descriptions outperforms human experts. These two lines are converging: eval-driven automatic optimization is eating hand-tuned prompts.

#prompt-engineering #tool-use #ai-agent #llm #optimization

ai deep-dive Jun 4, 2026

How to Pick the Right Tool from Hundreds: The Collapse Curve of Tool Selection and Engineering Solutions

As tools scale up, selection accuracy doesn't degrade gracefully — it collapses: 4 to 51 tools drops from 43% to 2%, 10 to 100+ drops from 78% to 13.62%. The root fix is to stop stuffing everything in at once — Anthropic's Tool Search Tool uses defer loading plus retrieval to cut 85% of tokens, pushing Opus 4.5 accuracy from 79.5% to 88.1%. Description quality has conditional payoff: negligible in simple scenarios, but correctness jumps from 44% to 50% in multi-tool chaining.

#tool-use #ai-agent #mcp #llm #context-engineering

ai deep-dive May 24, 2026

Auto-Embedding on File Upload Is a Bad Default: A Survey of Adaptive / Agentic RAG and Agentic Parsing

Making 'chunk and embed every uploaded file automatically' the default behavior means making a decision for the LLM that it could have made itself. From Self-RAG (2310.11511) and Adaptive-RAG (2403.14403) to AgenticOCR (2602.24134), the academic trajectory is pushing three layers of decision-making -- whether to retrieve, whether to parse, and how to chunk -- from the ingestion pipeline back to the agent at conversation time.

#rag #agentic-rag #adaptive-rag #tool-use #llm-agent #agentic-parsing #document-parsing

ai deep-dive May 24, 2026

Assembling LLM Agent Skills / Tools / Code Interpreter for Real: A Paper Reading Map

The hard part of LLM agents is not building function calling, skills, code interpreter, and document tools individually -- it is assembling them into a system that selects the right tool, writes code when needed, decomposes tasks, verifies results, and resists prompt injection. This post organizes the key papers into six engineering decisions: function calling reliability, tool/skill selection, code-as-action, multi-step planning, skill systems, and safety plus document generation.

#llm #agents #tool-use #skills #code-interpreter #function-calling #paper-review

ai guide Apr 18, 2026

MCP vs CLI vs API: The Real Boundaries of Agent Tool Interfaces

MCP is not going away, but its effective scope is narrower than most people think. For local development, CLI and raw API almost always beat MCP. MCP's truly irreplaceable niche is the narrow gap of 'cross-agent shared local tool layer.'

#mcp #agent #cli #api #claude-code #tool-use

ai project Apr 5, 2026

OpenHarness: A Fully Open-Source Agent Harness Framework

An open-source Agent Harness framework from HKUDS (HKU Data Science Lab) that implements tool calling, skill loading, memory, permissions, and multi-agent collaboration as complete infrastructure, supporting Anthropic / OpenAI / GitHub Copilot API formats.

#agent-harness #open-source #multi-agent #tool-use #mcp

ai guide Apr 3, 2026

AI Agent Tool Descriptions Shouldn't Be Static: Dynamic prompt() Design Learned from Claude Code

Every one of Claude Code's 45 tools uses a prompt() method that dynamically adjusts based on user type, feature flags, and system capabilities. Applying this pattern to a ReAct Agent, tool descriptions are dynamically generated along three dimensions: orchestrator model capability, locale, and available tools. Small models automatically get few-shot examples; large models save tokens.

#react-agent #tool-use #prompt-engineering #claude-code #few-shot #dynamic-prompt

ai guide Mar 28, 2026

OpenClaw's Model Requirements and Provider Ecosystem

OpenClaw supports 35+ model providers. The minimum requirement is that the model supports tool use + streaming. It has built-in auth rotation and model failover mechanisms.

#openclaw #llm #anthropic #openai #gemini #model-failover #tool-use

ai guide Mar 22, 2026

MCP (Model Context Protocol): The Standardized Protocol for AI Agent Tool Invocation

Every AI tool has its own calling format, making integration costly. MCP (Model Context Protocol) is an open standard proposed by Anthropic that unifies the communication protocol between AI Agents and external tools/data sources, enabling tools to be reused across Agents.

#mcp #model-context-protocol #tool-use #agent #anthropic