
#agent

20 posts
ai deep-dive

Midscene.js: Betting on Pure Vision for Cross-Platform UI Automation

An MIT-licensed open-source UI automation framework from ByteDance (~13k GitHub stars). UI actions rely solely on feeding screenshots to vision-language models (Qwen3-VL / Doubao / Gemini-3 / UI-TARS), with no DOM parsing. A single JS API works across Web / Android / iOS / desktop, and as of v1.0 the DOM action mode has been removed entirely. The trade-off: each step is slower and costs more tokens.

ai deep-dive

Code Mode: Moving Tool Definitions from Context into Code

Stop stuffing all your tool descriptions into context at session start. Let the model write code, have the runtime execute it, and let tool definitions enter context only at the import line — Anthropic's GDrive→Salesforce example dropped from ~150K tokens to 2K, and Cloudflare's 2,500-endpoint schema shrank from 1.17M to 1K.

ai deep-dive

Claude Skills: Package Domain Knowledge into a Folder, Teach Once and It Remembers

A Skill is a folder with a SKILL.md. Three-layer progressive disclosure lets Claude load details only when needed, eliminating the need to re-explain preferences every conversation.

ai

Local Deep Research Walkthrough: A Privacy-First Deep Research Agent

Local Deep Research is a privacy-first deep research agent built on LangChain + LangGraph, integrating 20+ search engines and 30+ research strategies. Its flagship langgraph_agent_strategy takes the LLM-autonomous tool-calling approach, offering a fundamentally different paradigm from fixed-pipeline RAG graphs.

ai

Search MCP Tools for AI Agents: What to Do When WebFetch / WebSearch Gets Blocked

When you use AI agents like Claude Code or Cursor, the built-in WebFetch / WebSearch tools often get blocked by Cloudflare, geo-restrictions, or rate limits. Connecting a search MCP server is the most direct fix. This post compares the options actually available in 2026.

tech

Warp: From Modern Terminal to Agentic Development Environment

Warp evolved from a Rust-powered modern terminal into an Agentic Development Environment (ADE) with integrated AI agents. It was open-sourced under AGPL in April 2026 and has over 700,000 developer users.

ai

OpenAI Workspace Agents: From Custom GPTs to a Team Automation Platform

On April 22, 2026, OpenAI launched Workspace Agents: powered by Codex, capable of long-running cloud execution, and integrated with Slack, Salesforce, and Google Drive. They are the enterprise successor to Custom GPTs.

tech project

DeerFlow: ByteDance's Open-Source Super Agent Harness for Long-Running Research Tasks

DeerFlow is ByteDance's open-source Super Agent Harness built on Python 3.12 + LangGraph. It orchestrates long-running tasks through sandboxes, long-term memory, sub-agents, skills, and a messaging gateway. It hit #1 on GitHub Trending in February 2026 and has since surpassed 63,000 stars. It supports Telegram/Slack/Feishu, Claude Code integration, and multiple search backends.

ai guide

MCP vs CLI vs API: The Real Boundaries of Agent Tool Interfaces

MCP is not going away, but its effective scope is narrower than most people think. For local development, CLI and raw API almost always beat MCP. MCP's truly irreplaceable niche is a narrow one: a local tool layer shared across agents.

tech guide

Better Agent Terminal: Consolidate Multiple Project Terminals and Claude Code Agents into One Window

Better Agent Terminal (BAT) is an Electron desktop app that unifies multiple project workspaces, terminals, and Claude Code Agents in a single window. It addresses the everyday pain of exploding iTerm tabs and the lack of a proper GUI container for agents. MIT-licensed and available on macOS, Windows, and Linux.

ai guide

15 Agent Frameworks Worth Watching in 2026

A survey of 15 mainstream AI Agent frameworks in 2026, sorted by GitHub stars: their positioning, key features, and ideal use cases. It is a map, not a ranking.

ai guide AI Agent in Practice

Advanced Harness Engineering Patterns: Tool Registry, Guard System, and Checkpoint-Resume

A Harness is more than just an LLM wrapper. The Tool Registry manages dynamic tool loading and selection, the Guard System establishes four layers of defense, and Checkpoint-Resume lets long-running tasks survive interruptions. These three patterns form the critical infrastructure of production-grade Agent systems.

ai guide

OpenClaw Agent Runtime: Workspace, System Prompt, and Bootstrap

Every OpenClaw agent has its own 'home' (Workspace), with personality and behavior defined by bootstrap files like AGENTS.md and SOUL.md. The System Prompt is dynamically assembled each time.

ai guide

LangGraph: Managing Agent Workflows with Graph Structures

LangGraph models LLM workflows as directed graphs, solving the pain points of multi-turn iteration, conditional branching, and parallel execution that are difficult to handle with linear pipelines.

ai project

GLM-5: Zhipu AI's 744B Open-Source Model Trained Entirely on Huawei Chips

GLM-5 is a 744B MoE open-source model released by Zhipu AI (Z.ai) in February 2026, trained entirely on Huawei Ascend chips and released under the MIT license. It currently ranks as the top open-source model, surpassing Claude and GPT-5 on benchmarks like Humanity's Last Exam, while its API pricing is 1/5 to 1/8 of theirs.

ai guide

MCP (Model Context Protocol): The Standardized Protocol for AI Agent Tool Invocation

Every AI tool has its own calling format, making integration costly. MCP (Model Context Protocol) is an open standard proposed by Anthropic that unifies the communication protocol between AI Agents and external tools/data sources, enabling tools to be reused across Agents.

ai guide

Agent Memory Systems: From RAG to Read-Write Memory Evolution

RAG is read-only. Agent Memory lets AI not only read but also write and persist information. Three memory types, Procedural (behavior patterns), Episodic (temporal events), and Semantic (factual knowledge), together form a complete cognitive memory system.

ai deep-dive

Complete Guide to AI Agent Architecture Patterns: From Three Pillars to Multi-Agent Systematic Navigation

An AI Agent is not a single technology; it is an entire architectural system. This article is a systematic guide: it starts from the three pillars of an Agent (Context/Cognition/Action), moves through the three-stage evolution of AI engineering (Prompt -> Context -> Harness), and ends with eight Multi-Agent design patterns and production-grade Harness infrastructure. Each topic links to a dedicated deep-dive article.

ai guide RAG Systems in Practice

Multi-Agent RAG: Distributed Retrieval Architecture with Specialized Agent Collaboration

A single RAG Agent handling all queries hits knowledge boundaries and performance bottlenecks. Multi-Agent RAG dispatches retrieval tasks to multiple specialized Agents, each with its own knowledge base and retrieval strategy, coordinated by a central Orchestrator that merges results.

ai guide RAG Systems in Practice

The Complete Guide to RAG System Patterns: A Ten-Generation Evolution from Naive to Multi-Agent with Practical Navigation

RAG has evolved far beyond simple 'search + generate' into a technology ecosystem spanning ten generations. This article is a systematic guide to those ten generations, from Naive RAG to Multi-Agent RAG, covering retrieval strategies, chunking, embedding, reranking, evaluation frameworks, observability, and cost optimization. Each topic has a dedicated deep-dive article.