#agent-skills

7 articles

ai deep-dive June 6, 2026

The Skill Management Revolution for LLM Agents: The Full Skill Lifecycle Landscape, from Voyager to MUSE-Autoskill

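MUSE-Autoskill (2026) proposes a five-stage skill lifecycle framework. Its self-generated skills reach 60.35% on SkillsBench (+7.16%), and 87.94% on tasks where a skill was successfully generated, exceeding the upper bound set by human-written skills. This article brings together six arXiv papers to map the skill evolution research landscape.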

#agent-skills #ai-agent #llm #self-refinement #memory #arxiv #paper-review

ai deep-dive May 23, 2026

browse.sh: Saving What a Browser Agent Has Learned as a Skill Catalog

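browse.sh, launched by Browserbase in 2026-05, is two things: a browser skill catalog plus the Browse CLI. Its core argument: the bottleneck for browser agents is amnesia, not reasoning. With learned site operations saved as plain-text SKILL.md files, the Craigslist task drops from ~$0.22 to ~$0.12 in the vendor's own evaluation. Note that it has nothing to do with Browsh, the text-based browser from 2018.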

#browse-sh #browser-agent #agent-skills #browserbase #autobrowse

ai deep-dive May 19, 2026

How Claude Reads and Writes PDF / DOCX / PPTX: Dissecting the Three-Layer Skill + Sandbox Architecture

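Claude has no docx_tool / pdf_tool. It uses only bash + file tools, plus SKILL.md instructions and libraries preinstalled in the container such as pdfplumber / python-pptx; these three layers together provide its file read/write capability.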

#claude #agent-skills #anthropic #code-interpreter #sandbox #document-skills

ai deep-dive May 10, 2026

How Others Use LLMs to Write Articles: Notes on the Trade-offs, from Karpathy's LLM-wiki to Multi-Agent Pipelines

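A survey of 11 public LLM writing pipelines, grouped into three mainstream patterns: multi-agent (researcher → writer → critic), Karpathy's LLM-wiki (raw + wiki + written by the LLM, not by hand), and quality safeguards (technical verifier + never fabricate + brief gate). The Princeton GEO paper (KDD 2024) quantified the effects: inline citations +28%, adding numbers +33%, quoting source text +41%, keyword stuffing −9%.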

#llm-writing #content-pipeline #claude-code #agent-skills #llm-wiki #geo #multi-agent #harness-engineering

ai guide April 10, 2026