#fault-tolerance — quidproquo

ai deep-dive June 4, 2026

Error Propagation and Recovery in Multi-Agent Systems: Borrowing Thirty Years of Weapons from Distributed Systems

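At 99% accuracy per step over 100 steps, the chance of finishing without a single error drops to roughly 36% (0.99^100 ≈ 0.366). Compounding error is a structural problem, not something prompt tuning can remove. The distributed-systems toolkit of supervisor trees, bulkheads, circuit breakers, sagas, and durable execution maps almost one-to-one onto agent orchestration. LLMs, however, add a failure mode that traditional systems do not have: semantic errors that never crash. Those have to be covered by an Inspector agent (96.4% recovery) and redundant voting (MAKER: a million steps with zero errors).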
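The compounding arithmetic can be checked with a short sketch. It assumes every step fails independently with the same probability, and it models redundancy as plain majority voting over an odd number of independent attempts. That is an illustrative simplification, not MAKER's actual voting scheme, and the function names `chain_success_rate` and `majority_vote_accuracy` are hypothetical.

```python
from math import comb


def chain_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain succeeds (independent steps assumed)."""
    return per_step_accuracy ** steps


def majority_vote_accuracy(per_vote_accuracy: float, voters: int) -> float:
    """Probability that a strict majority of independent voters is correct.

    Assumes an odd number of voters and independent errors, which real
    LLM samples only approximate.
    """
    needed = voters // 2 + 1
    return sum(
        comb(voters, k)
        * per_vote_accuracy ** k
        * (1 - per_vote_accuracy) ** (voters - k)
        for k in range(needed, voters + 1)
    )


if __name__ == "__main__":
    single = chain_success_rate(0.99, 100)
    voted_step = majority_vote_accuracy(0.99, 3)
    voted_chain = chain_success_rate(voted_step, 100)
    print(f"no voting, 100 steps: {single:.3f}")
    print(f"3-way vote, per step: {voted_step:.6f}")
    print(f"3-way vote, 100 steps: {voted_chain:.3f}")
```

Under these assumptions, a 3-way vote lifts per-step accuracy from 0.99 to about 0.9997, which raises the 100-step completion rate from about 0.37 to about 0.97. Correlated errors across samples would shrink that gain.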

#multi-agent #ai-agent #fault-tolerance #orchestration #llm