#performance

2 articles

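ai guide March 12, 2026

RAG Cost Optimization: Driving Down the Cost of Every Query

The cost of a RAG system comes from LLM tokens, the Embedding API, and vector search. Every stage offers room to cut costs, but you need to confirm that the optimization does not sacrifice too much quality.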

#rag #cost-optimization #performance #token-budget #caching

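ai guide March 12, 2026

Semantic Caching: Run RAG Only Once for Semantically Similar Questions

A cache does not have to match only identical queries. Semantically similar questions can also hit the cache, saving a full run of the RAG pipeline.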

#rag #semantic-cache #caching #vector-search #performance