#turboquant — quidproquo

ai guide, April 1, 2026

TurboQuant+: Compressing the KV Cache to 2 Bits with Two-Stage Quantization So a MacBook Can Run 100B Models

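TurboQuant+ is an open-source implementation of a Google Research paper from ICLR 2026. It compresses the KV cache by 3.8-6.4x using two-stage quantization (PolarQuant + QJL), which lets consumer hardware run larger models with longer contexts.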
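The following minimal Python sketch makes the two-stage idea concrete. It shows a generic "coarse quantizer + QJL residual correction" scheme for estimating the attention dot product <q, k> from a compressed key. It is not TurboQuant+'s code. The first stage is a plain uniform scalar quantizer standing in for PolarQuant, a polar-coordinate quantizer that is not reproduced here. All function names (scalar_quantize, encode_key, estimate_dot, and so on) are hypothetical.

The second stage follows the QJL idea. The residual r = k - k_hat is projected with a shared Gaussian matrix S, and only the sign bits plus ||r|| are kept. For s ~ N(0, I), E[(s·q)·sign(s·r)] = sqrt(2/π)·<q, r>/||r||, so rescaling the sign sketch gives an unbiased estimate of <q, r>.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scalar_quantize(x, bits):
    """Stage 1 (stand-in for PolarQuant, an assumption of this sketch):
    uniform scalar quantizer on [-scale, scale] with 2**bits levels."""
    levels = 2 ** bits
    scale = max(abs(v) for v in x) or 1.0  # avoid a zero step for all-zero input
    step = 2 * scale / (levels - 1)
    codes = [round((v + scale) / step) for v in x]  # integer codes in [0, levels-1]
    return codes, scale


def scalar_dequantize(codes, scale, bits):
    step = 2 * scale / (2 ** bits - 1)
    return [c * step - scale for c in codes]


def make_projection(d, m, seed=0):
    """Hypothetical helper: Gaussian matrix S (m x d), shared by keys and queries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def encode_key(k, bits, S):
    """Two-stage encoding: coarse codes + 1-bit QJL sketch of the residual."""
    codes, scale = scalar_quantize(k, bits)
    k_hat = scalar_dequantize(codes, scale, bits)
    r = [a - b for a, b in zip(k, k_hat)]          # stage-1 residual
    r_norm = math.sqrt(dot(r, r))
    signs = [1 if dot(row, r) >= 0 else -1 for row in S]  # one bit per projection
    return codes, scale, signs, r_norm


def estimate_dot(q, encoded, bits, S):
    """Estimate <q, k> = <q, k_hat> + <q, r>, with <q, r> taken from the sign sketch."""
    codes, scale, signs, r_norm = encoded
    k_hat = scalar_dequantize(codes, scale, bits)
    m = len(S)
    # E[(s.q) * sign(s.r)] = sqrt(2/pi) * <q, r> / ||r||, so rescale to get <q, r>
    sketch = sum(sgn * dot(row, q) for sgn, row in zip(signs, S))
    correction = math.sqrt(math.pi / 2) / m * r_norm * sketch
    return dot(q, k_hat) + correction


if __name__ == "__main__":
    d, bits, m = 64, 2, 64  # m = d: about one extra bit per coordinate
    S = make_projection(d, m, seed=0)
    rng = random.Random(1)
    k = [rng.uniform(-1, 1) for _ in range(d)]
    q = [rng.uniform(-1, 1) for _ in range(d)]
    enc = encode_key(k, bits, S)
    print("exact:", dot(q, k), "estimate:", estimate_dot(q, enc, bits, S))
```

Per key, this sketch stores bits·d code bits, m sign bits, and two floats (scale and ||r||). A larger m lowers the variance of the correction but costs more bits. Assuming an fp16 baseline, the quoted 3.8-6.4x range works out to roughly 4.2 down to 2.5 bits per element (16 / 3.8 ≈ 4.2, 16 / 6.4 = 2.5).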