#kv-cache — quidproquo

ai guide Apr 1, 2026

TurboQuant+ — Two-Stage Quantization to Compress KV Cache to 2-bit, Running 100B Models on a MacBook

TurboQuant+ is an open-source implementation of a Google Research paper (ICLR 2026) that combines PolarQuant and QJL (Quantized Johnson-Lindenstrauss) in a two-stage quantization scheme. It compresses the KV cache by 3.8x to 6.4x, letting consumer hardware run larger models with longer contexts.
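The two-stage idea can be illustrated with a minimal sketch. This is not TurboQuant+'s code, and every function name below is hypothetical. It assumes stage 1 is a plain uniform scalar quantizer, used here only as a stand-in for PolarQuant. It also assumes stage 2 applies a QJL-style 1-bit sign sketch to the stage-1 residual, so the attention inner product q·k is corrected without storing the residual at full precision.

```python
import math
import random


def random_gaussian_matrix(m, d, seed=0):
    """m x d matrix with i.i.d. N(0, 1) entries (the JL projection S)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def stage1_quantize(x, bits=2):
    """Stage 1 stand-in (NOT PolarQuant): uniform scalar quantization of one vector."""
    lo, hi = min(x), max(x)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in x]  # integers in [0, levels]
    return codes, lo, scale


def stage1_dequantize(codes, lo, scale):
    return [lo + c * scale for c in codes]


def qjl_encode(r, S):
    """Stage 2: keep only sign(S @ r), one bit per row, plus the norm of r."""
    signs = [1 if dot(row, r) >= 0 else -1 for row in S]
    return signs, math.sqrt(dot(r, r))


def qjl_inner_product(q, signs, norm, S):
    """Unbiased estimate of <q, r> over random S; q is projected but not quantized."""
    acc = sum(dot(row, q) * s for row, s in zip(S, signs))
    return math.sqrt(math.pi / 2) / len(S) * norm * acc


def two_stage_inner_product(q, k, S, bits=2):
    """Estimate <q, k> from a stage-1 code plus a QJL sketch of the residual."""
    codes, lo, scale = stage1_quantize(k, bits)
    k_hat = stage1_dequantize(codes, lo, scale)
    residual = [a - b for a, b in zip(k, k_hat)]
    signs, norm = qjl_encode(residual, S)
    return dot(q, k_hat) + qjl_inner_product(q, signs, norm, S)
```

In this sketch, each key costs `bits` per coordinate for stage 1, plus m sign bits and one norm for stage 2. The correction term is unbiased because, for Gaussian rows s, E[<s, q> sign(<s, r>)] = sqrt(2/pi) <q, r> / ||r||. The constant sqrt(pi/2) cancels that factor.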

#turboquant #kv-cache #quantization #llm-inference #llama-cpp #apple-silicon