vLLM — From PagedAttention to a Production-Grade LLM Inference Engine
vLLM uses PagedAttention to nearly eliminate KV cache memory waste and pairs it with continuous batching and prefix caching; together these have made it one of the most widely adopted open-source LLM inference engines.
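The waste reduction comes from paging. Instead of reserving one contiguous region sized for the maximum sequence length, the KV cache is split into fixed-size blocks. Each sequence keeps a block table that maps its logical blocks to physical blocks. Below is a minimal sketch of that idea. It assumes a 16-token block size and uses hypothetical names (`BlockAllocator`, `Sequence`, `contiguous_waste`). It is not vLLM's actual block manager. It only shows why unused memory per sequence is limited to the tail of its last block.

```python
from dataclasses import dataclass, field

# Hypothetical illustration of paged KV-cache allocation; not vLLM's real API.
BLOCK_SIZE = 16  # tokens per KV block (assumed value for this sketch)


class BlockAllocator:
    """Hands out fixed-size physical KV-cache blocks from a shared free list."""

    def __init__(self, num_blocks: int) -> None:
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("out of KV-cache blocks")
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


@dataclass
class Sequence:
    num_tokens: int = 0
    # Logical block i -> physical block id; physical blocks need not be contiguous.
    block_table: list[int] = field(default_factory=list)

    def append_token(self, alloc: BlockAllocator) -> None:
        # Allocate a new physical block only when the last one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(alloc.allocate())
        self.num_tokens += 1

    def wasted_slots(self) -> int:
        # Only the unused tail of the last block is wasted.
        return len(self.block_table) * BLOCK_SIZE - self.num_tokens

    def free(self, alloc: BlockAllocator) -> None:
        # Return every physical block to the shared pool when the sequence ends.
        for block_id in self.block_table:
            alloc.release(block_id)
        self.block_table.clear()
        self.num_tokens = 0


def contiguous_waste(num_tokens: int, max_len: int) -> int:
    # A contiguous reservation of max_len slots wastes every slot not yet used.
    return max_len - num_tokens


if __name__ == "__main__":
    alloc = BlockAllocator(num_blocks=8)
    seq = Sequence()
    for _ in range(50):
        seq.append_token(alloc)
    print(len(seq.block_table))          # 4 blocks hold 50 tokens
    print(seq.wasted_slots())            # 14 unused slots, all in the last block
    print(contiguous_waste(50, 2048))    # 1998 unused slots with a 2048-token reservation
```

The block size trades off two costs. Smaller blocks shrink the per-sequence tail waste. Larger blocks mean fewer table entries and larger, more efficient memory accesses.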