Making LLMs Fast and Small: A Guide to Inference Optimization Research in 2026
Five approaches to making LLMs faster and cheaper — compression, diffusion decoding, architecture, KV cache, and sparse attention — explained with real numbers.