Diffusion Language Models Explained — How Mercury Generates 1,000 Tokens Per Second
Mercury uses diffusion instead of autoregressive decoding to generate all tokens in parallel, hitting 1,000+ tokens/sec. We break down how it works.
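To see why parallelism matters, consider the two decoding loops side by side. The toy sketch below is not Mercury's actual implementation; random sampling stands in for the model's predictions, and the token vocabulary, step count, and function names are illustrative. The point it demonstrates is structural: autoregressive decoding needs one sequential model call per token, while diffusion-style decoding makes a small, fixed number of denoising passes, each of which can fill many positions at once.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]
MASK = "<mask>"


def autoregressive_decode(length: int) -> list[str]:
    """Autoregressive decoding: one token per step, left to right.

    Each token depends on all previous ones, so the `length` model
    calls are inherently sequential.
    """
    tokens: list[str] = []
    for _ in range(length):
        # Stand-in for a model's next-token sample conditioned on `tokens`.
        tokens.append(random.choice(VOCAB))
    return tokens


def diffusion_decode(length: int, steps: int = 4) -> list[str]:
    """Diffusion-style decoding: start fully masked, refine in passes.

    Each pass predicts every masked position simultaneously, so only
    `steps` model calls are needed regardless of sequence length.
    """
    tokens = [MASK] * length
    per_step = -(-length // steps)  # ceil division so all masks are filled
    for _ in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Unmask a batch of positions at once; a real model would keep
        # its highest-confidence predictions here instead of sampling.
        for i in random.sample(masked, min(per_step, len(masked))):
            tokens[i] = random.choice(VOCAB)
    return tokens


if __name__ == "__main__":
    print("autoregressive:", autoregressive_decode(8))  # 8 sequential calls
    print("diffusion:     ", diffusion_decode(8, steps=4))  # 4 parallel passes
```

Under these assumptions, generating 8 tokens costs 8 sequential forward passes autoregressively but only 4 parallel denoising passes with diffusion, and that gap widens with sequence length.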