LLM Compression

research Apr 11, 2026 8 min

TriAttention Compresses KV Cache 10.7x — How Trigonometry Fixed Long-Context Reasoning

TriAttention uses pre-RoPE vector concentration and trigonometric scoring to compress the KV cache by 10.7x while matching full-attention accuracy on reasoning tasks.

research Mar 27, 2026 10 min

Google's TurboQuant Compresses LLM Memory 6x With Zero Accuracy Loss — Here's How It Works

Google's TurboQuant algorithm compresses LLM KV cache memory by 6x with zero accuracy loss and no retraining. We break down the ICLR 2026 paper.