Transformers on danilchenko.dev

Recent content in Transformers on danilchenko.dev
https://www.danilchenko.dev/tags/transformers/
Last updated: Tue, 30 Jun 2026 08:24:33 +0000

Sparse Attention Explained: How LLMs Handle Million-Token Contexts Without Melting Your GPU
Tue, 30 Jun 2026 08:24:33 +0000
https://www.danilchenko.dev/posts/sparse-attention-explained/
How sparse attention cuts LLM inference cost by 10x on long contexts. Covers DeepSeek NSA, MInference, H2O, and The Sparse Frontier's findings.