TriAttention Compresses KV Cache 10.7x — How Trigonometry Fixed Long-Context Reasoning
TriAttention uses pre-RoPE vector concentration and trigonometric scoring to compress KV cache 10.7x while matching full attention accuracy on reasoning tasks.
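The dek doesn't spell out the scoring rule, so here is a minimal sketch of the general idea only, assuming the "trigonometric scoring" amounts to a cosine-similarity ranking over pre-RoPE key vectors used to decide which cache entries to keep. The function `select_kv_indices`, the `keep_ratio` parameter, and the random data are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

def select_kv_indices(query, keys_pre_rope, keep_ratio=1 / 10.7):
    """Rank cached keys by a trigonometric (cosine) score against the
    query and keep only the top fraction of the KV cache.

    `keys_pre_rope` stands in for key vectors taken before rotary
    position embedding is applied (an assumption about what
    "pre-RoPE" means here, not the paper's definition).
    """
    # Cosine of the angle between the query and each cached key.
    q = query / np.linalg.norm(query)
    k = keys_pre_rope / np.linalg.norm(keys_pre_rope, axis=-1, keepdims=True)
    scores = k @ q  # cos(theta_i) for every cached position i

    # Keep the highest-scoring ~1/10.7 of positions (a 10.7x compression).
    n_keep = max(1, int(len(scores) * keep_ratio))
    return np.argsort(scores)[-n_keep:]

# Usage: 4096 cached keys with head dimension 128.
rng = np.random.default_rng(0)
keys = rng.standard_normal((4096, 128))
query = rng.standard_normal(128)
kept = select_kv_indices(query, keys)
print(f"kept {len(kept)} of {len(keys)} entries "
      f"({len(keys) / len(kept):.1f}x compression)")
```

Cosine scoring is the simplest trigonometric choice for this sketch; the actual method presumably concentrates pre-RoPE vectors before scoring, which this toy omits.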