How to Run Gemma 4 Locally With Ollama, llama.cpp, and vLLM
Step-by-step guide to running Google Gemma 4 locally on your hardware with Ollama, llama.cpp, and vLLM — including model picks, VRAM requirements, and real …