AsyncTLS: 4.7x Faster Long-Context LLM Inference With Two-Level Sparse Attention
AsyncTLS sparse attention fuses block filtering, token selection, and async KV cache offloading for 1.3-4.7x throughput gains at 48k-96k token contexts.
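The two-level idea behind the headline numbers can be approximated in a few lines: a cheap block-level score first prunes whole KV-cache blocks, then exact query-key scores pick individual tokens within the surviving blocks, and dense attention runs only over that small set. A minimal pure-Python sketch, where the function name, mean-pooled block scoring, and top-k sizes are illustrative assumptions rather than AsyncTLS's actual kernels (the async offloading component is omitted):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def two_level_sparse_attention(q, K, V, block_size=4, top_blocks=2, top_tokens=4):
    """Illustrative two-level sparsity sketch (names/parameters hypothetical).

    Level 1 filters KV blocks by a cheap mean-pooled key score; level 2 keeps
    only the highest-scoring tokens inside the surviving blocks."""
    d = len(q)
    n_blocks = len(K) // block_size
    # Level 1: score each block via its mean-pooled key vector.
    block_scores = []
    for b in range(n_blocks):
        pooled = [sum(K[b * block_size + i][j] for i in range(block_size)) / block_size
                  for j in range(d)]
        block_scores.append((dot(q, pooled), b))
    kept = sorted(b for _, b in sorted(block_scores, reverse=True)[:top_blocks])
    # Level 2: exact q.k scores for tokens in surviving blocks; keep the top ones.
    cand = [i for b in kept for i in range(b * block_size, (b + 1) * block_size)]
    cand.sort(key=lambda i: dot(q, K[i]), reverse=True)
    idx = cand[:top_tokens]
    # Dense softmax attention over the selected tokens only.
    scores = [dot(q, K[i]) / math.sqrt(d) for i in idx]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(w[t] / z * V[i][j] for t, i in enumerate(idx)) for j in range(d)]

import random
random.seed(0)
K = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
V = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
q = [random.gauss(0, 1) for _ in range(8)]
out = two_level_sparse_attention(q, K, V)
# out is an 8-dim context vector computed from at most top_tokens KV entries
```

The key point the sketch makes concrete: the block-level pass touches a pooled summary per block (cheap at 96k-token contexts), so the exact per-token scoring only ever sees a small fraction of the cache.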