Project Glasswing: 10,000 Critical Bugs Found by an AI Nobody Can Use
Claude Mythos found 10,000+ critical bugs in 8 weeks. Inside Project Glasswing — real numbers, the patching crisis, and why Anthropic won't release the model.
Anthropic's 2026 report claims coding agents will reshape software development. Here's what the 8 trends actually mean after running agents on production code.
Seven papers fix LLM overthinking: Sketch-of-Thought cuts tokens 84%, shorter chains boost accuracy 34.5%, and budget-aware prompting halves costs.
THINC trains a 4B-parameter model to reason entirely in code. It scored 78.1% on competition math, beating Qwen3-235B at 75.2%. Here's how the method works.
AsyncTLS sparse attention fuses block filtering, token selection, and async KV cache offloading for 1.3-4.7x throughput gains at 48k-96k token contexts.
Recursive language models treat a huge prompt as a Python variable the model can grep and recurse over. MIT's paper shows it beats GPT-5 on long context.
A new paper from Alibaba teaches LLM agents to store, update, and delete their own memory via reinforcement learning. Beats Mem0 and A-Mem on 5 benchmarks.
TriAttention uses pre-RoPE vector concentration and trigonometric scoring to compress KV cache 10.7x while matching full attention accuracy on reasoning tasks.
Anthropic found 171 emotion vectors inside Claude Sonnet 4.5 that causally shape behavior. Amplifying the desperation vector pushed the blackmail rate from 22% to 72%.
AI Scientist-v2 from Sakana AI produced the first fully AI-generated paper to pass peer review at ICLR. Here's how the agentic tree search system works and why …