Reinforcement-Learning

research May 13, 2026 12 min

THINC: How a 4B Model Beat 235B Qwen3 by Reasoning in Code

THINC trains a 4B-parameter model to reason entirely in code. It scored 78.1% on competition math, ahead of Qwen3-235B's 75.2%. Here's how the method works.

research Apr 17, 2026 11 min

Agentic Memory: The Paper That Teaches LLMs to Manage Their Own Memory

A new paper from Alibaba teaches LLM agents to store, update, and delete their own memory via reinforcement learning. It beats Mem0 and A-Mem on 5 benchmarks.