Efficient LLM Reasoning: 7 Papers That Cut Token Costs by Up to 84%
Seven papers tackle LLM overthinking: Sketch-of-Thought cuts token use by up to 84%, shorter reasoning chains boost accuracy by 34.5%, and budget-aware prompting halves costs.
THINC trains a 4B parameter model to reason entirely in code. It scored 78.1% on competition math, beating Qwen3-235B at 75.2%. Here's how the method works.