Qwen3.6-Max-Preview: Alibaba Claims 6 Coding Benchmark Wins
Alibaba shipped Qwen3.6-Max-Preview with claimed #1 scores on SWE-bench Pro, Terminal-Bench 2.0, and four other coding benchmarks. What holds up, what doesn't.
Alibaba shipped Qwen3.6-Max-Preview with claimed #1 scores on SWE-bench Pro, Terminal-Bench 2.0, and four other coding benchmarks. What holds up, what doesn't.
Meta released Muse Spark today — the first model from Alexandr Wang's Superintelligence Labs. It's closed-source, multimodal, and a big strategic pivot.
Microsoft released MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 through Foundry, undercutting OpenAI on price. The $13B partnership is fraying.
Anthropic accidentally exposed Claude Mythos, a new Capybara-tier model above Opus with major cyber capabilities, via a CMS misconfiguration. Here's what …