TL;DR

Alibaba released Qwen3.6-Max-Preview on April 20, 2026, through QwenStudio and the Bailian API (model ID: qwen3.6-max-preview). The Qwen team claims the model places first on six programming and agentic benchmarks, including SWE-bench Pro and Terminal-Bench 2.0, with roughly ten-point gains over Qwen3.6-Plus on two of them. Pricing, context window, and weights aren’t public. It’s a preview tier, API-only, with self-reported numbers. Treat the benchmark claims as vendor marketing until third parties reproduce them.

What Alibaba actually shipped today

Qwen3.6-Max-Preview went live on April 20, 2026, on two of Alibaba Cloud’s surfaces: the Bailian inference API and QwenStudio. The model identifier in the API is qwen3.6-max-preview, and, as the name says, this is a preview. No general-availability release, no published pricing, no weights on Hugging Face, and no model card with the usual context-window and parameter-count disclosures.

What is public is a results post from the Qwen team itself, reproduced across CnTechPost and AIbase. It lists six benchmarks where the new model reportedly sits at #1:

  • SWE-bench Pro (the harder, real-repo variant of SWE-bench)
  • Terminal-Bench 2.0 (long-horizon shell-and-file tasks)
  • SkillsBench
  • QwenClawBench
  • QwenWebBench
  • SciCode

Three of those names (QwenClawBench, QwenWebBench, and SkillsBench) are internal or less widely used evaluations. SWE-bench Pro, Terminal-Bench 2.0, and SciCode are community benchmarks with public leaderboards, so those three are verifiable once independent runs trickle in. The Qwen-prefixed ones are effectively unfalsifiable from the outside until Alibaba publishes the prompts, scaffolds, and scoring scripts they used.

The claimed gains over Qwen3.6-Plus

The more useful number is the delta versus Alibaba’s previous model in the same family, Qwen3.6-Plus, which shipped earlier in the Qwen3.6 cycle. These gains are what the preview is actually selling:

Benchmark                  Reported gain vs. Qwen3.6-Plus
SciCode                    +10.8
SkillsBench                 +9.9
NL2Repo                     +5.0
QwenChineseBench            +5.3
Terminal-Bench 2.0          +3.8
ToolcallFormatIFBench       +2.8
SuperGPQA                   +2.3

The +10.8 jump on SciCode is the headline number. SciCode is a benchmark of research-grade scientific-programming problems drawn from published papers, and 10-point gains there usually mean the model got materially better at following long, domain-specific problem specs. The +3.8 on Terminal-Bench 2.0 is less dramatic but probably the most relevant signal for agent-style coding, because Terminal-Bench 2.0 tasks run for dozens of turns in a sandboxed shell and half the game is not losing the thread.
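
To make the long-horizon point concrete, here’s a minimal sketch of the turn loop a Terminal-Bench-style harness runs; the call_model stub, turn budget, and completion signal are placeholders, not the benchmark’s actual scaffold.

```python
# Minimal sketch of a Terminal-Bench-style turn loop (illustrative only).
import subprocess

MAX_TURNS = 40  # tasks can run for dozens of turns

def call_model(history: list[dict]) -> str:
    """Placeholder: send the transcript to a model, get the next shell command."""
    raise NotImplementedError

def run_episode(task_prompt: str) -> list[dict]:
    history = [{"role": "user", "content": task_prompt}]
    for _ in range(MAX_TURNS):
        command = call_model(history)
        if command.strip() == "DONE":  # hypothetical completion signal
            break
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=60
        )
        # The full transcript goes back every turn; "not losing the thread"
        # means staying coherent as this context keeps growing.
        history.append({"role": "assistant", "content": command})
        history.append({"role": "user", "content": result.stdout + result.stderr})
    return history
```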

The Chinese-language benchmark bump (QwenChineseBench +5.3) is a reminder that Qwen is optimized for a bilingual user base in a way that GPT-5.4 and Claude Opus 4.7 aren’t. If you’re building products for Chinese-speaking users, that gap has always been there, and it’s wider in this release.

What Alibaba didn’t disclose

A few things are conspicuously missing from the launch post, and their absence sets the scope of what can actually be said today:

  • No head-to-head comparisons. The Qwen post reports #1 positions on six benchmarks but doesn’t publish the numbers for GPT-5.4, Claude Opus 4.7, Gemini 3.1 Pro, or DeepSeek-V3.5 on the same tasks. Without the full leaderboard rows, “#1 on SWE-bench Pro” reads as a claim the Qwen team still needs to back up.
  • No context window figure. Qwen3.6-Plus offers a 1M-token window; whether the Max tier matches or extends that isn’t addressed.
  • No pricing. Bailian API usually lists per-token rates on the model’s product page, but the preview endpoint isn’t in the pricing catalog yet.
  • No weights or open-source counterpart for this tier. Alibaba did ship Qwen3.6-35B-A3B as an open-source MoE model on April 17, three days before this release. That one is on Hugging Face and is the weights-available sibling. The Max-Preview tier is proprietary, API-only.
  • No safety card or eval suite link. The results are described in prose, without a page linking to the evaluation harness code.

None of this is unusual for a Chinese-lab frontier release. It is still worth saying out loud before repeating the benchmark numbers as if they were GA features.

How to read the SWE-bench Pro claim

SWE-bench Pro is the one that’ll generate the most headlines, so it’s worth being specific. The original SWE-bench measures whether a model can produce a patch that makes a real GitHub repo’s tests pass. SWE-bench Pro, introduced in 2025, is a separate corpus of 1,865 multi-file, enterprise-style problems that explicitly filters out the 1–10-line edits that made the original benchmark look easier than it was.
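
Mechanically, “resolved” means the model’s patch applies cleanly and the task’s previously failing tests then pass. A rough sketch of that check, with illustrative paths and test commands rather than the official Docker-based harness:

```python
# Rough sketch of the SWE-bench "resolved" check (illustrative paths/commands;
# the official harness runs inside per-task Docker images with pinned deps).
import subprocess

def is_resolved(repo_dir: str, base_commit: str, patch: str, test_cmd: str) -> bool:
    subprocess.run(["git", "checkout", base_commit], cwd=repo_dir, check=True)
    applied = subprocess.run(
        ["git", "apply", "-"], cwd=repo_dir, input=patch, text=True
    )
    if applied.returncode != 0:
        return False  # patch doesn't even apply cleanly
    tests = subprocess.run(test_cmd, shell=True, cwd=repo_dir)
    return tests.returncode == 0  # resolved iff the target tests now pass
```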

As of mid-April 2026, the public Scale leaderboard for SWE-bench Pro shows resolve rates in the 50–60% band, with GPT-5.4 (xHigh) at 59.1% and claude-opus-4-6 (thinking) at 51.9% occupying the top two spots. If Qwen3.6-Max-Preview has genuinely placed first on that benchmark, it would be a meaningful jump into the 60s, but the Qwen team hasn’t published the specific number, so the “#1” framing is on them to substantiate. Expect Aider, Replit, and Cursor’s internal evals to land within a week with reproducible figures.

A reasonable baseline prior: most new frontier releases from well-resourced labs crack the top three on at least one major benchmark within a month. Actually holding #1 on SWE-bench Pro for any meaningful window is harder, and it hinges on whether the Qwen team used the same task scaffold that the public leaderboard uses.

What this means for the frontier race

April 2026 has been dense. In the last three weeks:

  • Claude Opus 4.7 shipped with a 1M-context window and a new reasoning mode
  • Kimi K2.6 dropped as a fully open-source coding model from Moonshot AI
  • OpenAI shipped a cheaper GPT-5.4 “mini” tier that’s currently a strong low-cost option for coding work
  • Gemini 3.1 Pro remains in preview with expanded agent tooling that Google is still iterating on
  • And now Qwen3.6-Max-Preview, API-only, with benchmark claims but no published head-to-head comparisons

Two patterns are worth flagging.

First, release cadence has compressed. Labs are now shipping what used to be annual flagship models on a roughly monthly cadence, with preview tiers as a way to test capability improvements before locking in a price. A year ago this would have been the model release of the quarter. Today it’s one of four in a rolling window.

Second, the “open weights vs. API-only” split is hardening into a competitive strategy rather than a philosophical choice: Alibaba shipped open weights three days ago (Qwen3.6-35B-A3B) and proprietary capability today, which is the same barbell Google and Meta are running.

For dev teams picking a model this week, the practical answer is unchanged: wait for independent benchmarks before switching. The useful thing about Qwen3.6-Max-Preview being an API-only preview is that you can actually try it against your own evals cheaply. Worst-case outcome is confirming the vendor numbers don’t match your workload.
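
Since the endpoint speaks the OpenAI wire format (see the access steps below), an A/B pass over your own cases is a few lines. A minimal sketch; the qwen3.6-plus comparison ID and the per-case check functions are assumptions you’d swap for your own:

```python
# Sketch of a same-prompt A/B check over your own eval cases.
# `client` is an OpenAI-compatible client (configured as in "Getting access");
# each case is {"prompt": str, "check": callable} with your own pass/fail logic.
def compare(client, cases, models=("qwen3.6-plus", "qwen3.6-max-preview")):
    wins = {m: 0 for m in models}
    for case in cases:
        for model in models:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": case["prompt"]}],
            )
            if case["check"](resp.choices[0].message.content):
                wins[model] += 1
    return wins
```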

Getting access

If you want to try it today:

  1. Register on Alibaba Cloud and enable the Model Studio / Bailian service in the Singapore or Beijing region.
  2. Use the model ID qwen3.6-max-preview with the Bailian OpenAI-compatible API endpoint (a minimal call sketch follows this list).
  3. QwenStudio (chat.qwen.ai / studio.qwen.ai surface) also exposes the model for interactive use under the “Max Preview” option.
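
A minimal call sketch using the OpenAI Python SDK, assuming Bailian’s usual DashScope compatible-mode base URLs carry over to this preview and that your key is in DASHSCOPE_API_KEY:

```python
# Minimal call against Bailian's OpenAI-compatible endpoint.
# Singapore-region base URL shown; Beijing-region accounts use
# https://dashscope.aliyuncs.com/compatible-mode/v1 instead.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

resp = client.chat.completions.create(
    model="qwen3.6-max-preview",
    messages=[{"role": "user", "content": "Summarize what changed in this diff: ..."}],
)
print(resp.choices[0].message.content)
```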

Rate limits on the preview tier are usually lower than GA (expect 20–60 RPM depending on your account tier), and the preview can be deprecated with short notice. Don’t build production traffic against this endpoint yet.
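
Given the tighter preview limits, it’s worth wrapping calls in basic backoff from day one. A sketch; the retry count and delays are arbitrary starting points, not recommended settings:

```python
# Simple exponential backoff around rate-limited calls (HTTP 429).
import time
from openai import RateLimitError

def with_backoff(fn, retries=5, base_delay=2.0):
    for attempt in range(retries):
        try:
            return fn()
        except RateLimitError:
            time.sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...
    raise RuntimeError("still rate-limited after retries")

# Usage: with_backoff(lambda: client.chat.completions.create(...))
```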

FAQ

Is Qwen3.6-Max-Preview open source?

No. The “Max-Preview” tier is proprietary and API-only, served through Alibaba Cloud’s Bailian platform and QwenStudio. Alibaba’s open-source release in the same cycle is Qwen3.6-35B-A3B, which went up on Hugging Face on April 17, 2026. That’s the one to use if you need weights you can host yourself.
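
If you go the self-hosted route, loading it should look like any other transformers checkpoint. A sketch; the repo ID below is inferred from the announced model name, so confirm the exact ID on Hugging Face before running:

```python
# Sketch of loading the open-weights sibling with transformers.
# The repo ID is an inference from the announced name, not a confirmed path;
# device_map="auto" additionally requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.6-35B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("def quicksort(arr):", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```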

What’s the context window and pricing?

Neither has been disclosed yet. The launch post covers benchmark numbers and availability, not the commercial details. Pricing usually shows up on Bailian’s model catalog within a few days of a preview going live.

How does Qwen3.6-Max-Preview compare to Claude Opus 4.7 or GPT-5.4?

The Qwen team hasn’t published head-to-head numbers. They report #1 placements on six benchmarks but don’t give the runner-up scores, so direct comparisons aren’t possible from the official announcement. Expect independent evaluations from Aider, Cursor, and the SWE-bench maintainers over the coming week.

Can I trust the benchmark claims?

Self-reported benchmarks from any lab should be read as marketing until reproduced. Three of the six benchmarks (QwenClawBench, QwenWebBench, SkillsBench) are internal or less-documented, and effectively unfalsifiable until Alibaba publishes the eval scripts. SWE-bench Pro, Terminal-Bench 2.0, and SciCode are community benchmarks that others can verify.

When will there be a GA release?

Alibaba hasn’t announced a timeline. Previous Qwen previews have gone GA anywhere from two weeks to two months after their preview launch. A separate product teaser dated April 22, 2026 also appeared in the launch coverage but isn’t related to this model.

Bottom line

Qwen3.6-Max-Preview is a real release from a top lab, today, with real capability gains over its predecessor. The “#1 on six benchmarks” framing is overstated, and three of those six are internal evals, so the honest headline is “gains of 2–10 points on a mix of coding benchmarks” against its own predecessor. That’s still a solid release for the Qwen line. It’s another data point in a month where Chinese labs are shipping on the same cadence as OpenAI, Anthropic, and Google, often enough that “frontier model from Alibaba” is becoming a routine news item rather than a headline one.

Wait a week for the independent numbers and revisit then.