GLM-5.2 Review: 753B Open-Weight Model That Undercuts GPT-5.5

TL;DR

Z.ai’s GLM-5.2 is a 753-billion-parameter mixture-of-experts model released under an MIT license on June 16, 2026. It tops GPT-5.5 on SWE-bench Pro (62.1 vs 58.6) and trails Claude Opus 4.8 by 0.7 points on FrontierSWE (74.4% vs 75.1%). Through OpenRouter, it runs from about $1.40 per million input tokens, roughly 72% below the $5.00 that Claude Opus 4.8 and GPT-5.5 charge. Two things have shifted since this went up. OpenAI shipped GPT-5.6 on July 9, so GLM-5.2’s benchmark win now stands over GPT-5.5, the previous OpenAI generation, rather than the current flagship. And Moonshot’s Kimi K3 opens its weights around July 27, which will challenge GLM-5.2 for the open-weights crown. For now, GLM-5.2 is still the strongest open-weights coding model you can actually download. The catch: if you use Z.ai’s hosted API instead of self-hosting the weights, your prompts route through servers governed by China’s National Intelligence Law.

What GLM-5.2 Actually Is

GLM-5.2 comes from Z.ai (formerly Zhipu AI), one of China’s largest AI labs and one of the few on the US Bureau of Industry and Security’s Entity List. The model launched on June 13 for paying subscribers, then dropped its full open weights on Hugging Face three days later at zai-org/GLM-5.2.

The architecture is a mixture-of-experts (MoE) transformer: 753 billion total parameters, but only about 40 billion active per forward pass. That MoE structure is why it can match or beat dense models with far fewer FLOPs per token. It supports a 1-million-token context window (a 5x jump from GLM-5.1’s 200K) and can produce up to 128K tokens in a single response.
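To make the active-versus-total distinction concrete, here is a back-of-the-envelope sketch. The parameter counts come from the paragraph above; the "2 FLOPs per parameter per token" rule is a common rough approximation for a forward pass and is an assumption here, not a figure Z.ai has published.

```python
# Parameter counts are from the article. The 2-FLOPs-per-parameter rule is a
# rough, commonly used approximation (assumption), not a Z.ai number.
TOTAL_PARAMS = 753e9
ACTIVE_PARAMS = 40e9


def forward_flops_per_token(params: float) -> float:
    """Approximate forward-pass FLOPs per token: one multiply and one add per weight."""
    return 2.0 * params


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"active fraction: {active_fraction:.1%}")
print(f"MoE forward pass: ~{moe_flops / 1e9:.0f} GFLOPs per token")
print(f"dense model of equal size: ~{dense_flops / 1e9:.0f} GFLOPs per token")
```

Under that approximation, only about 5% of the weights do work on any given token. The compute saving does not shrink the memory footprint, though: all 753 billion parameters still have to be resident to serve the model.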

I spent the last day running it through OpenRouter on three personal projects: a Flask API refactor, a Go CLI tool with nested subcommands, and a long-context test that stuffed 400K tokens of a monorepo into the prompt. These aren’t cherry-picked benchmarks, just the kind of work I do daily with Claude Code and occasionally Codex.

The first thing I noticed: GLM-5.2 is verbose. It used roughly 43K output tokens per task in Artificial Analysis’s benchmark suite, compared to 26K for GLM-5.1. That verbosity inflates your bill if you’re paying per token, and it slows down iteration when you’re waiting for a response. On OpenRouter, a typical coding task took around 75 seconds, which felt slow compared to Opus 4.8’s sub-30-second responses for similar complexity.

But the code it wrote was clean. A Python web scraper function came back production-ready on the first try: proper error handling, retries with exponential backoff, typed return values. The Go CLI output was similarly solid: correct cobra subcommand nesting, help text, and flag parsing without the usual LLM quirk of inventing nonexistent stdlib packages.

Benchmark Breakdown

The full picture, pulling from Artificial Analysis, BenchLM, and Z.ai’s published results:

Benchmark	GLM-5.2	Claude Opus 4.8	GPT-5.5	DeepSeek V4 Pro
SWE-bench Pro	62.1	69.2	58.6	55.4
Terminal-Bench 2.1	81.0	85.0	—	—
FrontierSWE	74.4%	75.1%	72.6%	—
AA Intelligence Index	51	—	—	44
GDPval-AA v2 (Agentic)	1524	—	1514	1328
GPQA Diamond	89%	—	—	—
HLE	40%	—	—	—

A few things jump out from this table.

GLM-5.2 beats GPT-5.5 on every coding benchmark where both have scores (see our GPT-5.5 review for the full breakdown on that model). The SWE-bench Pro gap (62.1 vs 58.6) is real: 3.5 points on a benchmark where single-digit gaps separate model generations. On FrontierSWE, GLM-5.2 lands at 74.4%, within 0.7 percentage points of Opus 4.8’s 75.1% (we covered Fable 5’s benchmark claims recently). That is a 0.7-point gap for a model that costs about 80% less per task.
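The gaps quoted here are differences in points, which is not the same thing as a relative score. A small sketch using only the numbers from the benchmark table shows both views side by side for the three benchmarks where GLM-5.2 and Opus 4.8 both have results. The helper name is made up for illustration.

```python
# Scores copied from the benchmark table: (GLM-5.2, Claude Opus 4.8).
scores = {
    "SWE-bench Pro": (62.1, 69.2),
    "Terminal-Bench 2.1": (81.0, 85.0),
    "FrontierSWE": (74.4, 75.1),
}


def gap_summary(glm: float, opus: float) -> tuple[float, float]:
    """Return (gap in points, GLM-5.2 score as a fraction of the Opus 4.8 score)."""
    return round(opus - glm, 1), round(glm / opus, 3)


summary = {name: gap_summary(glm, opus) for name, (glm, opus) in scores.items()}
for name, (points, ratio) in summary.items():
    print(f"{name}: {points} points behind, {ratio:.1%} of Opus 4.8's score")
```

On these figures GLM-5.2 reaches roughly 90% of Opus 4.8's score on SWE-bench Pro, 95% on Terminal-Bench 2.1, and 99% on FrontierSWE, so how close the two models look depends heavily on which benchmark you quote.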

One update since publication: OpenAI shipped the GPT-5.6 family (Luna, Terra, Sol) on July 9, 2026, so GPT-5.5 is now the previous generation. Sol tops OpenAI’s own coding-agent numbers and reportedly edges GLM-5.2 on SWE-bench Pro, which means GLM-5.2’s coding-benchmark win now stands against GPT-5.5, not OpenAI’s current flagship. The GPT-5.5 column above is still an accurate point-in-time comparison; just read it as the previous OpenAI generation. Among open-weights models, GLM-5.2 remains on top.

The agentic score (GDPval-AA v2) stands out: 1524 vs GPT-5.5’s 1514. GLM-5.2 is built for long-horizon agent workflows where the model needs to plan across files, run commands, and iterate. Z.ai specifically pitched it as a coding-agent model, and the benchmarks back that framing.

Where it falls short: hallucination rate sits at 28.1% on the AA-Omniscience Index. That’s an improvement over GLM-5.1’s 29.4%, but it still means more than one in four factual claims from the model is wrong. For coding tasks this is less of a concern (the compiler catches lies), but for anything research-heavy or fact-dependent, you’ll want external verification.

Key numbers: SWE-bench Pro 62.1 · 753B total parameters · $1.40 per 1M input tokens · MIT license

Pricing: The Real Reason to Pay Attention

Pricing changes how you actually use a model more than benchmarks do.

Model	Input (per 1M)	Output (per 1M)	Approx. cost/task	License
GLM-5.2 (OpenRouter)	$1.40	$4.40	~$0.46	MIT
Claude Opus 4.8	$5.00	$25.00	~$2.50	Proprietary
GPT-5.5	$5.00	$30.00	~$3.00	Proprietary
DeepSeek V4 Pro	$0.44	$0.87	~$0.05	DeepSeek
GLM-5.1	~$1.00	~$3.00	~$0.25	MIT

GLM-5.2 isn’t the cheapest model in this table. DeepSeek V4 Pro undercuts it by roughly 3x on input price, 5x on output price, and 9x on per-task cost. But DeepSeek V4 Pro also scores lower on every benchmark in the table above where both models have results. The interesting position is GLM-5.2 vs the proprietary frontier: you get roughly 90-99% of Claude Opus 4.8’s benchmark scores, depending on the benchmark, for roughly 18% of the per-task price.

The caveat is token consumption. GLM-5.2 burns through 43K output tokens per benchmark task, with 37K of those being internal reasoning tokens. You’re billed for the reasoning tokens even though they don’t appear in the visible output. That inflates the per-task cost from what the raw price-per-million suggests. At ~$0.46 per coding task, it’s still cheap against proprietary options, but it’s almost double GLM-5.1’s ~$0.25.
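The effect of hidden reasoning tokens on the bill is easy to work through. The sketch below uses the OpenRouter launch prices and the 43K/37K token figures quoted above. Two things are assumptions rather than facts from the benchmark data: that reasoning tokens are billed at the ordinary output rate, and the implied input volume, which is simply back-solved from the ~$0.46 per-task figure.

```python
# Prices are the OpenRouter launch figures from the pricing table (USD per 1M tokens).
INPUT_PRICE = 1.40
OUTPUT_PRICE = 4.40


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one task, assuming reasoning tokens bill at the output rate."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


output_cost = task_cost(0, 43_000)     # all 43K output tokens
reasoning_cost = task_cost(0, 37_000)  # the 37K hidden reasoning tokens alone

# Hypothetical: the input volume that would account for the rest of the ~$0.46 figure.
implied_input_tokens = (0.46 - output_cost) / INPUT_PRICE * 1_000_000

print(f"output cost per task: ${output_cost:.3f}")
print(f"of which reasoning:   ${reasoning_cost:.3f}")
print(f"implied input tokens: {implied_input_tokens:,.0f}")
```

On these assumptions, output accounts for about $0.19 of each task and reasoning for about $0.16 of that, so the tokens you never see make up most of the output bill.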

For personal projects and prototyping, the math is obvious: GLM-5.2 through OpenRouter gives you frontier-adjacent coding quality at indie-developer prices (compare this with the Claude Code Pro vs Max pricing breakdown). For production agent pipelines processing thousands of tasks, the token verbosity starts to add up and DeepSeek V4 Pro’s cost advantage gets harder to ignore.

The 1-Million-Token Context Window

GLM-5.1 topped out at 200K tokens. GLM-5.2 jumps to 1 million, matching Claude Opus 4.8 and GPT-5.5’s API context window.

I tested this by feeding it roughly 400K tokens of a monorepo: the entire src/ directory of a mid-size Flask application with about 180 files. I asked it to trace a specific request flow through three microservices, identify where a race condition could happen, and propose a fix.

It handled the context without obvious degradation. The trace was accurate, it identified the correct database transaction that lacked proper isolation, and the fix was structurally sound. Whether it would hold up at 800K+ tokens I can’t say. I didn’t have a codebase that large on hand to test with.
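Before running a test like this, it helps to know roughly how many tokens a codebase will occupy. The sketch below estimates that from character counts. The 4-characters-per-token ratio is a crude heuristic and not the GLM-5.2 tokenizer, the function names are made up for illustration, and reserving room for a 128K response assumes output shares the same window as the prompt.

```python
import os

CHARS_PER_TOKEN = 4          # rough heuristic (assumption), NOT the GLM-5.2 tokenizer
CONTEXT_WINDOW = 1_000_000   # context size quoted in the article


def estimate_tokens(root: str, extensions: tuple[str, ...] = (".py",)) -> int:
    """Walk a directory and estimate its token count from character counts."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    total_chars += len(handle.read())
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(root: str, reserve_for_output: int = 128_000) -> bool:
    """True if the estimate leaves room for a maximum-length response."""
    return estimate_tokens(root) + reserve_for_output <= CONTEXT_WINDOW
```

Calling `estimate_tokens("src")` on a project gives a ballpark figure only. Real token counts vary with language and formatting, so treat anything near the limit as a reason to count with the actual tokenizer.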

Z.ai specifically designed GLM-5.2’s IndexShare sparse-attention mechanism for long agent trajectories. The idea is that coding agents accumulate hundreds of thousands of tokens over a multi-step session: file reads, command outputs, error traces, iterative fixes. Microsoft’s FastContext research quantified the problem: 56% of tool turns go to exploration, and offloading that to a small specialist model cuts token use by up to 50%. A model that degrades at 200K would force the agent to constantly prune context or restart. At 1M, the agent can carry the full project state through a long session without losing earlier context.

How to Use GLM-5.2

Three paths, each with different trade-offs.

OpenRouter (quickest start)

The fastest way to try GLM-5.2. It launched with nine OpenRouter providers at roughly $1.40/$4.40 per million tokens; a month on, around two dozen providers list it and input pricing has fallen. Z.ai’s $1.40/$4.40 is now the top of the range — DeepInfra’s standard tier runs $0.95/M input, and OpenRouter’s discounted rates dip lower still. Check the live OpenRouter listing for the current cheapest provider before you commit. The API is OpenAI-compatible:

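import os

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the stock SDK works.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    # Environment variable name is an assumption; keep the key out of source control.
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="z-ai/glm-5.2",
    messages=[
        {"role": "user", "content": "Refactor this Flask route to use async SQLAlchemy 2.0 sessions"}
    ],
    temperature=0.6,
    # Reasoning tokens may count toward this cap. A tight limit can cut the model
    # off before it writes a visible answer, so leave generous headroom.
    max_tokens=32768,
)
print(response.choices[0].message.content)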

Z.ai’s API (official, but read the data section below)

Z.ai’s own API supports OpenAI SDK compatibility and adds a Coding Plan tier that’s compatible with Claude Code, Cline, and Cursor. Point your tools at the coding endpoint:

export ANTHROPIC_BASE_URL="https://api.z.ai/api/coding/paas/v4"
export ANTHROPIC_AUTH_TOKEN="your-zai-api-key"  # Claude Code reads the API key from this variable
export ANTHROPIC_DEFAULT_SONNET_MODEL="glm-5.2[1m]"

That gives you GLM-5.2 as a drop-in replacement in Claude Code sessions. I tried this with a small Go project and it worked. Claude Code’s agent loop ran normally, though response times were noticeably slower than Anthropic’s own endpoints.

Self-Hosting (full control, big hardware)

The weights are on Hugging Face under MIT. You can serve them with vLLM or SGLang on your own GPUs. The hardware requirement is steep: at FP8 quantization, the weights alone take about 753 GB of VRAM, before any KV cache. That’s ten H100 80GB GPUs (800 GB total) at a minimum, or a smaller cluster of H200s. At BF16, the requirement doubles to ~1,500 GB.
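The arithmetic behind those figures is simple enough to sketch. FP8 stores one byte per parameter and BF16 two. The helper names below are made up for illustration, and the result covers weights only, ignoring KV cache, activations, and framework overhead, so treat the GPU counts as a floor.

```python
import math


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def gpus_needed(total_gb: float, gpu_gb: float) -> int:
    """Smallest GPU count whose combined memory covers total_gb."""
    return math.ceil(total_gb / gpu_gb)


fp8_gb = weight_memory_gb(753, 1)   # FP8: 1 byte per parameter
bf16_gb = weight_memory_gb(753, 2)  # BF16: 2 bytes per parameter

print(fp8_gb, gpus_needed(fp8_gb, 80))    # H100 80GB
print(fp8_gb, gpus_needed(fp8_gb, 141))   # H200 141GB
print(bf16_gb, gpus_needed(bf16_gb, 80))  # H100 80GB at BF16
```

In practice, tensor-parallel setups usually want a GPU count that divides the model's attention heads evenly, so real deployments tend to round up to 8 or 16 devices rather than run on 6 or 10.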

Self-hosting makes sense for two audiences: enterprises that can’t send code to external APIs for compliance reasons, and AI labs that want to fine-tune the weights for specialized domains. For everyone else, OpenRouter is simpler.

# Example vLLM serving command (8x H200; 8x H100 80GB falls short of the ~753 GB of FP8 weights)
python -m vllm.entrypoints.openai.api_server \
    --model zai-org/GLM-5.2 \
    --tensor-parallel-size 8 \
    --quantization fp8 \
    --max-model-len 1048576

The China Data Question

Z.ai is headquartered in Beijing. The US Bureau of Industry and Security added Zhipu AI (Z.ai’s former name) to its Entity List in January 2025, citing the company’s role in advancing military AI modernization. In late April 2026, US House lawmakers opened a formal inquiry into cybersecurity risks posed by Chinese AI models in critical infrastructure, naming Z.ai alongside DeepSeek, MiniMax, ByteDance, and several others.

The risk depends entirely on the serving path.

If you use Z.ai’s hosted API, your prompts and code route through servers subject to China’s National Intelligence Law, which requires Chinese companies to cooperate with government data requests. For regulated industries (healthcare, finance, defense, government contracting), this is a non-starter. For any codebase containing proprietary algorithms, API keys, customer data, or anything you wouldn’t post publicly, routing through Z.ai’s API is a risk most security teams won’t approve.

If you self-host or use a Western OpenRouter provider, the data never touches Z.ai’s servers. The MIT license has no phone-home requirements, no telemetry obligations, no usage restrictions. You download the weights, serve them on your own infrastructure, and Z.ai has no visibility into what you’re doing. This is the scenario where the open-weights value proposition fully pays off.

If you use OpenRouter, check which provider is serving your request. Most OpenRouter providers for GLM-5.2 route through non-Chinese infrastructure, but the routing can vary. Verify with your specific provider if this matters for your compliance requirements.
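One way to take routing out of OpenRouter's hands is to pin the request to providers you have vetted. The sketch below builds the request arguments using OpenRouter's provider-routing preferences, passed through the OpenAI SDK's `extra_body` parameter. The provider name `DeepInfra` is taken from the pricing discussion earlier and its exact spelling on OpenRouter is an assumption, as is the helper name; check the live listing for the correct identifiers.

```python
def build_request(prompt: str, allowed_providers: list[str]) -> dict:
    """Build chat-completion kwargs that pin OpenRouter to specific providers."""
    return {
        "model": "z-ai/glm-5.2",
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {
            "provider": {
                "order": allowed_providers,  # try these providers, in this order
                "allow_fallbacks": False,    # fail rather than route elsewhere
                "data_collection": "deny",   # skip providers that may retain prompts
            }
        },
    }


# "DeepInfra" is an assumed provider identifier, used here for illustration only.
kwargs = build_request("Explain this stack trace", ["DeepInfra"])
# With the client from the OpenRouter example:
#   response = client.chat.completions.create(**kwargs)
```

With fallbacks disabled, a request fails outright when the pinned provider is unavailable instead of quietly landing somewhere else, which is usually the behavior a compliance review wants.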

The practical upshot: GLM-5.2’s China connection is a non-issue if you self-host, a manageable concern on most OpenRouter providers, and a hard blocker if you’d be sending sensitive code directly to Z.ai. The MIT license exists precisely to decouple the model’s capabilities from the company’s jurisdiction.

Who Should Use GLM-5.2

Good fit:

Independent developers and small teams who want near-frontier coding quality at $1.40/M input instead of $5.00/M. The cost difference compounds fast when you’re running agent loops that process dozens of files.
Teams already invested in open-weight infrastructure (vLLM clusters, self-hosted inference). GLM-5.2 slots into that stack with no vendor lock-in and no API dependency.
Long-context use cases where the 1M window matters: codebase-wide refactors, multi-file agents, repository Q&A.
Anyone who needs to fine-tune a frontier-class coding model for a specific domain. MIT license means no restrictions.

Bad fit:

Enterprise teams in regulated industries that can’t risk any data routing through Chinese infrastructure, unless they have the GPU budget to self-host.
Latency-sensitive production pipelines. At ~75 seconds per coding task through OpenRouter, GLM-5.2 is sluggish compared to Opus 4.8 or GPT-5.5’s APIs.
Tasks requiring low hallucination rates. The 28.1% factual error rate is acceptable for code (where tests catch mistakes) but rough for content generation, research synthesis, or customer-facing text.
Teams that need multimodal capabilities. GLM-5.2 is text-only. Z.ai’s vision model (GLM-5V-Turbo) exists separately and isn’t open-weights.

FAQ

Is GLM-5.2 better than Claude Opus 4.8 for coding?

Not quite. On SWE-bench Pro, Opus 4.8 scores 69.2 vs GLM-5.2’s 62.1, a 7-point gap. On FrontierSWE, it narrows to 75.1% vs 74.4%. Opus 4.8 is faster, less verbose, and has lower hallucination rates. But it costs roughly 3.6x more per input token and 5.7x more per output token. Whether “better” means “higher score” or “better value” depends on your budget.

Can I run GLM-5.2 locally?

Technically yes, if you have ~753 GB of VRAM (ten H100 80GB GPUs at FP8). For most developers, local running isn’t practical. Cloud hosting through a managed provider or OpenRouter is the realistic path.

Is GLM-5.2 safe to use with proprietary code?

It depends on the serving path. Self-hosted: yes, your code never leaves your infrastructure. Through a trusted Western provider on OpenRouter: your prompts leave your network but do not reach Z.ai’s servers. Through Z.ai’s own API: your prompts traverse Chinese servers governed by the National Intelligence Law. Most enterprise security teams will block that last path.

How does GLM-5.2 compare to DeepSeek V4 Pro?

DeepSeek V4 Pro is roughly 3-5x cheaper per token ($0.44/$0.87 per million tokens) and about 9x cheaper per task, as we detailed in our DeepSeek V4 Pro review, but it scores lower on coding benchmarks: 55.4 on SWE-bench Pro vs GLM-5.2’s 62.1. DeepSeek wins on cost, GLM-5.2 wins on capability. Both carry similar China-data concerns if used through their respective hosted APIs.

Why is GLM-5.2 so verbose?

The model produces about 43K output tokens per coding task, with 37K being internal reasoning tokens that get billed but don’t appear in the response. Deeper reasoning chains improve accuracy on complex tasks, but they inflate costs and latency. Z.ai hasn’t offered a “low verbosity” mode that trades some accuracy for speed.

Sources

Z.ai GLM-5.2 documentation — official specs, API reference, and capability overview
Simon Willison’s analysis of GLM-5.2 — independent benchmarks and observations from the day of release
Artificial Analysis: GLM-5.2 is the new leading open-weights model — Intelligence Index v4.1 scores, GDPval-AA agentic benchmarks, and pricing data
VentureBeat: Z.ai’s GLM-5.2 beats GPT-5.5 on multiple benchmarks — SWE-bench Pro and FrontierSWE comparisons
OpenRouter GLM-5.2 listing — live pricing and provider availability
GLM-5.2 weights on Hugging Face — MIT-licensed open weights
TechCrunch: OpenAI launches the GPT-5.6 family — the July 9, 2026 release that supersedes GPT-5.5 in this comparison
VentureBeat: Moonshot releases Kimi K3 — the incoming open-weights challenger (weights ~July 27, 2026)

Bottom Line

GLM-5.2 is still the strongest open-weights coding model you can actually download today, but that lead is on a clock. Moonshot’s Kimi K3 (2.8T parameters, Artificial Analysis Intelligence Index 57 against GLM-5.2’s 51) is API-only for now, with open weights scheduled around July 27, 2026; once those land, GLM-5.2 likely loses the open-weights top spot. At 62.1 on SWE-bench Pro, GLM-5.2 beats GPT-5.5 (58.6) and closes much of the gap to Claude Opus 4.8’s 69.2, while costing a fraction of either; OpenAI’s newer GPT-5.6 Sol has since moved ahead of it, so the win now reads as “top open-weights model,” not “beats every proprietary flagship.” The MIT license still means you can self-host it, fine-tune it, and deploy it in air-gapped environments, none of which Kimi K3 offers until its weights drop.

The trade-offs are real. It’s slow (75 seconds per task on OpenRouter), verbose (burning 43K tokens when 20K might suffice), and the 28.1% hallucination rate means you can’t trust it for factual content without verification. The China data question adds a layer: if you can’t self-host and your compliance posture rules out Chinese API endpoints, you’re limited to OpenRouter’s Western providers.

For non-sensitive coding work where I’m paying out of pocket, GLM-5.2 through OpenRouter at $1.40/M input is the best value in frontier AI right now. It wrote clean Python and Go on the first pass, handled 400K tokens of context without obvious degradation, and saved me about 80% compared to my usual Opus 4.8 API costs. Open-weights coding models have crossed from curiosity to credible daily driver.
