Chinese AI startup Z.ai has released GLM-5.2, an open-weights large language model with 753 billion parameters, and it is making its strongest pitch where developers actually feel pain: coding, long-horizon planning, and cost. The company says the model has matched or beaten leading closed systems on standard benchmarks, including OpenAI's GPT-5.5 and Anthropic's Claude Opus 4.8, which is exactly the sort of claim that makes the closed-model crowd check their coffee.
That matters because open models are no longer just cheaper alternatives for hobbyists. They are creeping into the same territory as premium proprietary tools, while giving companies something the big vendors rarely do: the option to run locally, tune the model, and keep more control over data and spend.
GLM-5.2 specs and access
GLM-5.2 is available through Z.ai’s API and on Hugging Face, with support for more than 20 third-party development environments. Its context window stretches to 1 million tokens, which puts it in the class of models built for sprawling codebases rather than polite chat.
- Parameters: 753 billion
- Context window: 1 million tokens
- License: MIT for the main weights
- API pricing: $1.40 per 1 million input tokens and $4.40 per 1 million output tokens
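To put the listed API rates in concrete terms, here is a minimal Python sketch of the per-request arithmetic. The two rates come from the list above; the function name and the sample token counts are hypothetical, and the sketch assumes billing is a flat per-token rate with no caching discounts or minimums, which the article does not confirm.

```python
# Published GLM-5.2 API rates from the list above, in USD per 1 million tokens.
INPUT_USD_PER_MILLION = 1.40
OUTPUT_USD_PER_MILLION = 4.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed rates.

    Assumes flat per-token billing (an assumption, not a documented rule).
    """
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: a full 1M-token context and a 10,000-token reply.
    print(f"${estimate_cost_usd(1_000_000, 10_000):.3f}")  # prints $1.444
```

On those assumptions, filling the entire context window once costs about $1.40 before the model writes a single token of output.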
The MIT license is a bigger deal than the marketing gloss. Enterprises can download the weights for free, adapt them, and run them on their own hardware or virtual machines, paying only for compute and electricity. That keeps GLM-5.2 squarely in the open-model strategy that Meta, Mistral, and others have been pushing against the dominant closed-shop model from OpenAI and Anthropic.
Why GLM-5.2 is cheaper to run
Z.ai says GLM-5.2 uses an optimization called IndexShare, which reuses one indexer across four layers of sparse attention. At the model’s maximum 1 million-token context, the company says, that cuts the compute load by a factor of 2.9. The updated multi-token prediction scheme also skips 20% more tokens during speculative decoding, another quiet efficiency win that should matter more to cloud bills than to conference-keynote applause.
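The sharing idea can be illustrated with a toy count of how many indexer passes a stack of sparse-attention layers would need. This is only a sketch of the one-indexer-per-four-layers description above: the grouping of consecutive layers, the function names, and the layer counts are all assumptions, and it is not Z.ai's implementation. It also does not attempt to derive the 2.9x figure, which is the company's own number.

```python
def indexer_owner(layer: int, share_width: int = 4) -> int:
    """Return the layer whose indexer output this layer would reuse.

    Assumes consecutive layers are grouped in blocks of `share_width`
    and the first layer of each block runs the indexer (hypothetical).
    """
    return layer - (layer % share_width)


def count_indexer_runs(num_sparse_layers: int, share_width: int = 4) -> int:
    """Count distinct indexer passes needed for a stack of sparse layers."""
    owners = {indexer_owner(layer, share_width) for layer in range(num_sparse_layers)}
    return len(owners)


if __name__ == "__main__":
    layers = 8  # hypothetical layer count, chosen only for illustration
    print(count_indexer_runs(layers, share_width=1))  # no sharing: prints 8
    print(count_indexer_runs(layers, share_width=4))  # shared: prints 2
```

In this toy setup the indexer work drops by a factor of four, but the attention computation in each layer is untouched, which is one plausible reason a headline saving would come in below 4x.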
The model also offers reasoning modes. “Maximum” is aimed at harder logic tasks and generates an average of 85,000 tokens per task, while “high” is designed to balance output quality with efficiency and produces about half as many. That gives developers a choice between brute-force thinking and something a little less extravagant.
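The gap between the two modes is easier to judge in dollars. The rough Python sketch below prices the average token counts quoted above at the listed $4.40 per 1 million output tokens. It assumes every generated token is billed at that output rate and treats "about half" as exactly half; both are simplifications, and the names are hypothetical.

```python
OUTPUT_USD_PER_MILLION = 4.40  # listed API output rate

# Average tokens generated per task, as quoted for each reasoning mode.
MAXIMUM_MODE_TOKENS = 85_000
HIGH_MODE_TOKENS = MAXIMUM_MODE_TOKENS // 2  # "about half", taken as exactly half


def cost_per_task_usd(generated_tokens: int) -> float:
    """Estimate the output-token cost of one task at the listed rate."""
    return generated_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION


if __name__ == "__main__":
    print(f"maximum: ${cost_per_task_usd(MAXIMUM_MODE_TOKENS):.3f}")  # prints maximum: $0.374
    print(f"high:    ${cost_per_task_usd(HIGH_MODE_TOKENS):.3f}")     # prints high:    $0.187
```

On those assumptions, an average task costs roughly 37 cents of output in the heavier mode and about 19 cents in the lighter one, before input tokens are counted.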
GLM Coding Plan targets developer tools
Rather than building around a chatbot, Z.ai is selling GLM-5.2 into coding workflows through the GLM Coding Plan. Supported tools include Claude Code, OpenClaw, Cline, Kilo Code, Crush, and Factory, which is a clear sign the company wants to sit inside developer pipelines rather than compete for casual prompt traffic.
- Lite: $12.60 a month, or $151.20 a year starting from the second year
- Pro: $50.40 a month and five times more compute than Lite
- Max: $112.00 a month and 20 times more resources, plus dedicated capacity during peak hours
The pricing is segmented by ambition: Lite is pitched at smaller repositories and lighter iterations, while Pro and Max scale up the compute allowance from there.
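One way to compare the tiers is price per unit of Lite-equivalent compute. The Python sketch below uses the monthly prices and multipliers from the plan list, reading "five times more" and "20 times more" as 5x and 20x Lite's allowance. That reading, and the names in the code, are assumptions; the article does not say how compute is actually metered.

```python
# (monthly price in USD, compute relative to Lite) from the plan list.
PLANS = {
    "Lite": (12.60, 1),
    "Pro": (50.40, 5),
    "Max": (112.00, 20),
}


def price_per_lite_unit() -> dict[str, float]:
    """Return each plan's monthly price per Lite-equivalent unit of compute."""
    return {
        name: round(price / multiplier, 2)
        for name, (price, multiplier) in PLANS.items()
    }


if __name__ == "__main__":
    for name, unit_price in price_per_lite_unit().items():
        print(f"{name}: ${unit_price:.2f} per Lite-equivalent unit")
    # prints:
    # Lite: $12.60 per Lite-equivalent unit
    # Pro: $10.08 per Lite-equivalent unit
    # Max: $5.60 per Lite-equivalent unit
```

Read that way, Max works out to less than half of Lite's per-unit price, so the discount is aimed squarely at heavy users.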
The open-model pressure on proprietary AI
Benchmarks are a favorite sport in AI, but the pattern is getting hard to ignore: open-weight models are closing the gap fast enough to force the leaders to defend not just quality, but value. If GLM-5.2 really lands near or above GPT-5.5 and Claude Opus 4.8 on coding tasks, the next fight is likely to be about distribution, developer loyalty, and whether enterprises prefer a model they can inspect over one they can only rent.
That is where Z.ai’s bet becomes interesting. The company is not merely chasing benchmark bragging rights; it is trying to turn coding assistants, long-context inference, and open licensing into a package that feels practical enough for real teams. The proprietary giants still own the prestige layer, but the open side is increasingly where the efficient, deployable work is happening.

