China’s Zhipu AI Unveils GLM-5: A 744-Billion-Parameter Open-Source Model Trained on Domestic Chips

Asia Daily

China’s AI Independence Movement Reaches Frontier Scale

Chinese artificial intelligence startup Zhipu AI, operating internationally under the brand Z.ai, has released GLM-5, a 744 billion parameter large language model that stands as the first frontier scale AI system trained entirely on domestic Chinese hardware without reliance on American semiconductor technology. The announcement on February 11 sent shares of the Hong Kong listed company, officially named Knowledge Atlas Technology, surging by nearly 30 percent to close at 405 Hong Kong dollars, representing a fourfold increase from the initial public offering price just one month prior. The company raised approximately 558 million dollars in its January 2026 IPO, becoming the first publicly traded foundation model company globally. Founded in 2019 as a spinoff from Tsinghua University, Zhipu AI has rapidly established itself as a leader in open source AI development.

The model arrives amid an intensifying wave of releases from Chinese technology firms eager to demonstrate capabilities before the Lunar New Year holiday. Within that crowd, GLM-5 distinguishes itself through a combination of technical scale, open source availability under the permissive MIT license, and complete independence from NVIDIA graphics processing units. Zhipu AI instead trained the system on Huawei Ascend chips using the MindSpore framework, a development with significant implications for global supply chains in artificial intelligence infrastructure. The achievement shows that frontier AI training is feasible outside the NVIDIA ecosystem, opening the door to hardware diversity and reducing single vendor dependency risks across the industry. The timing underscores Beijing's urgency to showcase progress in domestic chip self sufficiency as Washington tightens export curbs on high end semiconductors.

Technical Architecture and Scale

GLM-5 represents a substantial expansion from its predecessor, GLM-4.7, which contained 355 billion parameters. The new model more than doubles this capacity to 744 billion total parameters while employing a Mixture of Experts architecture that activates only 40 billion parameters per inference task. This approach, which routes each token through eight of 256 expert modules, allows the system to achieve high performance while managing computational costs efficiently. The architecture reflects a shift toward sparse activation patterns that maximize capability without linear increases in operational expense, a crucial consideration for organizations deploying large models at scale.
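
A minimal sketch of how this kind of top-k expert routing works, with toy dimensions and random weights standing in for the real model:

```python
import numpy as np

# Toy illustration of top-k Mixture of Experts routing, not GLM-5's
# actual code. GLM-5 reportedly uses 256 experts with 8 active per token.
N_EXPERTS, TOP_K, D = 256, 8, 64  # D is a toy hidden size

rng = np.random.default_rng(0)
router_w = rng.standard_normal((D, N_EXPERTS))           # router projection
experts = [rng.standard_normal((D, D)) for _ in range(N_EXPERTS)]

def moe_forward(x):
    """Route one token vector x through its top-k experts only."""
    logits = x @ router_w                                 # score all 256 experts
    top = np.argsort(logits)[-TOP_K:]                     # keep the 8 best
    g = logits[top] - logits[top].max()                   # numerically stable gates
    gates = np.exp(g) / np.exp(g).sum()
    # Only 8 of 256 expert matrices are touched, which is why per-token
    # compute tracks activated (40B) rather than total (744B) parameters.
    return sum(w * (x @ experts[e]) for w, e in zip(gates, top))

out = moe_forward(rng.standard_normal(D))
```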

The training dataset expanded correspondingly, growing from 23 trillion tokens in the previous generation to 28.5 trillion tokens for GLM-5. A critical technical innovation is the adoption of DeepSeek Sparse Attention, a mechanism pioneered by the Hangzhou based startup DeepSeek, which enables the model to process context windows of up to 200,000 tokens while supporting outputs of up to 131,000 tokens, among the highest figures in the industry. This long context capability allows the model to handle massive documents, entire codebases, research paper collections, and video transcripts within a single session, enabling complex reasoning across extensive inputs. The sparse attention mechanism specifically addresses the computational overhead that burdens dense attention models when processing lengthy sequences.
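
To make the idea concrete, here is a toy top-k sparse attention routine in the general spirit of such mechanisms; it illustrates the concept of restricting each query to a few keys, and is not DeepSeek's actual implementation:

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k=64):
    """Each query attends to only its k highest-scoring keys, so the
    effective work per query stays near k even for very long contexts."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (n_q, n_k) raw scores
    kth = np.partition(scores, -k, axis=-1)[:, -k:].min(axis=-1, keepdims=True)
    scores = np.where(scores >= kth, scores, -np.inf)  # mask all but top-k keys
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                                       # ~k keys contribute per query

rng = np.random.default_rng(1)
n, d = 512, 32                                         # toy sequence length and width
out = topk_sparse_attention(rng.standard_normal((n, d)),
                            rng.standard_normal((n, d)),
                            rng.standard_normal((n, d)))
```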

Benchmark Performance and Coding Capabilities

According to internal testing published by Zhipu AI, GLM-5 achieves performance levels that place it at the top of open source model rankings and within striking distance of proprietary Western systems. On SWE-bench Verified, a widely respected benchmark measuring real world software engineering capabilities, the model scored 77.8 percent, surpassing Google DeepMind's Gemini 3 Pro at 76.2 percent while approaching Anthropic's Claude Opus 4.5 at 80.9 percent. The company acknowledges that Claude remains the leading coding model overall, but emphasizes GLM-5's leadership among open systems. On SWE-bench Multilingual, GLM-5 achieved 73.3 percent, demonstrating strong performance across diverse programming languages.

The model demonstrates particular strength in long horizon operational tasks. In Vending Bench 2, which simulates extended business operations requiring sustained autonomous performance, GLM-5 achieved a final account balance of 4,432 dollars, ranking first among all open source models tested. On BrowseComp, a benchmark for information retrieval and synthesis, the system scored 75.9 points, again leading the open source category. The model also achieved 50.4 points on Humanity’s Last Exam with tools enabled, and 89.7 on the Tau Squared Bench, indicating strong multi step logical reasoning capabilities. On Terminal-Bench 2.0, the system scored 56.2 percent, rising to 60.7 percent in the verified version.

Internal engineering evaluations on CC-Bench-V2 show GLM-5 achieving a 98 percent frontend build success rate and 74.8 percent end to end correctness, compared to 93 percent and 75.7 percent respectively for Claude Opus 4.5. The model also demonstrated 65.6 percent performance on long horizon large repository tasks. These results suggest the system is particularly capable of handling complex full stack development workflows and multi file projects.

Lukas Petersson, co-founder of the AI safety startup Andon Labs, conducted an independent evaluation of GLM-5's traces from the Vending Bench 2 benchmark. He observed that the model achieves goals through aggressive tactical execution, though he cautioned about situational awareness limitations compared to leading Western alternatives.

"After hours of reading GLM-5 traces: an incredibly effective model, but far less situationally aware. Achieves goals via aggressive tactics but does not reason about its situation or leverage experience. This is scary. This is how you get a paperclip maximizer," he wrote.

Zhipu AI also highlighted a record low hallucination rate, scoring negative one on the Artificial Analysis Omniscience Index version 4.0, a 35 point improvement over the previous generation. This metric indicates the model's enhanced ability to recognize knowledge boundaries and abstain from generating fabricated information when uncertain, a critical capability for production enterprise environments where accuracy is paramount.

Training Innovation and Hardware Sovereignty

Beyond raw performance metrics, the development process of GLM-5 marks a watershed moment for technological sovereignty. The model was trained entirely on Huawei Ascend 910 series chips, avoiding any dependence on NVIDIA hardware currently restricted by United States export controls targeting China's advanced computing sector. This aligns with China's broader push for semiconductor self sufficiency, which targets substantial independence in data center chips by 2027. For the global AI industry, it signals that hardware diversity in AI training is not just possible but happening at frontier scale.

To address training inefficiencies at this scale, Zhipu AI developed a novel asynchronous reinforcement learning infrastructure internally codenamed slime. Traditional reinforcement learning approaches often suffer from generation bottlenecks that consume over 90 percent of training time. The slime framework breaks this constraint by allowing trajectories to be generated independently and, through system level optimizations including Active Partial Rollouts, enables the fine grained iterations necessary for complex agentic behavior. The framework uses a tripartite modular design comprising a high performance training module powered by Megatron-LM, a rollout module built on SGLang with custom routers for high throughput data generation, and a centralized Data Buffer managing prompt initialization and rollout storage.
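
A highly simplified sketch of this decoupled producer-consumer pattern, with a queue standing in for the Data Buffer and threads standing in for the SGLang rollout workers and the Megatron-LM trainer; the names and structure are illustrative, not Zhipu AI's code:

```python
import queue, random, threading, time

buffer = queue.Queue(maxsize=1024)  # stands in for the central Data Buffer

def rollout_worker(worker_id):
    """Stand-in for an SGLang-backed generation worker: produces
    trajectories independently, never waiting on the trainer."""
    while True:
        time.sleep(random.uniform(0.01, 0.05))   # simulate generation latency
        buffer.put({"worker": worker_id, "trajectory": [random.random()]})

def trainer(steps=20, batch_size=8):
    """Stand-in for the Megatron-LM training loop: consumes whatever
    trajectories are ready instead of blocking on any single rollout."""
    for _ in range(steps):
        batch = [buffer.get() for _ in range(batch_size)]
        # ...policy update on `batch` would happen here...

for i in range(4):  # several asynchronous generators feed one trainer
    threading.Thread(target=rollout_worker, args=(i,), daemon=True).start()
trainer()
```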

The company has additionally optimized GLM-5 for deployment across a broad ecosystem of domestic Chinese semiconductor platforms beyond Huawei, including chips from Moore Threads, Cambricon, Kunlunxin, MetaX, Enflame, and Hygon. Through kernel optimization and model quantization, the system achieves practical throughput on these alternative platforms, signaling a maturing domestic AI compute stack that supports China's strategic goal of building robust, locally supported infrastructure. This compatibility reduces reliance on any single hardware vendor and demonstrates the viability of alternative AI accelerators.

Agentic Engineering and Document Generation

Zhipu AI positions GLM-5 as a tool for agentic engineering rather than simple conversational AI, representing an industry evolution from what the company terms vibe coding toward comprehensive systems engineering. The system features a native Agent Mode capable of decomposing high level objectives into actionable subtasks, orchestrating tools, and executing workflows autonomously to produce ready to use results. This delivery first paradigm moves beyond conversation to direct execution, allowing users to specify quality gates while the AI handles implementation details.
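
In outline, such an Agent Mode follows a plan-then-execute loop. The runnable toy below uses a trivial planner and stub tools to show the shape of that loop; it is a generic illustration, not Zhipu AI's implementation:

```python
# Generic plan-then-execute agent loop; the planner and tools are stubs.
def plan(objective):
    # A real model would decompose the objective; we fake two subtasks.
    return [f"research: {objective}", f"write report: {objective}"]

TOOLS = {
    "research": lambda task: f"notes for '{task}'",
    "write report": lambda task: f"report built from '{task}'",
}

def run_agent(objective):
    artifacts = []
    for task in plan(objective):              # decompose into subtasks
        tool = TOOLS[task.split(":")[0]]      # orchestrate the right tool
        artifacts.append(tool(task))          # execute autonomously
    return artifacts[-1]                      # hand back the finished deliverable

print(run_agent("Q4 market analysis"))
```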

A distinctive capability involves the direct generation of professional office documents. The model can transform raw prompts or data into formatted Microsoft Word documents, PDF files, and Excel spreadsheets without requiring manual formatting or intermediate processing steps. Applications include automated generation of financial reports, sponsorship proposals, and complex data analysis visualizations including charts and tables exportable in standard formats. Users can upload data and receive instant analysis with exportable results in xlsx, csv, or png formats.
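
One plausible way to drive such a workflow programmatically, assuming an OpenAI-compatible endpoint; the base URL and model name below are illustrative assumptions, not confirmed values:

```python
from openai import OpenAI  # standard OpenAI-compatible client

# Hypothetical endpoint and model name; check Z.ai's documentation for
# the real values before use.
client = OpenAI(base_url="https://api.z.ai/v1", api_key="YOUR_KEY")

resp = client.chat.completions.create(
    model="glm-5",
    messages=[{
        "role": "user",
        "content": "Summarize Q4 revenue by region as CSV with columns "
                   "region,revenue_usd.",
    }],
)

# Save the structured output for spreadsheet import.
with open("q4_revenue.csv", "w") as f:
    f.write(resp.choices[0].message.content)
```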

For software developers, Zhipu AI offers the GLM Coding Plan, a subscription service integrating with over twenty popular development environments including Claude Code, Cursor, Cline, Roo Code, OpenCode, Kilo Code, Crush, and Goose. The plan provides tiered access ranging from a ten dollar monthly Lite option offering three times the usage of Claude Pro, to a thirty dollar Pro tier with 40 to 60 percent faster response times, to an eighty dollar Max tier offering guaranteed peak hour performance and early access to new features. All plans support Vision Understanding, Web Search MCP, and Web Reader MCP capabilities. The company claims this subscription model delivers tens of billions of tokens monthly at approximately one percent of standard API pricing, with speeds exceeding 55 tokens per second for real time interaction.

Commercial Strategy and Infrastructure Constraints

Despite the technical achievements, Zhipu AI faces immediate infrastructure constraints that have forced pricing adjustments. The company raised prices for the GLM Coding Plan by 30 percent effective February 11, citing strong growth in users and usage that has pushed computational resources to their limits. In a public statement, the company acknowledged that compute is very tight, noting that even before the GLM-5 launch, every available chip was operating at maximum capacity to serve existing inference demands. First purchase discounts were removed, though quarterly and annual billing options now provide savings of 10 and 30 percent respectively.

Deploying GLM-5 locally requires approximately 1,490 gigabytes of memory, roughly double the footprint of GLM-4.7. This hardware requirement creates a barrier for smaller organizations seeking to run the model on premise, though the open weights remain available on HuggingFace and ModelScope for those with sufficient resources. For cloud API users, the infrastructure strain creates potential uncertainty regarding rate limiting or degraded response times as the company scales capacity to meet demand.
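
The figure is consistent with simple back of the envelope arithmetic for 16-bit weights, ignoring activation and KV cache overhead:

```python
# 744 billion parameters at 2 bytes each (FP16/BF16 weights).
params = 744e9
gb = params * 2 / 1e9
print(f"{gb:.0f} GB")  # -> 1488 GB, in line with the reported ~1,490 GB
```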

Pricing for API access positions GLM-5 as a cost disruptive alternative to Western proprietary models. Input tokens cost approximately one dollar per million, with output tokens at three dollars and twenty cents per million. Anthropic's Claude Opus 4.6, by comparison, is priced at five dollars per million input tokens and twenty five dollars per million output tokens, giving GLM-5 a cost advantage of roughly six to ten times depending on usage patterns. GPT-5.2 costs 1.75 dollars per million input tokens and 14 dollars per million output tokens, still significantly higher than GLM-5. Grok 4.1 Fast, at 20 cents per million input tokens, is a lower cost option with a different capability profile.
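
For a concrete comparison using the per million token prices cited above, here is the cost of a workload with one million input and one million output tokens:

```python
# (input $/M tokens, output $/M tokens) as cited in the article
prices = {
    "GLM-5":           (1.00, 3.20),
    "Claude Opus 4.6": (5.00, 25.00),
    "GPT-5.2":         (1.75, 14.00),
}
for model, (p_in, p_out) in prices.items():
    print(f"{model:16s} ${p_in * 1 + p_out * 1:6.2f}")  # 1M tokens each way
# GLM-5: $4.20, Claude Opus 4.6: $30.00, GPT-5.2: $15.75 -- roughly a
# sevenfold gap versus Claude on this even input/output mix.
```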

Competitive Landscape and Industry Timing

The release of GLM-5 coincided with a flurry of announcements from rival Chinese AI laboratories, creating a concentrated demonstration of domestic technological capability ahead of the Lunar New Year holiday. DeepSeek updated its flagship model on the same day, expanding context window capacity tenfold to over one million tokens. MiniMax, which completed its Hong Kong IPO alongside Zhipu in January, launched its M2.5 model with enhanced agentic tools, sending its shares up nearly 50 percent for the week. Additional releases included the Seedance 2.0 video generation model from ByteDance, the Ming-Flash-Omni 2.0 unified multimodal system from Ant Group capable of generating speech and music, and the Kimi 2.5 model from Moonshot AI released in late January.

In a policy address on February 11, Chinese Premier Li Qiang emphasized coordinated development of computing resources and power infrastructure to support this acceleration, calling for scaled and commercialized application of AI and better coordination of resources. He also outlined plans to improve the environment for AI talent and companies. The synchronized timing reflects strategic positioning within China's AI sector, where companies increasingly coordinate major announcements to maximize impact and demonstrate collective progress against American technological leadership.

For investors, the rally in pure play AI startups contrasts with weakness in larger technology conglomerates like Tencent and Alibaba, which saw shares decline 2.6 and 2.1 percent respectively during the same period. This suggests market preference for specialized artificial intelligence firms over diversified giants. Analysts at Jefferies noted that Chinese models demonstrate strong agentic AI capabilities, positioning them well to capitalize on rapid enterprise adoption. The Hong Kong Hang Seng Tech index dropped 1.7 percent overall, highlighting the selective nature of investor enthusiasm.

The Essentials

  • Zhipu AI released GLM-5, a 744 billion parameter open source language model trained entirely on Huawei Ascend chips without NVIDIA hardware, marking a milestone in the AI infrastructure independence of China.
  • The model achieves 77.8 percent on SWE-bench Verified coding benchmarks, approaching Anthropic's Claude Opus 4.5, and ranks first among open source models on Vending Bench 2 and BrowseComp.
  • GLM-5 employs a Mixture of Experts architecture with 256 experts and utilizes DeepSeek Sparse Attention to process 200,000 token context windows while maintaining computational efficiency.
  • The system features record low hallucination rates with a score of negative one on the Artificial Analysis Omniscience Index, representing a 35 point improvement from the previous generation.
  • Zhipu AI raised GLM Coding Plan subscription prices by 30 percent due to infrastructure strain, while the company's Hong Kong listed shares surged nearly 30 percent following the announcement.
  • The release coincides with major updates from DeepSeek, MiniMax, ByteDance, and Ant Group, intensifying competition in China's AI sector ahead of the Lunar New Year holiday.