GLM 4.6 drives a tenfold jump in Zhipu AI’s overseas paid users as coding tools go global

Asia Daily

A breakout moment for a Chinese coding model

Zhipu AI, the Beijing startup behind the Z.ai brand, says paid usage of its services outside China has climbed tenfold in the past two months. The company now counts about 100,000 monthly API customers who pay for access and roughly 3 million free chatbot users overseas. That momentum arrived after the late September launch of GLM 4.6, a model aimed at coding, reasoning and agent workflows. The release immediately pushed Zhipu into the daily toolkits of many developers, thanks to an attractive price point, easy integration paths and practical performance on real programming tasks.

Li Zixuan, who heads Zhipu’s global operations, framed the surge as a signal of where the market is moving and where the company plans to invest next. He said overseas demand is now expanding faster than uptake at home, and that the team wants to keep building across regions rather than prioritizing a single geography.

After describing the shift, Li captured the strategy in simple terms.

“We want to grow in every market because growth overseas is exceeding growth domestically,” said Li Zixuan, head of global operations at Zhipu AI.

GLM 4.6 was promoted as a powerful open model for coding, and it was quickly integrated into developer tools that many programmers already use. Anthropic's Claude Code can be configured to run on GLM 4.6, and other coding agents such as Cline, Roo Code and Kilo Code followed with support of their own. Kilo Code said the model drove the fastest adoption the platform had seen, with token usage rising 94 times in 12 days, because the model was a low cost, "good enough" option compared with leading US systems. That combination of workable ability and value helped Zhipu break through with individual developers and small teams outside China.

What makes GLM 4.6 attractive to developers

GLM 4.6 extends the context window to 200,000 tokens, which means it can read and work with more material at once, such as large codebases or long technical documents. The model also improves coding quality, reasoning and tool use. In the company’s own evaluations across agents, reasoning and code, GLM 4.6 shows clear gains over GLM 4.5. It integrates more smoothly with agent frameworks, handles tool calling during inference, and generates more polished front end pages compared with the prior release.

On internal and public tests, GLM 4.6 is competitive with many widely used models. It reaches near parity with Claude Sonnet 4 in head to head comparisons on the CC Bench agent tasks with a 48.6 percent win rate, while using fewer tokens than GLM 4.5 to finish the same jobs. Zhipu also acknowledges that GLM 4.6 still trails Claude Sonnet 4.5 on some coding benchmarks. That balance mirrors developer feedback, which often centers on whether a system is reliable enough for day to day work even if it is not the single best model on every leaderboard.

Developers can access GLM 4.6 through the Z.ai API, through routing hubs such as OpenRouter, or by running the open weights locally. Zhipu has posted model weights on platforms like Hugging Face and ModelScope and supports common inference stacks including vLLM and SGLang, which makes the model attractive to engineering teams that want control over deployment. The company’s technical blog outlines the changes in more detail and explains recommended settings for coding and tool assisted tasks. For deeper technical notes, Zhipu maintains a model page and documentation for GLM 4.6 on its official channels, including the blog at z.ai.
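For teams evaluating that API route, a minimal sketch of a call might look like the following. It assumes an OpenAI-style chat completions endpoint; the URL and model id below are illustrative placeholders, so check Zhipu's documentation, or a router such as OpenRouter, for the current values before using them.

```python
import json
import os
import urllib.request

# Hypothetical endpoint and model id -- verify against Zhipu's docs or
# your router's model list; these are illustrative, not authoritative.
API_URL = "https://api.z.ai/api/paas/v4/chat/completions"
MODEL = "glm-4.6"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for GLM 4.6."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

if __name__ == "__main__" and os.environ.get("ZAI_API_KEY"):
    req = build_request("Write a Python function that reverses a string.",
                        os.environ["ZAI_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
        print(body["choices"][0]["message"]["content"])
```

Because the request shape follows the widely used chat completions convention, the same sketch can usually be pointed at a routing hub or a self-hosted vLLM or SGLang server by swapping the URL and model id.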

Price point strategy and the good enough thesis

Cost is a central reason GLM 4.6 is gaining ground. Zhipu markets a coding plan that it says delivers Claude level performance for a fraction of the price, promoting roughly one seventh the cost of a comparable tier on a top US service along with a larger usage quota. The exact value depends on usage and region, yet the message is clear: many developers want a reliable assistant that fits a tight budget, and they prefer to ship code rather than chase small gains on obscure benchmarks.

For freelance programmers, students, and startups, a tool that is “good enough” at a low rate can be the difference between adopting AI as an everyday helper or shelving it after a trial period. GLM 4.6 has shown strength in bread and butter tasks that many coders face daily, such as debugging, writing backend functions, wiring integrations, and styling front end components. Reviews from the developer community have emphasized that the model is cost efficient, and that even with occasional corrections, the speed and price make it a practical choice for routine work.

Zhipu also points out that the model finishes tasks with fewer tokens than its predecessor. That detail matters to users who pay per token, because a lower token bill for the same job reduces the total cost of ownership. The company has paired that efficiency claim with a simple upgrade path. Subscribers to its coding plan are automatically moved to GLM 4.6, and app configuration changes amount to selecting the new model name.

Integration into global tools and a new debate over model origins

One reason GLM 4.6 spread quickly is that it plugs into the tools developers already trust. Claude Code, Cline, Roo Code and Kilo Code all support it. In practice that means a programmer can switch between providers inside the same coding agent, compare results on a problem, and settle on the model that balances speed, quality and cost. That ease of integration helped Zhipu ride the growth of agent style coding assistants that manage multi step tasks with tool use, web search and repository operations.
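That switch-and-compare workflow can be sketched in a few lines. The harness below runs one prompt against several models and records each answer and its latency; the model ids are illustrative, and the `ask` callable stands in for whatever client call a given agent or router actually exposes.

```python
import time
from typing import Callable

# Illustrative model ids -- actual ids depend on the provider or router.
CANDIDATES = ["z-ai/glm-4.6", "anthropic/claude-sonnet-4"]

def compare(prompt: str, ask: Callable[[str, str], str]) -> dict:
    """Send the same prompt to each candidate model.

    `ask(model, prompt)` is injected so the sketch stays
    provider-agnostic: pass in whatever API call your tooling uses.
    """
    results = {}
    for model in CANDIDATES:
        start = time.perf_counter()
        answer = ask(model, prompt)
        results[model] = {
            "answer": answer,
            "seconds": round(time.perf_counter() - start, 2),
        }
    return results
```

Injecting the client call rather than hard-coding one provider mirrors how coding agents let users flip between backends: the surrounding loop, scoring and cost tracking stay the same while only the model id changes.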

The momentum has also fed a fresh debate about the origins of some new US coding tools. Two high profile services released in recent weeks, including a new model from Cognition AI named SWE 1.5 and a Composer tool from Cursor, drew scrutiny from users who noticed that both systems appeared to carry traces of Chinese model behavior. One tool generated reasoning logs in Chinese, and another acknowledged it was built on a leading open model without naming it.

Zhipu addressed the topic publicly. In a statement, the company said it believed one of the tools was using its latest model as a foundation.

“We believe SWE 1.5 used our latest GLM 4.6 as the base model,” Zhipu AI said in a statement responding to questions about model origins.

The discussion is less about law and more about norms. Open licenses permit commercial reuse, including reuse without attribution, which is why world class systems with open weights often become the base for many downstream products. Some experts stress that the unique value of a commercial tool can come from the fine tuning, data pipelines and system engineering added on top of a base model.

Florian Brand, a researcher at Trier University who studies open models, summed up that view.

“The fine tuning is the sauce,” said Florian Brand, referring to the custom training and engineering that turn a base model into a polished product.

Overseas focus and the constraints at home

The surge abroad reflects both pull and push. Overseas demand for affordable coding AI is strong, and Zhipu is meeting it with a model that is simple to integrate and cheap to run. At home, the company faces a slower public sector pipeline and increased competition. Management has said that overseas growth now outpaces domestic expansion, which is why Zhipu is investing in markets where developers are already paying for AI tools.

Investors have taken notice. The company is valued at roughly 40 billion yuan and is preparing for a stock listing targeted for 2026. Zhipu was founded in 2019 as a spin out from Tsinghua University and has become one of China’s most visible AI firms. While it is not yet pursuing a mass market consumer subscription business to rival US platforms, it is leaning into enterprise deals and a developer focused coding subscription to build more predictable revenue.

Geopolitics still shape buyer decisions. Zhipu says customers in third countries are comfortable using its services when data can be hosted locally or when the company commits to zero data retention. That mix of local hosting and strict retention policies is becoming a standard way for AI providers to address privacy concerns across borders.

GLM 4.6 performance beyond leaderboards

Benchmarks help frame capability, yet what developers care about is whether a model can complete real tasks. Zhipu extended its CC Bench evaluations to include harder multi step work across front end development, tool building, data analysis, testing and algorithm problems. In those trials, human evaluators worked with models in isolated environments to score how well the systems handled realistic, multi turn workflows. GLM 4.6 improved over GLM 4.5 and reached near parity with Claude Sonnet 4 on those agent tasks. It also completed jobs using about 15 percent fewer tokens than GLM 4.5, which points to better efficiency in practice.

Developers report that GLM 4.6 is better at multi file repositories and can plan longer sequences when building features. The model makes fewer context mistakes on large projects because it can read more files at once. It also produces cleaner front end components and can generate full pages that require fewer edits, although complex styling and edge cases still benefit from human review. Those are the kinds of gains that matter in the day to day rhythm of software teams.

What a context window and tokens mean

A token is a small unit of text that a model uses to read and write. Think of it as a word fragment. API pricing often depends on how many tokens a model consumes. That is why efficiency matters. If a model can complete a task with fewer tokens, the bill goes down and the interaction often feels faster.
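A small worked example makes the arithmetic concrete. The per-million-token prices below are hypothetical, not any provider's real rates; the point is only that a model finishing the same job with 15 percent fewer tokens cuts the bill by the same 15 percent.

```python
# Hypothetical prices in dollars per million tokens -- real rates
# vary by provider, model and tier.
PRICE_PER_MILLION_INPUT = 0.60
PRICE_PER_MILLION_OUTPUT = 2.00

def job_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API job at the hypothetical rates above."""
    return (input_tokens * PRICE_PER_MILLION_INPUT
            + output_tokens * PRICE_PER_MILLION_OUTPUT) / 1_000_000

# Same job, once at a baseline token count and once with 15 percent
# fewer tokens, as in Zhipu's efficiency claim for GLM 4.6:
base = job_cost(40_000, 8_000)
efficient = job_cost(int(40_000 * 0.85), int(8_000 * 0.85))
# efficient / base == 0.85: the bill shrinks in step with token use.
```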

The context window is how much text a model can consider at once. A larger context window lets the model read more files and instructions before it starts working. With 200,000 tokens of context, GLM 4.6 can absorb long function chains, configuration files and test suites in one go, which is essential for realistic programming work. It will not remove the need for careful prompts and tool strategy, yet it reduces the back and forth that slows developers on big projects.
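To get a feel for those numbers, here is a sketch that estimates whether a set of files fits in a 200,000 token window. It uses the common rough heuristic of about four characters per token for English text and code; real tokenizers vary, so treat the result as a ballpark, not a guarantee.

```python
# Rough heuristic: ~4 characters per token. Real tokenizers differ,
# so this is an estimate, not an exact count.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 200_000  # GLM 4.6's advertised context size in tokens

def estimate_tokens(text: str) -> int:
    """Ballpark token count for a piece of text or code."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(texts: list[str], reserve: int = 20_000) -> bool:
    """Check whether the files likely fit, leaving room for the reply.

    `reserve` holds back tokens for the model's own output and any
    system instructions, since the window covers both sides.
    """
    total = sum(estimate_tokens(t) for t in texts)
    return total <= CONTEXT_WINDOW - reserve
```

A check like this is how agent tools decide whether to hand the model a whole repository slice at once or to summarize and chunk it first.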

Risks, regulation, and access

Chinese AI firms operate under export controls, supply chain limits and regulatory scrutiny in multiple jurisdictions. That environment affects training and deployment plans. Open weight models such as GLM 4.6 can help by giving customers more control over where and how the system runs, including on premises or in a preferred cloud region. Even with those options, buyers still evaluate long term support, service level commitments, and compliance features before choosing a provider.

Data handling is one of the biggest concerns. Zhipu’s pitch to overseas customers includes local data hosting and zero retention settings. The idea is to keep sensitive code and documents inside a customer’s chosen region while still benefiting from a high quality coding assistant. For many teams, that setup is a reasonable compromise between performance, privacy and cost.

A crowded market of coding assistants

The market for coding assistants is now crowded with mature services and fresh entrants. GitHub Copilot popularized the category. Models from OpenAI and Anthropic power premium tools with strong reasoning and context management. Chinese players have advanced quickly with systems such as DeepSeek and Qwen, while startups in many regions are building agent style IDE companions that orchestrate search, analysis, code edits and tests in one loop.

In that landscape, Zhipu is leaning on three differentiators. First, price. The company is positioning GLM 4.6 as a practical tool that performs well at a lower cost. Second, openness. Access to model weights and support for common inference frameworks make it attractive to teams that want to customize or self host. Third, integration. By fitting smoothly into popular coding agents and providing straightforward API options, Zhipu reduces the effort needed to try and adopt its model. None of those factors guarantee dominance. They do build a credible path to steady growth with developers who value reliability, control and a fair bill at the end of the month.

Leadership view on what comes next

Zhipu’s chief executive, Zhang Peng, has spoken publicly about the pace of progress in advanced AI. He argues that discussions about artificial superintelligence tend to blur important details, and that any timeline should be seen as a moving target. What matters for customers is whether models can outperform humans on specific jobs that they care about, and whether those gains translate into real productivity.

At a recent launch event, Zhang explained how he views the next few years.

“Achieving or exceeding human intelligence by 2030 might mean surpassing humans in certain areas, but still falling short in many others,” said Zhang Peng, CEO of Zhipu AI.

That pragmatic stance maps to the company’s product choices. Zhipu is not racing to win every consumer subscription. It is focused on enterprises and on a developer oriented coding plan, while maintaining an accessible free chatbot tier that helps onboard new users. The near term goal is not to define the endpoint of intelligence. It is to deliver a model that helps people ship features, fix bugs and move faster while keeping costs under control.

What to Know

  • Zhipu AI says overseas paid users climbed tenfold in two months to about 100,000 monthly API customers, with 3 million free chatbot users outside China.
  • The jump followed the launch of GLM 4.6, a coding and reasoning model that integrated quickly into tools like Claude Code, Cline, Roo Code and Kilo Code.
  • Kilo Code reported the fastest adoption it has seen, with token usage for GLM 4.6 rising 94 times in 12 days because of its lower cost.
  • GLM 4.6 expands the context window to 200,000 tokens, improves tool use and agent performance, and finishes tasks with about 15 percent fewer tokens than GLM 4.5.
  • The model reaches near parity with Claude Sonnet 4 on CC Bench agent tasks, yet still trails Claude Sonnet 4.5 on some coding tests.
  • Open weights are available for local deployment, and the model supports inference stacks such as vLLM and SGLang.
  • Zhipu is emphasizing a low price strategy, promoting a coding plan that it says offers Claude level performance at a fraction of the cost.
  • The company is valued at roughly 40 billion yuan and is preparing for a possible stock listing in 2026 while focusing on enterprise and developer growth overseas.