MiniMax M2 AI Model Surpasses Google DeepMind’s Gemini 2.5 Pro

Asia Daily

A new leader among open weight models

MiniMax, a fast rising Chinese AI startup backed by major domestic investors, has released its M2 large language model and vaulted to the top tier of global AI rankings for open weight systems. Independent evaluations place MiniMax M2 as the most capable open weight model on the Artificial Analysis Intelligence Index, with performance that surpasses Google DeepMind’s Gemini 2.5 Pro and approaches leading proprietary systems from the United States. The company built M2 for two hot use cases, code generation and agent style tool use, then priced and engineered it for speed to entice developers and enterprises that want to cut costs without giving up capability.

The technical through line is efficiency. MiniMax M2 uses a Mixture of Experts design that routes each input through only a subset of the model’s parameters, keeping about 10 billion active at a time while the total model capacity is much larger. That choice lowers latency and GPU memory demands without hollowing out reasoning ability. MiniMax’s API is aggressively priced at 0.30 dollars per million input tokens and 1.20 dollars per million output tokens, roughly 8 percent of the price of Anthropic’s Claude 3.5 Sonnet by the company’s own comparison, and the team advertises inference throughput near 100 tokens per second. The full model weights are available under a permissive license and can be downloaded for self hosting, which means teams can fine tune and deploy M2 internally as they wish.
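
To put the published rates in perspective, the quick calculation below estimates a monthly bill at those prices. The traffic volumes are hypothetical, chosen only to show scale, and real spend would also depend on prompt length and caching.

```python
# Rough cost estimate at MiniMax's published M2 API rates.
# The traffic volumes below are hypothetical, chosen only to illustrate scale.

INPUT_PRICE_PER_M = 0.30   # dollars per million input tokens
OUTPUT_PRICE_PER_M = 1.20  # dollars per million output tokens

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a month of traffic at the published rates."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    )

# Hypothetical month: 2 billion input tokens and 500 million output tokens.
print(monthly_cost(2_000_000_000, 500_000_000))  # 600.0 + 600.0 = 1200.0 dollars
```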

China’s open weight scene has heated up all year, with DeepSeek, Alibaba’s Qwen, Moonshot AI, and others pushing long context and reasoning. MiniMax’s turn at the front of the pack matters because M2 lands where businesses are spending real money, automating support, research, and development through tool using agents that can browse the web, query internal data, run code, and self check results. That mix, capability plus low cost and transparent licensing, is what has made M2 the new reference point for open weight AI.

How strong is the performance

Artificial Analysis aggregates demanding benchmarks across math, science, instruction following, coding, and agent workflows to create a single Intelligence Index. MiniMax M2 scores 61 on the latest version, ranking as the strongest open weight model and among the top models overall. It clears Gemini 2.5 Pro and sits near Anthropic’s Claude Sonnet 4.5 on several agent and coding tasks, while still trailing the newest frontier class systems like GPT 5 and xAI’s Grok. For many developers, that is a sweet spot, since they gain near frontier capability without frontier level cost or lock in.

Agent tasks at near frontier level

Agent style evaluations measure whether a model can plan, call tools, read intermediate results, adapt, and verify the final outcome over multi step tasks. On this front M2 performs especially well. MiniMax reports strong scores on agent oriented suites, including tau squared Bench at 77.2, BrowseComp at 44.0, and FinSearchComp global at 65.5. These cover browsing and citation, financial information retrieval, and planning plus recovery when steps fail. In practice that means an M2 powered agent can design a plan, run shell commands, search and read sources, call an interpreter to test code, correct errors, and present traceable evidence without constant human nudging.
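
In code, that loop reduces to a plan, act, and verify cycle. The sketch below is a deliberately simplified illustration; call_model, run_tool, and is_satisfied are hypothetical stand-ins for a real M2 client, a tool runner, and a verification pass, not functions from MiniMax’s SDK or tool calling guide.

```python
# Simplified plan, act, verify loop for an agent. The three helpers below are
# hypothetical stubs; they are not part of MiniMax's SDK or tool calling guide.

def call_model(history):
    # A real implementation would send `history` to the model and get back
    # either a tool request or a final answer.
    return {"content": "stub answer", "tool_call": None}

def run_tool(tool_call):
    # A real implementation would run a shell command, browser action, or
    # code snippet and return its output.
    return "stub tool output"

def is_satisfied(task, answer, history):
    # A real implementation would ask the model to check its own work.
    return True

def run_agent(task: str, max_steps: int = 10) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = call_model(history)                     # plan the next move
        if step.get("tool_call"):
            observation = run_tool(step["tool_call"])  # act: shell, browser, code runner
            history.append({"role": "tool", "content": observation})
            continue                                   # feed the result back in
        answer = step["content"]
        if is_satisfied(task, answer, history):        # verify before finishing
            return answer
        history.append({"role": "assistant", "content": answer})
    return "stopped after max_steps without a verified answer"

print(run_agent("Summarize the latest sales report"))
```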

Coding and reasoning benchmarks

MiniMax engineered M2 for end to end developer workflows, where the model edits multiple files, compiles, tests, and repairs code iteratively. On SWE Bench style tasks and Terminal Bench, the model shows competence in real repositories and command line environments across languages. Artificial Analysis’ combined view, which also factors in knowledge and reasoning tests like GPQA Diamond and AIME, indicates that M2’s coding strength does not come at the expense of general intelligence. For organizations that want an assistant capable of both software chores and research work, consistency across these domains is the draw.

Architecture at a glance

M2 is a sparse Mixture of Experts model. Instead of sending every token through every part of a giant network, a learned gate selects only a few expert sub networks to activate per input. MiniMax’s documentation describes a model with roughly 230 billion total parameters and 10 billion activated per inference. Early reports cited 200 billion total parameters, but the company’s technical pages and model card describe the larger total capacity alongside the same 10 billion active budget. The benefit is clear. You get the breadth of knowledge and specialization of a very large model, while paying inference costs more in line with a 10 billion parameter dense system.
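
For readers who want to see the routing idea in code, the sketch below shows a toy top k gated Mixture of Experts layer in PyTorch. The layer size, expert count, and top k value are invented toy numbers, not M2’s actual configuration.

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Toy top k Mixture of Experts layer for illustration.
    Sizes, expert count, and k are invented, not MiniMax M2's configuration."""

    def __init__(self, d_model: int = 64, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts)    # learned router
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        scores = self.gate(x)                        # router score for every expert
        weights, idx = scores.topk(self.k, dim=-1)   # keep only the top k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens whose slot routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out  # only k of n_experts run per token, so compute stays near a small dense model

# Example: 5 tokens flow through, but each activates only 2 of the 8 experts.
layer = TinyMoELayer()
print(layer(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```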

That efficiency shows up in hardware requirements. MiniMax and third party testers describe practical serving on as few as four NVIDIA H100 GPUs at FP8 precision, which puts M2 within reach of midsize engineering teams. By comparison, some rivals keep far more parameters active. DeepSeek’s recent V3.2 activates about 37 billion parameters per token and Moonshot AI’s Kimi K2 uses around 32 billion, which increases memory footprints and reduces throughput. M2’s smaller activation footprint also helps with tail latency and concurrency, two operational headaches for interactive tools and agents.
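
A back of the envelope calculation shows why four 80 GB cards are plausible. The sketch below assumes one byte per weight at FP8 and ignores KV cache, activations, and framework overhead, all of which need additional memory in practice.

```python
# Back of the envelope check on serving M2 at FP8 across four H100 80 GB GPUs.
# Ignores KV cache, activations, and framework overhead, which all need extra room.

total_params = 230e9       # reported total parameter count
bytes_per_param = 1        # FP8 stores one byte per weight

weight_gb = total_params * bytes_per_param / 1e9   # about 230 GB of weights
cluster_gb = 4 * 80                                # four H100 80 GB cards, 320 GB total
headroom_gb = cluster_gb - weight_gb               # roughly 90 GB left over

print(f"weights {weight_gb:.0f} GB, cluster {cluster_gb} GB, headroom {headroom_gb:.0f} GB")
```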

Agentic design choices and tool use

MiniMax built M2 to think, act, and verify in loops, then exposed that capability through a tool calling guide that connects the model to shells, browsers, retrieval systems, and code runners. The company emphasizes visible reasoning traces in an interleaved thinking format. In multi turn sessions the assistant keeps its planning context and evidence chain intact, which helps it recover from incomplete data and failed steps, and helps developers audit decisions. The approach is designed to make long horizon tasks more reliable and debuggable.

MiniMax also ties M2 tightly to practical developer environments. The model runs well with the popular inference frameworks SGLang and vLLM and follows the familiar request formats used by OpenAI and Anthropic. That lets teams swap in M2 with minimal code changes, then scale serving as needed. For anyone building an internal copilot or a production grade agent, lower friction integration is valuable.
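
Because the request format mirrors the OpenAI style, pointing an existing client at a different endpoint is often the only change needed. The snippet below is a sketch; the base URL and model identifier are placeholders, and the real values should come from MiniMax’s current documentation rather than from this example.

```python
# Sketch of calling M2 through an OpenAI compatible client. The base_url and
# model name below are placeholder assumptions; check MiniMax's documentation
# for the current endpoint and model identifier.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.minimax.example/v1",  # placeholder endpoint
    api_key="YOUR_MINIMAX_API_KEY",
)

response = client.chat.completions.create(
    model="MiniMax-M2",  # assumed model identifier
    messages=[{"role": "user", "content": "Write a unit test for a fizzbuzz function."}],
)
print(response.choices[0].message.content)
```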

MiniMax framed the launch in terms of mission and access. The company said it wants advanced intelligence to reach broad user groups, not only those who can afford premium proprietary models.

“From the beginning, MiniMax has pursued the vision of ‘Intelligence with Everyone.’”

In its public model card, the team underscored its focus on agents and efficiency.

“MiniMax-M2 redefines efficiency for agents.”

MiniMax operates an M2 powered Agent service with two modes. Lightning Mode streams instant responses for simple queries. Pro Mode handles complex jobs that run longer, such as research, data extraction, or multi stage coding. That split mirrors how teams actually work, alternating between quick answers and sustained projects where the model must plan, execute, and check work products end to end.

Access, licensing, and pricing

M2’s availability is unusually open for a system with this level of ability. MiniMax has released the full weights under a permissive license and published deployment guides. Developers can download the weights from platforms like Hugging Face, run local inference with SGLang or vLLM, and fine tune for specific domains. MiniMax also offers an API with very low per token pricing and fast streaming throughput, and the company has promoted limited time free access for both its Agent app and API.

Practical details that matter to buyers include support for structured function calling, long context processing, and robust tool use. MiniMax recommends common sampling settings and provides a clear tool calling schema for linking the model to external services. For teams concerned about vendor lock in, the combination of open weights and clean API compatibility is a strong incentive to evaluate M2. For teams that want to stay entirely on premise, the company’s documentation explains how to serve M2 efficiently and how to size GPU memory for different precision settings.
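
As an illustration of what structured function calling looks like in that style of schema, the example below defines a single hypothetical tool. The tool name and parameters are invented for the example; MiniMax’s tool calling guide defines the exact format the model expects.

```python
# Hypothetical tool definition in an OpenAI style function calling schema.
# The search_orders tool is invented for illustration; MiniMax's tool calling
# guide defines the exact format the model expects.

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_orders",
            "description": "Look up a customer order by its order id.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "Order identifier"},
                },
                "required": ["order_id"],
            },
        },
    }
]

# The list is passed alongside the chat messages; the model replies with a
# structured tool call, the application executes it, and the result goes back
# to the model as a tool message.
print(tools[0]["function"]["name"])
```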

  • Weights are published for self hosting, with guides for SGLang and vLLM.
  • API pricing is 0.30 dollars per million input tokens and 1.20 dollars per million output tokens, with advertised speeds near 100 tokens per second.
  • Tool use covers shells, browsers, retrieval, and code interpreters, with visible reasoning traces to support audit and debugging.
  • Compatibility with OpenAI and Anthropic style APIs reduces integration work for existing apps.

Those choices make M2 attractive for software engineering copilots, data analysis assistants, and customer support agents that must combine reasoning with reliable tool execution.

Rivals and the shifting leaderboard

MiniMax M2’s rise resets the open weight leaderboard and intensifies competition across both China and the United States. Artificial Analysis places M2 as the strongest open weight model with an Intelligence Index of 61, ahead of Google’s Gemini 2.5 Pro and just behind proprietary leaders like Claude Sonnet 4.5, GPT 5, and Grok 4 depending on the task. That positioning is especially striking for agent evaluations such as BrowseComp and FinSearchComp global, where M2’s ability to plan, cite evidence, and recover from errors has closed much of the gap with closed systems.

The competitive map inside China is changing as well. While DeepSeek and Qwen have dominated headlines for months with frontier scale training and long context, MiniMax is now drawing attention with a smaller activation budget that delivers comparable agent and coding performance at lower cost. DeepSeek has explored massive total parameter counts, which can push memory demands near 700 gigabytes in some setups. MiniMax’s sparse activation and careful serving recipes keep requirements down to a cluster that many mid market teams can afford, especially if they use FP8 and optimized inference frameworks.

Price also matters. MiniMax’s token pricing undercuts large proprietary rivals by an order of magnitude. For enterprises that want to run hundreds of agents or keep assistive coding tools open all day across many developers, this can move budgets from pilot to production. The open weights release is another lever, since it allows security sensitive firms to audit, fine tune, and deploy in their own environments. The combination of access, cost, and capability is a direct challenge to proprietary platforms that rely on premium pricing.

Caveats for buyers

Benchmarks are helpful guides, not guarantees. Artificial Analysis evaluates models with controlled harnesses and fixed prompts, which may not perfectly match a company’s domain or tooling. Teams adopting M2 should run their own evals on representative tasks and measure latency, throughput, and agent reliability with the actual tools they plan to use.

There are also operational details to plan for. MiniMax encourages preserving the model’s interleaved thinking traces across turns so that long tasks maintain coherent plans and evidence. That design can produce verbose outputs and larger conversation histories, which means developers should manage context windows carefully and trim or summarize where possible. Organizations that push maximum throughput should also test quantization settings and batching strategies to balance speed, cost, and accuracy.
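
One simple mitigation, sketched below under the assumption that the application stores turns as a list of message dictionaries, is to keep the system prompt plus only the most recent turns. A production deployment would more likely summarize older turns instead of dropping them, so that plans and evidence chains stay intact.

```python
# Naive context trimming: keep the system prompt plus the last N messages.
# A real deployment would more likely summarize older turns instead of
# dropping them, so plans and evidence chains stay intact.

def trim_history(messages: list[dict], keep_last: int = 20) -> list[dict]:
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```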

Finally, M2 remains an open weight system that organizations can modify, which increases responsibility for safety and compliance. Teams will want to layer monitoring, guardrails for tool calls, and access controls around any deployment that can reach external resources or sensitive data. These are standard concerns for agent systems, and the same discipline applies here.

At a Glance

  • MiniMax M2 is the top ranked open weight model on the Artificial Analysis Intelligence Index and sits among the global leaders overall.
  • It surpasses Google’s Gemini 2.5 Pro in aggregate intelligence while trailing frontier proprietary systems like GPT 5 and Claude Sonnet 4.5 on some tasks.
  • M2 is a Mixture of Experts model with about 230 billion total parameters and 10 billion active per inference for lower latency and cost.
  • Agent performance is a highlight, with strong results on tau squared Bench, BrowseComp, and FinSearchComp global.
  • Coding strength is validated on SWE Bench style tasks and Terminal Bench, supporting end to end developer workflows.
  • The API costs 0.30 dollars per million input tokens and 1.20 dollars per million output tokens, with speeds near 100 tokens per second.
  • Full weights are open for self hosting under a permissive license, and the model integrates with SGLang, vLLM, and common API formats.
  • Serving can be done on as few as four NVIDIA H100 GPUs at FP8 precision, easing deployment for midsize teams.
  • Compared with rivals, M2 activates fewer parameters than DeepSeek V3.2 and Moonshot Kimi K2, improving efficiency.
  • Enterprises should validate performance on their own toolchains and plan for verbosity, context management, and safety controls.