MiniMax M2.5: The $1 Per Hour AI Worker Reshaping Enterprise Economics

Asia Daily

The $1 Per Hour AI Worker

Chinese artificial intelligence startup MiniMax has unleashed a disruption that threatens to collapse the economics of enterprise AI deployment. The Shanghai-based company released M2.5, a language model that delivers performance rivaling the most expensive Western systems while costing roughly one-tenth to one-twentieth the price of competitors like Claude Opus 4.6 and GPT-5.2.

The release, which hit Hugging Face under a modified MIT License, arrives as part of a frenetic week of Chinese AI announcements that rattled global markets and redrew competitive boundaries. MiniMax shares, traded as MINIMAX-WP (00100.HK) on the Hong Kong Stock Exchange, surged between 11% and 16% following the announcement, closing at HK$680 according to the South China Morning Post.

MiniMax has positioned M2.5 not merely as a chatbot but as a production-grade digital employee capable of autonomous coding, research, and document creation. The company claims this represents the first frontier model where cost concerns effectively vanish, offering continuous operation at 100 tokens per second for just $1 per hour. At slower speeds, that cost drops to $0.30 hourly. This pricing structure enables running four AI agents continuously for an entire year for approximately $10,000, a figure that would barely cover a month of operation using comparable Western models.

Benchmarks That Challenge the Frontier

Independent evaluation and MiniMax’s own testing reveal M2.5 sitting at the top tier of coding models, within striking distance of Anthropic’s Claude Opus 4.6 despite the massive price differential. On SWE-Bench Verified, the industry standard for measuring real-world software engineering capabilities, M2.5 scored 80.2%, just 0.6 percentage points behind Opus 4.6 and ahead of OpenAI’s GPT-5.2 at 80% and Google’s Gemini 3 Pro at 78%.

The model demonstrates particular strength in multilingual coding scenarios, achieving 51.3% on Multi-SWE-Bench, which tests cross-repository programming tasks requiring understanding of multiple codebases simultaneously. This places it ahead of Opus 4.6 at 50.3% and substantially beyond Gemini 3 Pro at 42.7%. On VIBE-Pro, an internal benchmark testing complex full-stack development tasks, M2.5 performs on par with Opus 4.5.

Beyond raw coding, M2.5 shows exceptional agentic capabilities. It scored 76.3% on BrowseComp, which measures complex web browsing and information retrieval, and 76.8% on the Berkeley Function Calling Leaderboard (BFCL) for multi-turn tool use, outperforming Opus 4.6 by over 13 percentage points. Third-party analyst firm Artificial Analysis assigned M2.5 a score of 42 on its Intelligence Index, well above the average of 25 for comparable models, while noting the model operates at 80 tokens per second, faster than the average of 55.

However, independent verification on the SWE-rebench January 2026 leaderboard, which tests 48 fresh GitHub pull requests using different methodology, showed M2.5 achieving 39.6% compared to Claude Code at 52.9%, suggesting real-world performance may vary based on evaluation harness and task recency.

The Architecture of Efficiency

M2.5’s economic viability stems from its Mixture of Experts (MoE) architecture, a design pattern that mimics specialist consultation rather than generalist labor. While the model contains 230 billion total parameters, it activates only 10 billion for any given task. This selective engagement allows the system to maintain the reasoning depth of massive models while operating with the speed and cost structure of much smaller systems.
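The routing mechanism behind this selective engagement can be sketched in a few lines. The gate scores, expert count, and top-k value below are hypothetical round numbers for illustration, not M2.5's actual configuration; the point is that only the selected experts execute a forward pass, which is how a 230-billion-parameter model can touch only about 10 billion weights per token.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(gate_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights.

    Only the selected experts run; the rest stay idle for this token,
    so most of the model's parameters are never loaded into the computation.
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

# Hypothetical gate scores for 8 experts; activate the top 2.
weights = route_top_k([0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.8], k=2)
print(weights)  # two experts carry all of the (renormalized) gating weight
```

Production MoE systems add load-balancing losses and capacity limits on top of this, but the cost structure follows directly from the top-k selection shown here.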

To train this complex architecture, MiniMax developed Forge, a proprietary reinforcement learning framework built specifically for agentic AI. Unlike standard approaches that optimize for pleasing responses, Forge deploys models into hundreds of thousands of live simulated environments including code repositories, web browsers, and office applications, rewarding actual task completion rather than stylistic alignment. The system achieves approximately 40 times faster training speed than standard RL approaches through optimized asynchronous scheduling and tree-structured merging strategies for training samples.

MiniMax engineer Olive Song explained during an appearance on the ThursdAI podcast that this training regimen required two months of intensive work across diverse real-world scenarios. To maintain stability during this process, the team implemented CISPO (Clipping Importance Sampling Policy Optimization), a mathematical approach that prevents the model from over-correcting during training iterations.
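The stabilizing idea can be illustrated with a minimal sketch of clipped importance sampling. This is not MiniMax's implementation, and the clip bounds `eps_low` and `eps_high` are placeholder values; the sketch only shows the general mechanism of bounding the importance-sampling ratio so that no single token can push a training update too far.

```python
import math

def cispo_token_weight(logp_new, logp_old, advantage,
                       eps_low=0.2, eps_high=5.0):
    """Per-token update weight under clipped importance sampling (sketch).

    r = pi_new / pi_old is the importance ratio between the current policy
    and the policy that generated the rollout. Clipping r bounds how much
    any one token can contribute, preventing over-correction; the clipped
    weight then scales that token's advantage in the policy-gradient update.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - eps_low, min(ratio, 1.0 + eps_high))
    return clipped * advantage

# A token whose probability rose sharply is capped at 1 + eps_high:
print(cispo_token_weight(3.0, 0.0, advantage=1.0))   # capped at 6.0
# A token whose probability collapsed is floored at 1 - eps_low:
print(cispo_token_weight(-3.0, 0.0, advantage=1.0))  # floored at 0.8
```

Ratios inside the clip window pass through unchanged, so typical tokens are updated normally while outliers are tamed.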

“What we realized is that there’s a lot of potential with a small model like this if we train reinforcement learning on it with a large amount of environments and agents. But it’s not a very easy thing to do,” Song said, adding that the team spent substantial time refining the process.

This methodology has cultivated what MiniMax terms an “Architect Mindset.” Rather than immediately generating code, M2.5 first plans project structure, features, and interface design, decomposing complex tasks into manageable components before execution begins. The model was trained on over 10 programming languages including Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, and Ruby across more than 200,000 environments.

From Chatbot to Digital Employee

The transition from AI as conversational interface to AI as autonomous worker represents the central thesis of M2.5’s design philosophy. MiniMax collaborated with senior professionals in finance, law, and social sciences to imbue the model with industry-specific tacit knowledge, enabling it to generate deliverable work products rather than raw text blocks.

This focus manifests in practical office automation capabilities that most frontier models ignore. M2.5 can create and manipulate Microsoft Word documents, build PowerPoint presentations with proper layouts and charts, and execute complex Excel financial modeling tasks. On the MEWC benchmark, which draws from actual Microsoft Excel World Championship competition problems, the model scored 74.4%. On the internal GDPval-MM evaluation framework, which assesses both deliverable quality and professional trajectory through pairwise comparisons, M2.5 achieved a 59% average win rate against mainstream competitors.

MiniMax has already deployed M2.5 internally as a test case for enterprise adoption. According to company statements, the model now autonomously completes 30% of all tasks across departments including research and development, product management, sales, human resources, and finance. In coding environments, M2.5 generates 80% of newly committed code at MiniMax headquarters.

The company has packaged these capabilities into MiniMax Agent, a platform where users have built over 10,000 specialized Experts combining Office Skills with domain-specific industry knowledge. These configurations allow the Agent to follow established standard operating procedures for industry research or adhere to specific risk control logic for financial modeling.

The model handles full-stack development across Web, Android, iOS, and Windows platforms, managing server-side APIs, business logic, and databases rather than merely producing frontend demonstrations. This capability extends through the entire software lifecycle, from initial system design through feature iteration to comprehensive code review and testing.

The Economics of Continuous Operation

MiniMax offers two API variants identical in capability but differentiated by throughput. M2.5-Lightning operates at 100 tokens per second and is priced at $0.30 per million input tokens and $2.40 per million output tokens. The standard M2.5 runs at half that speed (50 tokens per second) for half the cost: $0.15 per million input tokens and $1.20 per million output tokens.

To contextualize these figures, running four M2.5 agent instances continuously for an entire year costs approximately $10,000. By comparison, Claude Opus 4.6 charges roughly $75 per million output tokens, making comparable sustained operation economically prohibitive for most organizations. Based on output pricing alone, M2.5 costs one-tenth to one-twentieth as much as Opus, Gemini 3 Pro, and GPT-5.2.

The cost differential becomes stark when examining specific task economics. Alex Volkov, host of the ThursdAI podcast, noted that M2.5 completes SWE-Bench tasks using approximately $0.15 worth of tokens, while Claude Opus 4.6 requires about $3.00 for equivalent work. This 95% cost reduction eliminates the traditional pressure to optimize prompts and minimize token usage, allowing developers to deploy high-context, high-reasoning models for routine operations previously relegated to cheaper, less capable systems.

Both API variants support caching, further reducing costs for repetitive operations. The modified MIT License requires commercial users to display “MiniMax M2.5” prominently in their user interfaces, a minor condition given the economic advantages. At 100 tokens per second, continuous generation produces 360,000 output tokens per hour, costing roughly $0.86 in output tokens plus input costs.
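The published numbers are easy to sanity-check. The sketch below reproduces the arithmetic from the figures quoted in this article; note that the roughly $10,000 annual figure works out only at the slower $0.30-per-hour rate, and real workloads would add input-token and caching costs on top of the output-token estimate.

```python
# Back-of-the-envelope check of M2.5's published pricing.
# Rates and per-token prices are from MiniMax's announcement; the
# input/output token split of a real workload will vary.

TPS_LIGHTNING = 100              # tokens/second, Lightning tier
OUT_PRICE = 2.40 / 1_000_000     # $ per output token, Lightning tier

tokens_per_hour = TPS_LIGHTNING * 3600            # 360,000 tokens/hour
hourly_output_cost = tokens_per_hour * OUT_PRICE  # ~$0.86/hour in output

# Four always-on agents for a year at the slower ~$0.30/hour rate:
HOURS_PER_YEAR = 24 * 365
annual_four_agents = 4 * HOURS_PER_YEAR * 0.30    # ~$10,500

print(f"{tokens_per_hour:,} tokens/hour -> ${hourly_output_cost:.2f}/hour output")
print(f"Four agents, one year: ${annual_four_agents:,.0f}")
```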

The speed improvements compound the cost benefits. M2.5 completes end-to-end tasks 37% faster than its predecessor M2.1, with SWE-Bench Verified runtime dropping from 31.3 minutes to 22.8 minutes, matching Claude Opus 4.6’s 22.9 minutes at roughly one-tenth of the cost per task.

Market Context and Competitive Pressure

M2.5’s release punctuates a volatile week for Chinese artificial intelligence, during which multiple domestic companies unveiled competitive models. Zhipu AI launched GLM-5, which surpassed Google’s Gemini 3 Pro on the Artificial Analysis Intelligence Index, while Alibaba introduced RynnBrain for robotics applications, ByteDance released Seedance 2.0 for video generation, and Kuaishou debuted Kling 3.0. MiniMax and Zhipu AI both saw double-digit stock gains, with the latter rising 30% in Hong Kong trading.

These rapid-fire releases challenge recent assessments of the China-US AI gap. Google DeepMind chief Demis Hassabis had previously estimated Chinese models trailed Western offerings by mere months; MiniMax’s near-parity with Claude Opus 4.6, released just one week prior, suggests that lag has compressed to days. The SWE-rebench January 2026 results show Claude Opus 4.6 at 51.7% and GPT-5.2 at 51.0% on fresh pull requests, with MiniMax M2.5 at 39.6%, indicating Western models retain advantages on the most recent code, though the gap continues narrowing.

The competitive implications extend beyond benchmarks. MiniMax’s pricing strategy applies systematic pressure on Western AI labs that have built business models around premium API rates. When intelligence becomes, as MiniMax claims, “too cheap to meter,” the industry shifts from scarcity economics to volume economics, potentially disrupting the revenue models of established players.

For technical leaders, M2.5 offers immediate operational advantages. The 37% speed improvement over its predecessor enables real-time agentic pipelines where models interact with other models without latency bottlenecks. Organizations can now conduct automated code audits at unprecedented scale while maintaining data privacy through self-hosted open-weight deployment.

The Essentials

  • MiniMax M2.5 achieves 80.2% on SWE-Bench Verified, matching Claude Opus 4.6 performance at one-tenth to one-twentieth the cost
  • Two API variants available: Standard at $0.15/$1.20 per million tokens (50 TPS) and Lightning at $0.30/$2.40 (100 TPS)
  • 230 billion parameter MoE architecture activates only 10 billion parameters per task, reducing computational overhead
  • Trained using proprietary Forge RL framework across 200,000+ real-world environments with CISPO algorithm stability control
  • Capable of autonomous full-stack development, office document creation (Word, Excel, PowerPoint), and complex financial modeling
  • Currently deployed internally at MiniMax where it completes 30% of company tasks and generates 80% of new code
  • Released as open-weight model on Hugging Face under modified MIT License requiring commercial attribution
  • Company shares (MINIMAX-WP, 00100.HK) surged 11% to 16% on Hong Kong Stock Exchange following announcement