AI traders face a live market test as DeepSeek takes the lead
A high-stakes experiment is putting artificial intelligence to work in live cryptocurrency markets, and the early leaderboard is challenging assumptions about who builds the best trading bots. In Alpha Arena, a real-money challenge launched by US research group Nof1, six well-known large language models each received 10,000 dollars to trade crypto perpetual contracts on the decentralized exchange Hyperliquid. After a turbulent opening stretch, China-based models DeepSeek and Qwen have pulled ahead, while several Western rivals have absorbed heavy losses.
- How the experiment works
- Who is winning right now
- What strategies the bots appear to use
- Why Chinese models are topping the table
- What the early results say about AI in volatile markets
- How to track the contest and what the crowd is saying
- What comes next
- The Essentials
As of the latest published update on Monday afternoon, DeepSeek Chat V3.1 had lifted its account to roughly 22,500 dollars, a gain of about 125 percent since trading began on October 18. Qwen 3 Max from Alibaba followed with a gain near 95 percent, having briefly overtaken DeepSeek during the October 24 to 26 window before slipping back to second. At the other end of the spectrum, OpenAI’s GPT 5 and Google DeepMind’s Gemini 2.5 Pro saw steep drawdowns of roughly 60 percent. xAI’s Grok 4 and Anthropic’s Claude 4.5 Sonnet posted more modest gains in the low to mid tens of percent.
The contest runs through November 3, with all trades executed autonomously and recorded on chain. A public leaderboard tracks holdings, returns and each model’s own written reasoning for why it entered or exited positions. The idea is simple to state and hard to master: make money in volatile markets while managing risk. After a handful of days, the results already show wide dispersion, sharp reversals, and very human lessons about discipline, leverage and patience.
How the experiment works
Alpha Arena turns language models into trading agents. Each model receives the same system prompt and the same market inputs, then decides when to buy, sell or hold positions in perpetual swaps tied to six major assets: bitcoin, ether, solana, dogecoin, BNB and XRP. Perpetuals are futures contracts without an expiry date. They stay anchored to spot prices through a funding rate that periodically rewards one side of the market and charges the other. That mechanism can create opportunities, but it also adds complexity when markets move fast. The bots can use leverage, must set take-profit and stop-loss levels, and are evaluated on risk-adjusted returns rather than raw profit alone.
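To make the funding mechanics concrete, here is a minimal Python sketch of how a leveraged perpetual position's equity responds to price moves and periodic funding payments. The sign convention (longs pay when the rate is positive), the prices and the rates are illustrative assumptions, not Hyperliquid's exact accounting.

```python
# Minimal sketch of a leveraged perpetual position's equity over time.
# Assumed sign convention: when the funding rate is positive, longs pay shorts.

def perp_equity(margin, leverage, entry, marks, funding_rates, side=1):
    """Equity of one perpetual position at each funding interval.

    margin        -- collateral posted, in dollars
    leverage      -- notional divided by margin
    entry         -- entry price
    marks         -- mark price at each funding interval
    funding_rates -- funding rate applied at each interval (0.0001 = 1 basis point)
    side          -- +1 for a long, -1 for a short
    """
    size = margin * leverage / entry                 # position size in coins
    funding_paid = 0.0
    curve = []
    for mark, rate in zip(marks, funding_rates):
        funding_paid += side * rate * size * mark    # funding accumulates each interval
        price_pnl = side * size * (mark - entry)     # mark-to-market gain or loss
        curve.append(margin + price_pnl - funding_paid)
    return curve

# Example: 10,000 dollars at 3x leverage, long ether from 3,900 over three intervals
print(perp_equity(10_000, 3, 3_900, [3_950, 4_050, 3_980], [0.0001, 0.0001, 0.0001]))
```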
Rules and safeguards
Each model started with the same 10,000 dollars and trades on Hyperliquid under a dedicated wallet address that the public can monitor. To keep the playing field level, organizers feed the same inputs to every participant. The agents do not have access to live news or proprietary data streams, a constraint that reduces headline risk but also limits their ability to react to sudden events. The Alpha Arena team has acknowledged community concerns about potential front running of well known models but has not detailed a specific mitigation beyond transparency and constant scrutiny. Organizers say the goal is not to crown a permanent champion, but to test how well these systems operate under pressure in a dynamic, adversarial setting.
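Nof1 has not published the exact output format its agents must produce, so the snippet below is only a hypothetical illustration of the kind of guardrail the rules imply: a small decision record that is rejected unless it carries coherent stop-loss and take-profit levels and stays under an assumed leverage cap. All field names and limits are invented for the example.

```python
from dataclasses import dataclass

# Hypothetical decision record and rule check; not Alpha Arena's real schema.
@dataclass
class TradeDecision:
    asset: str          # e.g. "BTC", "ETH", "SOL"
    side: str           # "long", "short" or "flat"
    leverage: float     # notional divided by margin
    margin_usd: float   # collateral committed to the trade
    stop_loss: float    # exit price if the trade moves against the model
    take_profit: float  # exit price if the trade works out
    reasoning: str      # the model's own written rationale

def violations(d: TradeDecision, max_leverage: float = 10.0) -> list:
    """Return a list of rule breaches; an empty list means the order can go out."""
    errors = []
    if d.side == "long" and not (d.stop_loss < d.take_profit):
        errors.append("a long needs its stop below its take-profit")
    if d.side == "short" and not (d.stop_loss > d.take_profit):
        errors.append("a short needs its stop above its take-profit")
    if d.leverage > max_leverage:
        errors.append("leverage above the assumed cap")
    if d.margin_usd <= 0:
        errors.append("margin must be positive")
    return errors

# Example: a well-formed long passes the check and returns no violations
print(violations(TradeDecision("ETH", "long", 3, 2_000, 3_700, 4_200, "breakout continuation")))
```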
Who is winning right now
Performance has shifted quickly, yet a pattern is emerging. DeepSeek’s gains accelerated after the opening weekend, turning an early single digit profit into triple digit growth by the start of the second week. Qwen kept pace by making large directional bets and at times briefly led the field. Grok and Claude have hovered near breakeven to low positive returns. GPT 5 and Gemini fell behind, with losses that at one point approached or exceeded half their starting capital. Onlookers tracking the wallets also observed dramatic intraday swings. One community update noted that DeepSeek’s equity fell more than 40 percent on a sharp market downdraft before recovering, a reminder that wins have not come without risk.
Market context matters. The period included a rebound in bitcoin and a recovery in ether after weeks of choppy action, conditions that rewarded models willing to hold long exposure. Several agents have shifted bias together, moving from short stances to synchronized long positions as sentiment brightened. That herding is one reason returns have at times moved in lockstep, and it is also why the field remains fragile if a sudden reversal arrives.
What strategies the bots appear to use
Although all six agents see identical inputs, their behavior has not been identical. The public transaction logs and the models’ own notes show clear differences in risk appetite, position sizing and timing. Some concentrate capital in a single coin and swing for the fences. Others spread risk across several assets and use moderate leverage. Execution quality, cash management and discipline around stops have separated the leaders from the laggards.
DeepSeek: diversification and discipline
DeepSeek has favored a diversified book across the six available assets, often using medium leverage and clearly defined exit levels. Early in the contest, the bot logged profits on ether, solana and bitcoin trades while taking a small loss on XRP. Its reasoning feed shows a steady cadence of entries, losses cut quickly when its rules are breached, and patience on winners. Outside observers credit DeepSeek’s edge to discipline, balanced risk allocation and active position management rather than wild bets. The equity curve has not been a straight line, but the upside runs have been strong enough to offset the downdrafts.
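The logs do not reveal DeepSeek's internal sizing rule, so the sketch below is only a generic illustration of the discipline its reasoning feed suggests: risk a fixed fraction of equity per trade and derive the position size from the distance to the stop. The risk fraction and the prices are assumptions for the example.

```python
def position_size(equity, entry, stop, risk_fraction=0.02):
    """Size a trade so that a stop-out costs roughly risk_fraction of equity.

    equity        -- current account equity in dollars
    entry, stop   -- entry price and stop-loss price
    risk_fraction -- share of equity put at risk on this single trade
    """
    risk_per_coin = abs(entry - stop)              # loss per coin if the stop is hit
    if risk_per_coin == 0:
        raise ValueError("the stop must differ from the entry price")
    coins = equity * risk_fraction / risk_per_coin
    return coins, coins * entry                    # size in coins, notional in dollars

# Example: 12,000 dollars of equity, long solana at 200 with a stop at 190, risking 2 percent
coins, notional = position_size(12_000, 200, 190)
print(coins, notional)                             # 24.0 coins, 4,800 dollars of notional
```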
Qwen: bold concentration
Qwen’s path has featured big, concentrated positions. At one point it went all in on bitcoin, which worked well during a rally. Later it shifted to a large, higher leverage ether long that backfired during a pullback, dropping it from first back to second. That willingness to load up on a single asset explains both Qwen’s surges and its stumbles. When direction is right, the gains are large. When timing is off, the drawdowns mount quickly.
How the others fared
Grok has shown flexible positioning and periods of steady wins, but it has also slipped after promising stretches. Claude stood out by opening one of the largest single positions in the contest to date, signaling a readiness to take risk after waiting for a higher conviction setup. GPT 5 trailed after a cautious opening, struggling to adapt as conditions shifted. Gemini’s rapid trading and frequent flips translated into amplified losses when volatility spiked. External analysis of the contest has flagged the tendency of some Western models to give back early gains and to chase price, a pattern consistent with systems that are tuned to past data rather than live feedback.
Why Chinese models are topping the table
Several factors may explain why DeepSeek and Qwen have outperformed in the early going. DeepSeek was spun out of High Flyer Quant, a hedge fund known for quantitative strategies. Investors and researchers have speculated that DeepSeek’s training included high quality financial signals and reinforcement cycles specifically geared toward market decision making. Cost is another point of interest. According to technical materials cited by industry analysts, DeepSeek’s full training cost was on the order of a few million dollars, far lower than the billions spent to shape general chatbots. Lower cost does not guarantee better results, yet a tightly focused training approach can yield a model that reads market structure and funding dynamics more effectively than a broad conversational system.
There are also practical differences in how prompts and rules were interpreted. Several experienced quants watching the contest have argued that deep domain prompts matter as much as raw model scale. A carefully crafted instruction set that emphasizes position sizing, cash buffers and strict stops can lead to better behavior even if the model is smaller. By contrast, a general chatbot that is world class at writing code or answering complicated questions might still struggle with the rhythm of entries and exits in a live order book.
What the early results say about AI in volatile markets
The scoreboard is impressive at the top and ugly at the bottom. That split is a useful reminder. AI agents can latch onto the right trend and ride it, yet they can also compound mistakes when the feedback loop is noisy. Analysts who reviewed Alpha Arena’s first week have warned that some Western models appear overfitted to historical patterns and lack adaptability when conditions change quickly. The size of the drawdowns points to gaps in risk control, especially around leverage, trade frequency and stop placement.
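Readers who want to quantify those gaps themselves can start with a drawdown calculation on the published equity curves. The sketch below computes the peak-to-trough drawdown from a series of equity snapshots; the sample numbers are invented for illustration.

```python
def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, expressed as a fraction of the peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

# Invented equity snapshots, in dollars, for illustration only
curve = [10_000, 12_500, 9_000, 14_000, 22_500]
print(f"max drawdown: {max_drawdown(curve):.0%}")   # prints "max drawdown: 28%"
```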
Jay Azhang, the founder of Nof1 and creator of Alpha Arena, argues that financial markets are the right place to test those limits: “Financial markets are the best training environment for the next era of AI.”
One concern is correlation. If many agents are built on the same base models and fed similar prompts, they can behave in similar ways. That raises the chance of crowded trades. When the market turns, losses hit many systems at once. Another issue is the black box nature of model decisions. The contest publishes model generated reasoning for each trade, which helps outsiders understand intent, but that does not guarantee the reasoning is correct. Language models are also prone to hallucination, a polite term for confident nonsense, which in markets can turn into costly errors if not fenced by strict rules.
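Crowding of that kind can be checked directly from the public data by turning each agent's equity curve into per-interval returns and measuring pairwise correlation. The sketch below does exactly that with made-up numbers; readings near 1.0 would point to the herding described above.

```python
from statistics import correlation   # available in Python 3.10 and later

def interval_returns(equity):
    """Simple returns between consecutive equity snapshots."""
    return [later / earlier - 1 for earlier, later in zip(equity, equity[1:])]

# Made-up equity curves for two agents sampled at the same times, in dollars
agent_a = [10_000, 11_000, 10_400, 12_000, 13_500]
agent_b = [10_000, 10_800, 10_300, 11_600, 12_900]

rho = correlation(interval_returns(agent_a), interval_returns(agent_b))
print(f"return correlation: {rho:.2f}")   # values near 1.0 suggest crowded positioning
```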
There are human takes worth weighing too. Some equity analysts have suggested that if language models show durable outperformance with manageable drawdowns, professional investors could rethink the need for ever more complex quantitative stacks. Others counter that the current bots are blind to live news and lack the proprietary data and infrastructure that modern trading firms use. A fair middle ground is forming: even if fully autonomous agents cannot beat experienced human teams over a cycle, the explanations these models produce might give traders fresh ideas and a faster way to test them. Behind the scenes, venue operators and regulators will be watching questions of fairness, disclosure and compliance if automated agents scale up in public markets.
How to track the contest and what the crowd is saying
Alpha Arena is built for public scrutiny. Every bot trades from a dedicated Hyperliquid wallet address. The leaderboard shows positions, realized and unrealized gains, and the latest rationale in plain language. Later in the first week, prediction markets sprang up around the outcome. On Polymarket, bettors assigned DeepSeek the highest chance of finishing first, with odds of about four in ten and tens of thousands of dollars in volume at one snapshot in time. The spectacle has drawn comments from prominent crypto figures. One recurring theme from practitioners is that unique strategies matter more than raw firepower, a point often raised by experienced exchange operators and fund managers when they assess algorithmic rivals.
Anyone who wants to follow along can start at the organizer’s site for links to the leaderboard, rules and model pages. That hub aggregates the wallets, positions and commentary in one place so the public can see what these bots decide next. The site is here: nof1.ai.
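For programmatic tracking, the same wallet data should also be reachable through Hyperliquid's public info API. The sketch below reflects the author's understanding of that interface: the endpoint, the clearinghouseState request type and the marginSummary.accountValue response field are assumptions to verify against the exchange's current documentation, and the wallet address shown is a placeholder rather than one of the contest wallets.

```python
import requests

# Assumed shape of Hyperliquid's public info API; check the endpoint, request
# type and response fields against current documentation before relying on them.
INFO_URL = "https://api.hyperliquid.xyz/info"
WALLET = "0x0000000000000000000000000000000000000000"   # placeholder address

def account_value(address):
    """Fetch the current account value, in dollars, for one wallet."""
    resp = requests.post(
        INFO_URL,
        json={"type": "clearinghouseState", "user": address},
        timeout=10,
    )
    resp.raise_for_status()
    return float(resp.json()["marginSummary"]["accountValue"])

print(account_value(WALLET))
```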
What comes next
The current season ends on November 3, and rankings can still flip. Organizers say they plan to expand the format beyond crypto into equities and other assets. A consumer platform for agent based investing is on the roadmap, which would let individuals test or deploy agents with guardrails. The wider community is already experimenting with spinoff challenges and open agents that anyone can ping to place a mock trade. This momentum is turning Alpha Arena into a live research lab for how autonomous systems behave in markets.
A caution is warranted. The heavy losses for several models show that a clever chatbot is not a substitute for a complete trading system. Live markets punish late reactions, poor position sizing and sloppy execution. The bright side is that the contest is generating a rare public dataset of agent decisions under stress. That record, win or lose, may be the most valuable output of all, because it gives engineers and traders a common benchmark to improve. With several days left in the season, every position still carries weight, and every stop or take profit will tell observers a little more about what these bots really understand.
The Essentials
- Alpha Arena began on October 18 and runs to November 3 with six language models trading crypto perpetuals on Hyperliquid.
- Each model started with 10,000 dollars, identical prompts and the same market inputs, and must aim for returns adjusted for risk.
- DeepSeek Chat V3.1 led the field by Monday afternoon with about 125 percent gains, while Qwen 3 Max posted roughly 95 percent.
- OpenAI’s GPT 5 and Google’s Gemini 2.5 Pro suffered steep losses near 60 percent, with Grok 4 and Claude 4.5 Sonnet showing modest gains.
- Public dashboards show every trade, position and the model’s own explanation for why it acted, along with on chain wallet addresses.
- Community bettors on Polymarket gave DeepSeek the highest chance of finishing first at one snapshot, with roughly four in ten odds.
- Analysts cite overfitting, poor adaptability and weak risk controls as reasons for large drawdowns in some Western models.
- DeepSeek’s background in quantitative research and a more specialized focus may be helping it read funding and market structure better.
- Organizers plan to broaden the benchmark to stocks and other assets and to release a consumer agent platform with guardrails.
- Heavy dispersion of returns underscores that AI agents can capture trends but can also magnify mistakes in fast markets.