Alibaba and Baidu train AI on their own chips as US curbs tighten

Asia Daily

Why it matters now

China’s two largest consumer internet and cloud companies are taking a decisive step to secure their artificial intelligence supply chains. Alibaba and Baidu have begun training some models on processors they designed themselves, reducing reliance on the Nvidia chips that have been the backbone of modern AI. The shift reflects two powerful forces at once: tightening United States export controls on advanced semiconductors bound for China, and a domestic push to prioritize home-grown technology in critical infrastructure.

Both companies are moving carefully rather than flipping a switch. Alibaba is using its own accelerators for smaller-scale training jobs, while retaining Nvidia hardware for the most demanding work. Baidu is testing its latest Kunlun P800 chip on new versions of its Ernie model, the technology behind its generative AI services. The companies still count on Nvidia for the largest and most complex training runs, where performance, software maturity, and multi-chip interconnects are most critical.

The practical effect inside China’s data centers could be profound. Companies that once waited in line for imported GPUs now have a second path to scale up AI computing. Domestic chips that are good enough for a growing share of workloads help mitigate supply risk, keep projects on schedule, and align spending with national technology goals. If these deployments go well, they will validate years of investment in local AI silicon and accelerate a broader move by Chinese cloud providers to blend foreign and domestic hardware in their fleets.

What are Alibaba and Baidu building

Alibaba has a track record in custom chips through its T-Head unit, which previously unveiled the Hanguang 800 AI accelerator for inference and a server CPU based on the Arm instruction set. Its latest AI training processor, referred to internally by employees as Zhenwu, is now being used to train smaller models inside Alibaba Cloud. People who have worked with the chip say its practical performance is competitive with Nvidia’s H20 on tasks that do not require the most advanced interconnects or the largest memory footprints. For frontier model training, Alibaba continues to rely on Nvidia systems while it proves out the new silicon at scale.

Baidu’s Kunlun project is one of China’s longest-running home-grown AI accelerator lines. Early Kunlun chips focused on inference, the process of running trained models. With Kunlun P800, Baidu is running training experiments for new Ernie versions to validate stability, throughput, and software stack readiness. Baidu’s core advantage is vertical integration: the company builds the Ernie models, runs large cloud clusters, and controls the PaddlePaddle framework, which gives it more freedom to co-design hardware and software for better efficiency.

How US export rules reshaped the market

Washington’s controls have steadily tightened since 2022, when performance thresholds for AI accelerators first limited what US vendors could ship to China. Workarounds such as special China-only GPUs with reduced interconnect bandwidth kept some supply flowing for a time, but new measures announced over the following years narrowed those routes. The top Nvidia platforms, such as the H100 and newer Blackwell parts, remain out of reach in China. Nvidia created the H20, a less capable option intended to comply with US rules, but industry research indicates that even these parts require export licenses for shipment to China as of April 2025, raising friction and uncertainty for Chinese buyers of US chips.

China’s large cloud and internet platforms have answered with a two-track plan: acquire what they can from overseas while investing heavily in domestic accelerators. The policy environment also favors local solutions. Government and critical infrastructure customers increasingly prefer hardware designed and supported within China, which encourages cloud providers to prove that Chinese accelerators can handle production workloads at an acceptable cost and with predictable performance.

Nvidia recognizes that the competitive field is changing and that customers have incentives to diversify. A company spokesperson acknowledged that reality while stressing Nvidia’s intent to keep serving developers globally.

“The competition has undeniably arrived. We will continue to work to earn the trust and support of mainstream developers everywhere.”

How close are domestic chips to Nvidia

The answer depends on the job. Reports from engineers who have used Alibaba’s newest accelerator point to parity with Nvidia’s H20 on some training tasks and smaller model sizes. That is meaningful progress for a young platform. For the largest models and most aggressive training schedules, Nvidia still holds an advantage thanks to higher peak compute, faster multi-GPU interconnects, mature compiler stacks, and a broad ecosystem of optimized libraries. Chips like the H100 and the latest Blackwell designs remain the standard for state-of-the-art model training outside China.

Training and inference are different jobs

AI training is the process of teaching a model by adjusting billions of parameters across many passes through data. It pushes hardware to the limit on compute and memory bandwidth, and it requires very fast links between accelerators so that thousands of chips can synchronize efficiently. Inference is what happens after training: the model is already trained, and the system generates outputs for users in real time. Inference favors low latency, cost efficiency, and power savings. A chip that is excellent for inference, or for training smaller models, might still fall short on multi-month training runs for very large models, because the inter-chip communication fabric and software orchestration become the bottleneck.
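To make the distinction concrete, here is a minimal PyTorch sketch, illustrative only and not either company’s actual stack, contrasting one training step with one inference call:

```python
import torch
import torch.nn as nn

# A tiny model stands in for a billion-parameter network.
model = nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(32, 512)  # a batch of inputs
y = torch.randn(32, 512)  # training targets

# Training step: forward pass, loss, backward pass, weight update.
# At data-center scale this loop runs on thousands of accelerators
# that must exchange gradients after every step, which is why
# interconnect bandwidth matters so much.
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()

# Inference: a single forward pass with gradients disabled.
# No weight updates and no cross-chip gradient synchronization,
# so latency and cost per query dominate instead.
with torch.no_grad():
    prediction = model(x)
```

The backward pass and optimizer step are what make training so much more communication-heavy than serving a finished model.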

Software stack is as important as silicon

Nvidia’s dominance has never been only about chips. Its CUDA platform, libraries like cuDNN and TensorRT, and a deep bench of partner tools make development straightforward and performance predictable. Chinese platforms are building their own software stacks for local accelerators. Baidu’s PaddlePaddle and Alibaba’s cloud toolchains already support their chips, and both companies can tailor kernels and graph compilers to their own models. That control can unlock efficiency gains, but it also means third-party developers may face a learning curve when moving code to domestic hardware. If Alibaba and Baidu make these transitions seamless inside their clouds, customers may care more about price, availability, and service level than about the badge on the chip.
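In practice, the migration burden shows up at the framework level. The sketch below, with PyTorch standing in for whichever framework a team actually uses, shows how a device abstraction hides the accelerator from model code; a new chip becomes usable to the extent its vendor ships a solid backend behind that abstraction:

```python
import torch
import torch.nn as nn

# Frameworks expose accelerators through a device abstraction.
# Model code written against that abstraction does not change when
# the hardware does; what changes is the backend the vendor supplies.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to(device)
x = torch.randn(8, 512, device=device)
out = model(x)

# The hard part for a new chip is everything beneath this user code:
# kernels, graph compilers, and tuned libraries that make the same
# call run fast. That is where CUDA's maturity still shows.
```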

Business strategy and market reaction

Investors welcomed signs that China’s internet leaders can secure AI compute without depending entirely on imports. Shares of Alibaba and Baidu rose after reports of the new deployments, reflecting expectations that a more resilient supply base will support product roadmaps and reduce the risk of delays. A home-grown chip that is good enough for a large share of training jobs can also trim long-term operating costs if it is easier to procure and to scale in domestic data centers.

The strategic logic extends beyond cost and supply. Alibaba Cloud and Baidu Cloud sell to enterprises and public sector clients that often prefer locally designed hardware. Offering compute on domestic accelerators can help win sensitive workloads while staying aligned with regulatory preferences on security and data locality. That said, both companies appear intent on a pragmatic mix: keep Nvidia systems for the hardest problems and lean on domestic chips for the many jobs where performance is comparable and the economics are attractive.

Supply chain and manufacturing constraints

Designing a competitive AI accelerator is only part of the challenge. Manufacturing, packaging, and memory supply all matter. The highest-performance chips today are built on advanced process nodes and rely on high-bandwidth memory stacks packaged close to the processor. China’s foundries have made progress, but they do not yet match the most advanced overseas nodes. That gap can be partly offset by smart design choices, strong packaging, and efficient software, yet it still affects peak performance and power efficiency.

Availability of high-bandwidth memory is another constraint, since training large models consumes enormous memory bandwidth and capacity. Cloud providers manage around these limits by spreading workloads across more chips, by optimizing precision formats, and by scheduling jobs that match each platform’s strengths. In that environment, a domestic accelerator that is well integrated into the cloud software stack and easy to procure can carry a meaningful share of the work even if its theoretical peak falls short of the very best foreign parts.
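Precision is the most accessible of those levers. A minimal sketch of mixed-precision training in PyTorch, illustrative rather than any vendor’s specific recipe, shows how computing in bfloat16 roughly halves activation memory traffic while keeping weights in float32:

```python
import torch
import torch.nn as nn

model = nn.Linear(4096, 4096).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(16, 4096, device="cuda")
y = torch.randn(16, 4096, device="cuda")

# Run the forward pass and loss in bfloat16: activations take half
# the bytes, easing pressure on memory bandwidth and capacity.
optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), y)

# Gradients and the optimizer update stay in float32 for stability.
loss.backward()
optimizer.step()
```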

What it means for developers and customers

Customers of Alibaba and Baidu will care about three things: time to get capacity, reliability, and total cost for a given training run or inference workload. If domestic accelerators shorten waiting lists and deliver consistent performance with familiar frameworks, many teams will move without much friction. Model developers may need to retune kernels or adopt vendor-specific tools to unlock the best throughput, but that is common practice in high-performance computing. The near-term pattern is likely to be blended clusters with scheduler policies that route jobs to the best available hardware, with the largest frontier model training still landing on Nvidia systems and a growing set of mainstream jobs running on Alibaba and Baidu chips.
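A deliberately simplified sketch of what such routing could look like; the pool names, thresholds, and job fields below are hypothetical, invented for illustration rather than drawn from any company’s scheduler:

```python
from dataclasses import dataclass

@dataclass
class TrainingJob:
    name: str
    params_billion: float          # model size in billions of parameters
    needs_fast_interconnect: bool  # e.g., frontier-scale pretraining

def route(job: TrainingJob) -> str:
    """Send the hardest jobs to imported GPUs, the rest to domestic
    accelerators. Real schedulers weigh far more signals: queue depth,
    cost, data locality, and framework support."""
    if job.params_billion > 100 or job.needs_fast_interconnect:
        return "nvidia-pool"
    return "domestic-pool"

print(route(TrainingJob("ernie-finetune", 7, False)))      # domestic-pool
print(route(TrainingJob("frontier-pretrain", 400, True)))  # nvidia-pool
```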

What to watch next

Two milestones stand out over the coming year. The first is the licensing regime for shipments of US accelerators to China, which industry researchers say will cover parts like the Nvidia H20 and AMD MI308 from April 2025. The second is real-world evidence from Alibaba and Baidu that their chips can handle production training at scale, not just lab tests. Benchmarks on well-known models, customer case studies, and the pace at which cloud instances based on domestic accelerators sell out will tell the story. Keep an eye on how quickly software ecosystems mature around these chips, including support in popular frameworks and third-party tools. Watch, too, how other Chinese platforms, including those that build their own accelerators or buy from domestic vendors, allocate capital between imported GPUs and local options as new projects spin up.

Key Points

  • Alibaba and Baidu have begun training AI models on chips they designed, while keeping Nvidia systems for the most demanding work.
  • Alibaba’s latest accelerator is used for smaller training jobs and is said by employees to rival Nvidia H20 on some tasks.
  • Baidu is testing its Kunlun P800 for new Ernie model versions and leveraging its control of the PaddlePaddle framework.
  • US export controls restrict access to top Nvidia parts, and industry research points to new license requirements for H20 shipments to China in 2025.
  • Nvidia says it will continue working to earn developer trust as competition intensifies.
  • Domestic accelerators are good enough for a growing share of workloads, though Nvidia still leads on frontier scale training.
  • Software ecosystems are a key factor, with Alibaba and Baidu tuning toolchains for their own hardware.
  • Investors reacted positively, seeing reduced supply risk and more control over AI roadmaps.
  • Manufacturing and memory supply remain constraints, yet smart design and cloud integration can offset some limits.
  • Expect blended clusters in China’s clouds, with job schedulers routing tasks between domestic chips and imported GPUs.