The birth of China's largest computing chip! Suiyuan Technology releases Yunsi 2.0 AI training chip

On July 7 in Shanghai, Suiyuan Technology released its second generation of artificial intelligence training products: the "Yunsi 2.0" chip, the "Yunsui T20" training accelerator card and "Yunsui T21" training OAM module based on Yunsi 2.0, the fully upgraded "TopsRider" software platform, and the brand-new "Yunsui Cluster". This makes it the first company in China to release a second-generation AI training product portfolio.

This is another major release from Suiyuan Technology, following the first-generation training chip Yunsi 1.0 and the Yunsui T10/T11 in December 2019, and the first-generation inference product Yunsui i10 in December 2020.

Suiyuan Technology CEO Zhao Lidong (left) and Suiyuan Technology COO Zhang Yalin (right) jointly unveil the "Yunsui T20" training accelerator card and the "Yunsi 2.0" chip

Commercial deployment scenarios for Suiyuan products include: liquid-cooled hyperscale clusters that accelerate video content production; reinforcement learning that powers game AI and changes digital entertainment; strong vision capabilities that protect financial security; edge deployment with multi-source data fusion to build a new type of intelligent transportation; and efficient inference clusters that build an AI video cloud for industrial parks.

With the release of the second-generation products, the commercialization of these scenarios will be accelerated.

Yunsui Intelligent Computing Cluster

With the development of natural language processing, reinforcement learning, unsupervised learning, and multimodal and cognitive models, and with their application in knowledge graphs, smart brains, game engines, converged media, and general artificial intelligence, green, integrated, very large-scale intelligent computing clusters are a key trend.

Suiyuan released the CloudBlazer Matrix intelligent computing cluster, which it describes as world-leading. It scales to as many as 8,192 Enflame CloudBlazer training cards (DTU chips) and up to 1.3 EFLOPS (about 1,300,000 TFLOPS) of single-precision tensor computing power. With integrated liquid cooling, PUE drops below 1.15. Each training card delivers up to 160 TFLOPS of single-precision tensor compute, and the cluster reaches 80% linearity.
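The cluster figures above can be cross-checked with a little arithmetic. The sketch below multiplies the stated card count by the stated per-card peak. It also assumes, purely for illustration, that "80% linearity" means 80% scaling efficiency relative to ideal linear scaling; the source does not define the term.

```python
# Figures quoted in the article.
CARDS = 8192                  # maximum number of training cards in the cluster
TFLOPS_PER_CARD = 160         # peak single-precision tensor (TF32) TFLOPS per card
LINEARITY = 0.80              # ASSUMPTION: read as scaling efficiency vs. ideal

TFLOPS_PER_EFLOPS = 1_000_000  # 1 EFLOPS = 10^6 TFLOPS


def cluster_peak_eflops(cards: int, tflops_per_card: float) -> float:
    """Ideal (perfectly linear) peak of the whole cluster, in EFLOPS."""
    return cards * tflops_per_card / TFLOPS_PER_EFLOPS


peak = cluster_peak_eflops(CARDS, TFLOPS_PER_CARD)
effective = peak * LINEARITY  # only meaningful under the assumption above

print(f"peak:      {peak:.3f} EFLOPS")
print(f"effective: {effective:.3f} EFLOPS at {LINEARITY:.0%} linearity")
```

The ideal peak comes out at roughly 1.31 EFLOPS, which matches the quoted 1.3 EFLOPS.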

This is a cluster product aimed at intelligent computing and new-infrastructure computing power. CloudBlazer Matrix 2.0 will include the Yunsi DTU 2.0, CloudBlazer T20/T21 boards, and TopsRider 2.0.

Building China's largest computing chip: the powerful performance of DTU 2.0

Yunsi (邃思) DTU 2.0 is China's largest computing chip and pushes the limits of packaging. It uses advanced 2.5D packaging that integrates 9 dies in a single package, including 1 DTU die and 4 Samsung HBM2E stacks. The package measures 57.5 mm × 57.5 mm.

Suiyuan Technology's second-generation general artificial intelligence training chip "Yunsi 2.0"

In terms of computing power, DTU 2.0 delivers up to 40 TFLOPS of single-precision FP32 and up to 160 TFLOPS of single-precision tensor TF32. TF32 is regarded as one of the most advanced data formats in the data center.
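To make the TF32 format concrete: as commonly defined, TF32 keeps FP32's sign bit and 8-bit exponent but only 10 of its 23 mantissa bits. The sketch below emulates this by clearing the low 13 mantissa bits of an FP32 value. The function name is hypothetical, and truncation is used for simplicity; real hardware may round instead. Whether DTU 2.0 handles TF32 in exactly this way is not stated in the source.

```python
import struct


def to_tf32_truncate(x: float) -> float:
    """Emulate TF32 precision by truncating an FP32 value (illustrative only).

    FP32 layout: 1 sign bit, 8 exponent bits, 23 mantissa bits.
    TF32 keeps the sign, the 8 exponent bits, and the top 10 mantissa bits,
    so the low 13 mantissa bits are cleared here.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # FP32 bit pattern
    bits &= 0xFFFFE000                                    # clear low 13 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# 2**-10 is the smallest mantissa step TF32 can represent next to 1.0.
print(to_tf32_truncate(1.0 + 2**-10))  # kept
print(to_tf32_truncate(1.0 + 2**-11))  # truncated to 1.0
```

The dynamic range matches FP32 because the exponent width is unchanged; only the precision is reduced.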

In addition, DTU 2.0 incorporates a fully programmable dataflow, with software-instruction-driven data transfer and computation, efficient processing of scalar, vector, and tensor data, and multi-address broadcasting.

In terms of memory, Yunsi 2.0 carries 4 in-package HBM2E stacks; the top configuration provides 64 GB of memory and 1.8 TB/s of bandwidth. It is the first product in China to support HBM2E, described as the world's most advanced memory, and 64 GB of memory on a single chip.

For card-to-card interconnect, there are 6 LARE ports, each providing 50 GB/s of bidirectional bandwidth, for a total interconnect bandwidth of 300 GB/s.
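The interconnect total follows directly from the port count. The sketch below reproduces it and, as a rough illustration only, estimates how long an idealised transfer would take at that peak rate. The transfer-time helper ignores latency, protocol overhead, and topology, so it is a lower bound rather than a measured figure.

```python
# Figures quoted in the article.
LARE_PORTS = 6
GB_PER_S_PER_PORT = 50   # bidirectional bandwidth per port, in GB/s
HBM_CAPACITY_GB = 64     # top memory configuration of the chip


def aggregate_bandwidth_gb_s(ports: int, per_port_gb_s: float) -> float:
    """Total interconnect bandwidth when all ports are used at once."""
    return ports * per_port_gb_s


def ideal_transfer_seconds(size_gb: float, bandwidth_gb_s: float) -> float:
    """Lower bound on transfer time: payload size / peak bandwidth."""
    return size_gb / bandwidth_gb_s


total = aggregate_bandwidth_gb_s(LARE_PORTS, GB_PER_S_PER_PORT)
print(f"aggregate bandwidth: {total} GB/s")
print(f"moving {HBM_CAPACITY_GB} GB takes at least "
      f"{ideal_transfer_seconds(HBM_CAPACITY_GB, total):.3f} s")
```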

The Yunsui T21 and T20 training products are, respectively, an OAM standard module and a full-height, full-length PCIe board.

According to benchmark data, the Yunsui T20 has clear advantages over competitors' flagship products in image recognition/classification, NLP, object detection, image segmentation, recommendation, and other workloads.

Yusuan TopsRider 2.0 architecture

TopsRider is Suiyuan Technology's computing and programming platform, developed with independent intellectual property rights. Through co-design of software and hardware, it makes full use of the performance of Yunsi 2.0. Based on operator generalization technology and graph optimization strategies, it supports training of many kinds of models under mainstream deep learning frameworks. It combines the Horovod distributed training framework with GCU-LARE interconnect technology to provide a solution for running ultra-large-scale clusters efficiently. The open, upgraded programming model and extensible operator interface give customers the ability to do custom development when optimizing their models.
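Horovod-style data-parallel training typically combines gradients across workers with a ring all-reduce. The sketch below is a pure-Python simulation of that pattern for illustration. It does not use the Horovod or TopsRider APIs, the function name is hypothetical, and it says nothing about how GCU-LARE actually moves the data.

```python
def ring_allreduce_sum(worker_grads):
    """Simulate a ring all-reduce (sum) over per-worker gradient lists.

    Each of the n workers splits its gradient into n chunks. A reduce-scatter
    phase (n - 1 steps) leaves every worker with one fully summed chunk; an
    all-gather phase (n - 1 steps) then circulates the summed chunks.
    """
    n = len(worker_grads)
    length = len(worker_grads[0])
    bounds = [(k * length) // n for k in range(n + 1)]  # chunk boundaries
    data = [list(g) for g in worker_grads]              # per-worker buffers

    # Phase 1: reduce-scatter. In each step, worker i sends one chunk to
    # worker (i + 1) % n, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        msgs = []
        for i in range(n):  # snapshot first: all sends happen "at once"
            c = (i - step) % n
            msgs.append((c, data[i][bounds[c]:bounds[c + 1]]))
        for i, (c, payload) in enumerate(msgs):
            dst = (i + 1) % n
            for off, value in enumerate(payload):
                data[dst][bounds[c] + off] += value

    # Phase 2: all-gather. Worker i now owns summed chunk (i + 1) % n and
    # forwards completed chunks around the ring, overwriting stale copies.
    for step in range(n - 1):
        msgs = []
        for i in range(n):
            c = (i + 1 - step) % n
            msgs.append((c, data[i][bounds[c]:bounds[c + 1]]))
        for i, (c, payload) in enumerate(msgs):
            dst = (i + 1) % n
            data[dst][bounds[c]:bounds[c] + len(payload)] = payload

    return data


grads = [
    [1.0, 2.0, 3.0, 4.0],          # gradients computed by worker 0
    [10.0, 20.0, 30.0, 40.0],      # worker 1
    [100.0, 200.0, 300.0, 400.0],  # worker 2
]
summed = ring_allreduce_sum(grads)
averaged = [[v / len(grads) for v in g] for g in summed]
print(summed[0])
```

Each worker sends and receives only one chunk per step, so per-worker traffic stays roughly constant as workers are added. This is one reason the pattern suits large clusters.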

On the training side, Suiyuan plans to release its third-generation T30/T31 products in 2023, with energy efficiency (performance per watt) increased 14 times. The cluster will be upgraded to Matrix 3.0 at the same time.

On the inference side, the i20 inference chip will be released in the second half of 2021 and the i30 in 2023, with energy efficiency increased 4 times and 16 times respectively.

In addition, TopsRider 2.x will be further upgraded to TopsRider 3.x, which is dedicated to the broader AI ecosystem.


Competition in the AI ecosystem is also cross-industry and complex. Suiyuan's Liaoyuan Plan for the heterogeneous computing ecosystem brings together three major characteristics: original innovation, standardization, and ecosystem co-construction.

The Liaoyuan Plan will establish a thriving, open developer ecosystem, a healthy and mutually beneficial industry ecosystem, and a research ecosystem of continuous innovation. It aims to build a general heterogeneous computing ecosystem based on artificial intelligence, establish a standardized technical system, and jointly build a complete ecosystem to serve Digital China. It will cover deep learning computing, general heterogeneous computing, and visual computing.

Zhao Lidong said that, judging from industry trends, computing power built around chips is strategically contested ground in Industry 4.0. In the Industry 4.0 era, artificial intelligence is the key driving force. As Moore's Law slows, heterogeneous computing is accelerating; built on chiplets and packaging technology, deployed artificial intelligence will change the Internet and reshape traditional industries. Demand for computing power is exploding, and computing power has become the foundation for the development of AI technology. Suiyuan Technology will build a world-class, locally based R&D and engineering team, develop domestically innovated core technologies, and pursue a dual track of cloud AI and high-end chips. It aims to create the best cloud AI products for data centers and to form a complete closed-loop solution for training and inference.