LineShine hits 1.54 exaflops CPU
- China’s National Supercomputing Center in Shenzhen unveiled LineShine on April 28, describing an all-CPU exascale system built with domestic chips, storage and networking. (hpcwire.com)
- The machine’s published design centers on 20,480 nodes and two 304-core Armv9 LX2 processors per node, for roughly 12.5 million cores. (hpcwire.com)
- An arXiv paper submitted April 17 reported 1.2 and 1.0 exaflops on two exascale systems, without naming LineShine. (arxiv.org)


China’s National Supercomputing Center in Shenzhen introduced a new system called LineShine on April 28 and said it will deliver 2 exaflops when fully deployed. The center described the machine as an all-CPU design built with domestic chips, storage and networking, according to comments by Lu Yutong, the center’s director and LineShine’s chief designer. (hpcwire.com)

Public reporting around the launch has tied the system to Armv9-based LX2 processors and to China’s effort to build high-end computing systems without foreign accelerators. The reporting also shows a gap between the headline claims circulating online and what has been documented in primary and near-primary sources. (arxiv.org)

### Did Shenzhen actually say LineShine hit 1.54 exaflops?

The April 28 HPCwire report said Shenzhen’s supercomputing center introduced LineShine as a machine that “will deliver 2 Exaflops of performance when it ships.” That wording describes a target configuration, not a benchmark result already posted by the center.

The primary technical paper most often cited in secondary coverage does not mention a 1.54-exaflop LineShine result in the material surfaced here. The arXiv paper “Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials,” submitted on April 17, said code deployed across two exascale supercomputers reached 1.2 and 1.0 exaflops in single precision at more than 90% parallel efficiency. (hpcwire.com) The abstract page lists Yutong Lu among the authors, but the abstract does not identify those two systems as LineShine.

### What hardware has been described in public?

HPCwire reported that the full machine consists of 20,480 computing nodes, with each node using two Armv9-based LX2 processors. (hpcwire.com) The same report said each processor has two compute dies totaling 304 cores, eight on-package HBM stacks with 32 GB capacity and 4 TB/s aggregate bandwidth, plus off-package DDR memory.

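The "roughly 12.5 million cores" figure follows directly from the node and processor counts HPCwire reported. A minimal sketch of that arithmetic is below; the constant and function names are illustrative only, and the inputs are the reported design figures, not measured values.

```python
# Reported design figures for the full LineShine configuration (per HPCwire).
NODES = 20_480          # computing nodes
CPUS_PER_NODE = 2       # LX2 processors per node
CORES_PER_CPU = 304     # cores per LX2 processor (two compute dies combined)


def totals(nodes: int, cpus_per_node: int, cores_per_cpu: int) -> tuple[int, int]:
    """Return (total CPUs, total cores) implied by the per-node figures."""
    cpus = nodes * cpus_per_node
    cores = cpus * cores_per_cpu
    return cpus, cores


total_cpus, total_cores = totals(NODES, CPUS_PER_NODE, CORES_PER_CPU)
print(f"CPUs:  {total_cpus:,}")
print(f"Cores: {total_cores:,} (~{total_cores / 1e6:.1f} million)")
```

This gives 40,960 processors and 12,451,840 cores. The 47,000-CPU figure from Jon Peddie Research is higher than that product, and the sources reviewed here do not explain the difference.
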
Jon Peddie Research reported on May 4 that LineShine uses Chinese-developed LX2 processors and a proprietary LingQi interconnect. That report said the full system includes 47,000 CPUs across 92 compute cabinets, a 1 million-port interconnect, 36 networking cabinets, 67 storage cabinets, 428 storage nodes, 10 TB/s of storage bandwidth and 650 PB of storage capacity. (arxiv.org)

### Where does the Huawei connection come from?

Jon Peddie Research identified the chips as Huawei LX2 Armv9 processors. HPCwire, by contrast, referred to Huawei Kunpeng servers in the first phase and described the broader system as homegrown, but did not state outright in the lines reviewed that Huawei designed the LX2 chip used in the full machine. (hpcwire.com)

That leaves the Huawei attribution unevenly documented in the sources reviewed here. Secondary reports have repeated the Huawei claim, but the stronger sourcing available in this research supports saying the system is Chinese-built, CPU-only and based on LX2 Armv9 processors, while noting that some industry coverage attributes those processors to Huawei. (jonpeddie.com)

### Why does a CPU-only design stand out?

The TOP500 comparison in HPCwire and Jon Peddie Research shows why the architecture drew attention. Both reports contrasted LineShine with Lawrence Livermore National Laboratory’s El Capitan, which they said leads the TOP500 with 1.8 exaflops of proven Linpack performance and uses AMD GPU accelerators. (jonpeddie.com)

Jon Peddie Research said GPU-based systems concentrate parallel math in accelerators, while an all-CPU design can run a broader mix of scientific and AI workloads on the same hardware. That is the clearest sourced explanation in the material reviewed for why Shenzhen’s design differs from current flagship supercomputers. (jonpeddie.com)

### What can be said now with confidence?

April 28 is the clearest public milestone: that is when LineShine was introduced as a 2-exaflop all-CPU system by Shenzhen’s supercomputing center. The best-supported public configuration points to 20,480 nodes, two 304-core LX2 processors per node, Chinese networking and storage, and a phased buildout that began with 100 Huawei Kunpeng servers. (hpcwire.com) The next hard datapoint will be a disclosed benchmark or deployment update from the National Supercomputing Center in Shenzhen, Lu Yutong or a peer-reviewed paper that explicitly names LineShine and the workload used. (jonpeddie.com) Until then, the documented public claim is a 2-exaflop target system, while the separately documented arXiv result is 1.2 and 1.0 exaflops on two unnamed exascale machines. (hpcwire.com)