China Hardware

China's Secretive Sunway Pro CPU Quadruples Performance Over Its Predecessor (tomshardware.com) 73

An anonymous reader shares a report: Earlier this year, the National Supercomputing Center in Wuxi (an entity blacklisted in the U.S.) launched its new supercomputer based on the enhanced China-designed Sunway SW26010 Pro processors with 384 cores. Sunway's SW26010 Pro CPU not only packs more cores than its non-Pro SW26010 predecessor, but it more than quadrupled FP64 compute throughput due to microarchitectural and system architecture improvements, according to Chips and Cheese. However, while the manycore CPU is good on paper, it has several performance bottlenecks.

The first details of the manycore Sunway SW26010 Pro CPU and supercomputers that use it emerged back in 2021. At SC23, the company recently showcased actual processors and disclosed more details about their architecture and design, which represent a significant leap in performance. The new CPU is expected to enable China to build high-performance supercomputers based entirely on domestically developed processors. Each Sunway SW26010 Pro has a maximum FP64 throughput of 13.8 TFLOPS, which is massive. For comparison, AMD's 96-core EPYC 9654 has a peak FP64 performance of around 5.4 TFLOPS.

The SW26010 Pro is an evolution of the original SW26010, so it maintains the foundational architecture of its predecessor but introduces several key enhancements. The new SW26010 Pro processor is based on an all-new proprietary 64-bit RISC architecture and packs six core groups (CG) and a protocol processing unit (PPU). Each CG integrates 64 2-wide compute processing elements (CPEs) featuring a 512-bit vector engine as well as 256 KB of fast local store (scratchpad cache) for data and 16 KB for instructions; one management processing element (MPE), which is a superscalar out-of-order core with a vector engine, 32 KB/32 KB L1 instruction/data cache, 256 KB L2 cache; and a 128-bit DDR4-3200 memory interface.
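
The 13.8 TFLOPS peak can be sanity-checked against the core counts and vector width given in the summary. The following is a rough sketch, not a vendor formula: it assumes each CPE retires one fused multiply-add per 64-bit lane per cycle (counted as 2 FLOPs) and ignores any contribution from the MPEs. Neither assumption comes from the article, and the clock speed it produces is implied by the arithmetic, not quoted anywhere in the summary.

```python
# Back-of-the-envelope check of the quoted 13.8 TFLOPS FP64 peak.
CORE_GROUPS = 6            # from the summary
CPES_PER_GROUP = 64        # from the summary
VECTOR_BITS = 512          # from the summary
FP64_BITS = 64
FLOPS_PER_FMA = 2          # ASSUMPTION: one FMA per lane per cycle
PEAK_FP64_TFLOPS = 13.8    # from the summary


def total_cpes() -> int:
    """Number of compute processing elements on the chip."""
    return CORE_GROUPS * CPES_PER_GROUP


def implied_clock_ghz(peak_tflops: float = PEAK_FP64_TFLOPS) -> float:
    """Clock speed that would be needed to reach the quoted peak."""
    lanes = VECTOR_BITS // FP64_BITS
    flops_per_cycle = total_cpes() * lanes * FLOPS_PER_FMA
    return peak_tflops * 1e12 / flops_per_cycle / 1e9


if __name__ == "__main__":
    print(f"CPEs: {total_cpes()}")
    print(f"Implied clock: {implied_clock_ghz():.2f} GHz")
```

Under those assumptions the quoted peak is consistent with 384 CPEs running at a little over 2 GHz, which supports the reading later in the thread that the number is a theoretical maximum and not a measured benchmark.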

  • China is not known for honesty, especially when it comes to technology.
    • by MrNaz ( 730548 ) on Friday November 24, 2023 @08:47PM (#64029671) Homepage

      Given that this CPU is entirely intended for internal use and not for sale to Western markets, there's little incentive to lie.

      But y'know, keep believing that America is the only country in the world capable of doing anything of value if that makes you feel all warm and fuzzy while you lie in bed cuddling your guns.

      • Re: (Score:2, Troll)

        by DrMrLordX ( 559371 )

        Along with South Korea, Japan, Germany, um maybe Great Britain, Israel, and others.

      • by dknj ( 441802 )

        doing anything of value

        they are still on DDR4. wake me up when they reach big boy bus speeds

      • by Entrope ( 68843 ) on Friday November 24, 2023 @09:23PM (#64029701) Homepage

        Look at the memory architecture -- sharing 128 bits of DDR4 at just 3200 MT/s across 64 cores -- and try telling us that it's a serious design. Or the per-core scratchpad RAM: that's a huge asymmetry in the design that is almost incompatible with modern preemption and scheduling algorithms because the contents of that RAM would need to be swapped out along with register state.

        It looks like it's a design optimized for a particularly silly benchmark result at the expense of general usefulness.
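
To put numbers on the memory complaint above, here is a minimal sketch of the peak-bandwidth arithmetic for one core group. It uses only the figures from the summary (128-bit DDR4-3200 per core group, 64 CPEs per group, six groups, 13.8 TFLOPS peak); the function names are illustrative, and real sustained bandwidth would be lower than these theoretical peaks.

```python
CORE_GROUPS = 6
CPES_PER_GROUP = 64
PEAK_FP64_TFLOPS = 13.8


def ddr_bandwidth_gbs(bus_bits: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak in GB/s: bytes per transfer times transfers per second."""
    return (bus_bits / 8) * mega_transfers_per_s * 1e6 / 1e9


def bytes_per_flop(bandwidth_gbs: float, peak_tflops: float) -> float:
    """Bytes of off-chip traffic available per peak floating-point operation."""
    return bandwidth_gbs * 1e9 / (peak_tflops * 1e12)


if __name__ == "__main__":
    group_bw = ddr_bandwidth_gbs(128, 3200)
    chip_bw = CORE_GROUPS * group_bw
    print(f"Per core group: {group_bw:.1f} GB/s")
    print(f"Per CPE share:  {group_bw / CPES_PER_GROUP:.2f} GB/s")
    print(f"Whole chip:     {chip_bw:.1f} GB/s")
    print(f"Bytes per FLOP: {bytes_per_flop(chip_bw, PEAK_FP64_TFLOPS):.4f}")
```

The ratio of bytes to FLOPs is the useful figure here: a kernel has to reuse each value fetched from DRAM many times out of the scratchpad before the vector units, not the memory interface, become the limit.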

        • by _merlin ( 160982 ) on Friday November 24, 2023 @11:54PM (#64029949) Homepage Journal

          It's a fancy multi-core DSP, like the IBM/Sony Cell from the PlayStation 3. It suffers from the same issues in that you need to be clever about how you DMA data in and out of the cores' scratchpad memory. It's designed specifically for running supercomputing workloads. It isn't a general-purpose CPU.
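
The "be clever about DMA" point usually comes down to double buffering: fetch the next tile into one half of the scratchpad while computing on the other. The sketch below only illustrates that control flow in plain Python. `dma_get` is a hypothetical stand-in, not a Sunway or Cell API, and in this simulation the "transfer" is synchronous, whereas on real hardware it would run asynchronously and overlap with the compute step. The tile size is an assumption chosen so that two buffers fit comfortably in the 256 KB local store mentioned in the summary.

```python
from typing import Callable, List, Sequence

SCRATCHPAD_BYTES = 256 * 1024
# ASSUMPTION: two FP64 buffers that together use half the scratchpad.
TILE_ELEMS = SCRATCHPAD_BYTES // 8 // 4


def dma_get(main_memory: Sequence[float], start: int, count: int) -> List[float]:
    """Hypothetical stand-in for a DMA transfer from main memory to scratchpad."""
    return list(main_memory[start:start + count])


def process_double_buffered(
    main_memory: Sequence[float],
    kernel: Callable[[float], float],
    tile_elems: int = TILE_ELEMS,
) -> List[float]:
    """Apply `kernel` to every element, one scratchpad-sized tile at a time."""
    results: List[float] = []
    n = len(main_memory)
    if n == 0:
        return results
    next_tile = dma_get(main_memory, 0, tile_elems)  # prime the first buffer
    for start in range(0, n, tile_elems):
        current = next_tile
        following = start + tile_elems
        # Issue the fetch for the next tile before computing on the current one.
        # On real hardware this transfer would overlap with the loop below.
        next_tile = dma_get(main_memory, following, tile_elems) if following < n else []
        results.extend(kernel(x) for x in current)
    return results


if __name__ == "__main__":
    data = [float(i) for i in range(10)]
    print(process_double_buffered(data, lambda x: x * 2.0, tile_elems=4))
```

The scheme only pays off when the compute per tile takes at least as long as the transfer, which is why it suits a narrow class of workloads.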

          • by Entrope ( 68843 )

            This kind of architecture requires a custom OS and at least custom libraries, if not fully custom application code as well. The external memory bottleneck is a limit for a lot of HPC applications as well. I don't think I was being too harsh at all.

          • you need to be clever about how you DMA data in and out of the cores' scratchpad memory

            That's what they said about Itanium too -- it will be faster you just need compilers to understand and optimize for ILP.

            I think your later statement ought to be the lead: it's not a general purpose CPU. The domain of problems for which you can be clever and for which it's worth it to do so is extremely narrow.

        • by Anonymous Coward

          Look at the memory architecture -- sharing 128 bits of DDR4 at just 3200 MT/s across 64 cores -- and try telling us that it's a serious design. Or the per-core scratchpad RAM: that's a huge asymmetry in the design that is almost incompatible with modern preemption and scheduling algorithms because the contents of that RAM would need to be swapped out along with register state.

          You are being way too harsh. This is roughly identical memory bandwidth available to state of the art Pentium III processors over two decades ago. On second thought it would be substantially less due to unavoidable queuing / concurrency constraints whose name escapes me at the moment. It's the thing that repairs itself as the core count increases; if it were 6400 cores, the effect would go away.

        • It's also the same type of mishmash of processing elements that made the Cell processor such a bitch of a device to program. Sure, in theory the performance was amazing, but almost no-one ever managed to get anything like that performance out of it in general use.
          • But then again, this type of supercomputer isn't used for 'general use'.
            • But then again, this type of supercomputer isn't used for 'general use'.

              Unless they're used for one problem and then thrown away, you will want them to be reconfigurable for another problem.

        • Compared to an AMD Epyc 9654, it has 6x128bit memory channels, the AMD has 12x64 bit channels. Same number of bits there, only difference is DDR4 vs DDR5.

          It's not a desktop CPU, it's designed for supercomputers. In 2018 its predecessor was used in the fastest one in the world.
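
The channel comparison above can be written out directly. This is a rough sketch under stated assumptions: the channel counts and widths are the ones given in the comment, the SW26010 Pro core count covers only the 384 CPEs, and the EPYC figure assumes DDR5-4800, which is not stated in the thread. All results are theoretical peaks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryConfig:
    name: str
    channels: int
    bits_per_channel: int
    mega_transfers_per_s: int
    cores: int

    @property
    def total_bits(self) -> int:
        """Combined width of all memory channels."""
        return self.channels * self.bits_per_channel

    @property
    def bandwidth_gbs(self) -> float:
        """Theoretical peak bandwidth in GB/s."""
        return self.total_bits / 8 * self.mega_transfers_per_s / 1000

    @property
    def gbs_per_core(self) -> float:
        """Peak bandwidth divided evenly across cores."""
        return self.bandwidth_gbs / self.cores


# 384 counts CPEs only; DDR5-4800 for the EPYC is an ASSUMPTION.
SUNWAY = MemoryConfig("SW26010 Pro", 6, 128, 3200, 384)
EPYC = MemoryConfig("EPYC 9654", 12, 64, 4800, 96)

if __name__ == "__main__":
    for cfg in (SUNWAY, EPYC):
        print(f"{cfg.name}: {cfg.total_bits} bits, "
              f"{cfg.bandwidth_gbs:.1f} GB/s, {cfg.gbs_per_core:.1f} GB/s per core")
```

The total bus width is indeed identical, so the gap comes from the transfer rate and, more significantly, from how many cores share that bandwidth.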

          • by Entrope ( 68843 )

            DDR5 typically has 50% higher transfer rates than DDR4, and the Epyc has a much healthier thread-to-RAM ratio than this CPU. Or compared to a four-year-old Threadripper 3970X (high-end desktop) processor, each set of 64 cores in this CPU has the same off-chip RAM interface, the same L1 and L2 cache size, no on-chip L3 cache, the same number of threads, etc.

            This CPU's architecture is only good for a very narrow slice of benchmark results -- which is fine as long as the applications look like those benchmark

            • You're right, this CPU is designed for HPC.
              It doesn't need to be responsive to user input
              It doesn't need to be good at context switching

              • by Entrope ( 68843 )

                It's not good at anything as broad as "HPC". It's not even clear that it's good at large linear algebra problems, which is what an awful lot of HPC boils down to. It's good at a small fraction of HPC problems that have very low memory-bandwidth needs (and thus very localized data flow), small inner loop code sizes, and enough specialization that the whole software stack can be optimized for a CPU that depends on per-core address spaces.

      • there's little incentive to lie.

        Substantiate. Resource distribution to various "scientific" institutions in communist China is very much related to the size of the boasts they make, and "secrecy of development" means there is no real peer review or even competition.

        That creates the perfect incentive to lie a lot, all the more so because the size of the lie is a coverup in itself.

      • by gtall ( 79522 )

        They have every incentive to lie to avoid the baleful eye of the CCP. This is what happens to authoritarian regimes that constantly punish failure to produce or think the "right way". People catch the drift and from that point onward you can expect a constant stream of lies floating up to the "authorities".

      • Given that this CPU is entirely intended for internal use and not for sale to Western markets, there's little incentive to lie.

        How many incentives do you need described for you? It could be propaganda for their own citizens to make them feel better, it could be propaganda to convince ours that the trade restrictions are worthless, they could just be trying not to get killed for failing Xi...

      • by hawk ( 1151 )

        >Given that this CPU is entirely intended for internal use and not for
        >sale to Western markets, there's little incentive to lie.

        Gee, if only we had a comparable entity with a few decades of data with which we could compare . . .

        Oh, wait; we do!

        A significant part of why the USSR imploded decades before we expected is that, while we knew they were lying to themselves as well as the world about things like production, we failed to fathom the sheer *degree* to which they were lying to themselves and believin

        • I love how Americans think they understand the USSR example and have learned from it, but China hasn't.

          Given that the US government is essentially bankrupt and China isn't, I would say America has more to learn from the USSR example than China does.

    • by tlhIngan ( 30335 )

      China is not known for honesty, especially when it comes to technology.

      The only good part about China is they're also fiercely capitalistic. I'm sure if this processor is any good, you will find it at the Shenzhen market soon enough and someone will export it to the US for testing.

      Someone will likely get their hands on it and start selling boards soon enough. They rip off their own stuff all the time.

      • That train is gone - China is now less capitalist than it was 3 years ago and much less than 10 years ago. It is sliding back into Maoist type of totalitarianism, but with a telescreen in your pocket.

        • It is sliding back into Maoist type of totalitarianism

          Only because you only ever consume Western deregulation propaganda disguised as economic analysis.

          The situation on the ground in China is more complex than that.

        • I try to fix it for you, but I guess you are one of the stupid idiots who do not get it:
          a) capitalist -- a market and social system
          b) totalitarianism -- a form of government, or lack thereof: a kind of ruling a country, or "political system"

          Both have nothing to do with each other, as they are completely different axes of the coordinate system.

          Hint: Nazi Germany was highly totalitarian, and fascist and capitalist.

          • You cannot tell your ass from a hole in the ground, never mind discuss coordinate systems. Totalitarianism is a system, whereby the whole of society and economy is under total control of the government.

            If you truly believe that in Nazi Germany (or China, or Russia) any "capitalist" could make independent decisions without approval from the government, then this shows only your total ignorance of the subject.

            Given your low id, it is too late to tell you to go get an education. Go cut the grass (or plow the s

            • Totalitarianism is a system, whereby the whole of society and economy is under total control of the government.
              No. First of all it is a system where a single ruler or a group of people executes complete control without constraints. Usually in an evil manner.

              However even a do-gooder, a benevolent king, is a totalitarian ruler. Whether he influences economics or not has nothing to do with it.

              If you truly believe that in Nazi Germany (or China, or Russia) any "capitalist" could make independent decisions without

    • They are not lying: they are just reporting the increased performance of their parallel-vectorized-superscalar NOP processor....
    • because only a merkins can steal technology from aliens in ufo's.
  • Ship me a school.

  • by jenningsthecat ( 1525947 ) on Saturday November 25, 2023 @12:13AM (#64029979)

    It seems that the Tom's Hardware article was mostly regurgitating info from the Chips and Cheese article. Initially that article looked to me like a first-person set of benchmark results, but after reading it more closely I think that Chips and Cheese was just regurgitating and commenting on test results published by the Chinese.

    If that's the case, then I don't trust the reporting. China has a history of 'creatively representing' its avowed accomplishments in many fields, including semiconductors.

    • Yeah, I think you'll find these chips aren't available for the mass-market. So why you think there should be independent benchmarks is beyond me.
      • Yeah, I think you'll find these chips aren't available for the mass-market. So why you think there should be independent benchmarks is beyond me.

        I thought perhaps the Chinese might have sent one to the West for benchmarking in order to blow their own horn, so to speak. Hard proof that they're actually capable of what they're claiming would do wonders for their propaganda efforts.

        • I thought perhaps the Chinese might have sent one to the West for benchmarking

          Again, why?

          It's arrogant to assume that the West represents "hard proof", when even Linus Torvalds complains that companies like Intel fudge their benchmarks in order to push a narrative for certain features.

    • by AmiMoJo ( 196126 )

      FWIW the number for the AMD part is the theoretical max from their datasheet. I would assume that the number for the Chinese part is also the theoretical maximum, not the result of a benchmark.

  • [joke]So what, it runs at 4MHz only in benchmarks?[/joke]
  • by kriston ( 7886 ) on Saturday November 25, 2023 @02:41PM (#64031001) Homepage Journal

    This chip is actually a DEC Alpha 21164 copy with evolutionary enhancements. It's not an original part developed from the ground up like the article claims.
