Slashdot is powered by your submissions, so send in your scoop

Supercomputing Hardware

China Switching To Home-Grown Chips For Supercomputers

rubycodez writes "The Tianhe-1A system will be the last Chinese supercomputer to use imported Intel and AMD processors. By year's end, China's own 64 bit MIPS-compatible 65nm 8-core 1GHz version of the Godsen (Longsoon family) processors will be used, including 10,000 of them for the 'Dawning 6000' supercomputer. Yes, the chips can and usually do run GNU/Linux, but also can run FreeBSD, OpenBSD, and NetBSD."

  • That's silly. They're trying to build a supercomputer out of MIPS chips. That'll never work...

    Speaking of which, it does make me wonder about all this fuss over 64 bit ARM chips for datacentres. There are already high performance, low power 64 bit MIPS chips and have been for years. They're well proven, have good compiler support, are cheaply licensable, are low power (perhaps not quite as low as ARM?), have standard 64-bit modes, and so on.

    • There are high-performance, low-power 64-bit MIPS chips for sale???

      Where would I buy some of those?
      • Where would I buy some of those?

        You can license the core.

      • Cavium will sell you a 32-core in-order R4K derivative with hardware-accelerated network and cryptography in the 50W range, with clock speeds around 1.5GHz IIRC. NetLogic will sell you an 8-core 4-way multithreaded out-of-order R4K derivative at 2GHz, also with hardware-accelerated network and crypto, in a similar power footprint. Both support the requisite buzzwords, including DDR3, 10GbE, and others.
    • That's silly. They're trying to build a supercomputer out of MIPS chips. That'll never work...

      Silly? Perhaps, for a given value of "silly". But "never work"? Of course it can work - it isn't even going to be hard, all the technology and code already exists.

    • by bsDaemon ( 87307 )

      In addition, most of the Computer Organization and Design books I've seen use the MIPS instruction set to teach assembly and machine code and diagram the processor fairly well, so the architecture should be understood by a good number of computer scientists and therefore coders. You would think that would give MIPS an advantage, although the general attitude around them seems to be that MIPS died with Irix. I don't recall ever seeing a non-SGI MIPS computer on the market, but I haven't really been looking

      • I don't recall ever seeing a non-SGI MIPS computer on the market, but I haven't really been looking too hard, honestly.

        You don't recall seeing the WRT54G?

      • MIPS licenses its instruction set, which means that, like ARM, there are a few makers of MIPS-compatible chips. A few are in a similar business to ARM, but mostly they're clustered at the opposite end of the spectrum. If you want a 64-core processor, about the only ones you're likely to find on the market at the moment are from small MIPS licensees. If you want a 2048-processor machine that can fit under your desk, a MIPS derivative is your only option. MIPS is a niche player at the moment, but it's doi
      • by nxtw ( 866177 )

        I don't recall ever seeing a non-SGI MIPS computer on the market

        PlayStation 2, PlayStation Portable, many consumer network devices (including, but not limited to, many Linksys and Buffalo routers), networked video players and Blu-ray players (using Sigma chipsets)...

      • by tbuskey ( 135499 )

        DEC had a number of MIPS based computers that ran Ultrix.

        DECstation 3100, 5100 workstations.
        The 5900 (and 5800?) minicomputer.

        Ultrix ran on Vaxen and MIPS systems.
        The Alpha chip and OSF/Digital Unix/Tru64 replaced MIPS and were much faster.

        I think SGI had Indy and Indigo2 systems at the time of the DECstations. They may have preceded the purchase of MIPS by SGI.

      • SiCortex made MIPS-based supercomputers, and Tandem's NonStop systems were MIPS-based Unix mainframes (competing head to head with IBM's mainframes). Other MIPS computer makers included NEC, Pyramid Technology, and Siemens Nixdorf.
    • Re:Silly. (Score:5, Informative)

      by TheRaven64 ( 641858 ) on Saturday March 12, 2011 @08:44AM (#35463314) Journal

      Speaking of which, it does make me wonder about all this fuss over 64 bit ARM chips for datacentres. There are already high performance, low power 64 bit MIPS chips and have been for years

      Not really. Low-power MIPS64 chips use 10-20W. Low-power ARM chips use under 1W. They're both low power within their respective domains, but the ARM chips get a lot more performance per Watt. Most of the time, the MIPS chips are more interesting for supercomputing, because they have better floating point, better interconnect (there's a lot of experience floating around building large MIPS systems, a lot from ex-SGI people), better toolchains (MIPS has been in HPC so long that it's a standard target for compilers in that market), and better overall performance.

      The ARM chips are interesting because a lot of server tasks are not CPU-bound. You can stick 64 ARM SoCs, each with enough flash and RAM to run a small business server, in a 1U case and not worry about heat. You can connect it to a big SAN for data storage (just put the OS and apps on the flash). Idle power usage can be a few mW per server; under load, power usage is basically that of the SAN, with the rest of the hardware adding 1W or so.

      It's a mistake to confuse the server and HPC markets. They have very different requirements.

      • by renoX ( 11677 )

        Something to add: until very recently ARM was 32-bit only, which is not very good for datacenters.

        They added a kludge to the ARM ISA (not as elegant as the MIPS64 ISA), so now it isn't an issue anymore.

      • "low power" quad-core 65nm 1ghz MIPS64 chips use 10 watts; 90nm, 20 watts. if you go to 28nm and stay at 1ghz, you divide by four - so that's 10/4 = 2.5 watts.

        also, there are two different configurations for 65nm done by TSMC: one is high-performance (lower cost, 20 masks) and the other is lower-power (slightly higher cost, 32 masks). the lower-power CMOS one was only invented recently, so this is why you often see e.g. Broadcom Network / Server MIPS64 Quad-Core 1ghz 65nm CPUs consuming 10-20 watts. with
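The divide-by-four arithmetic in the comment above can be written out as a back-of-envelope calculation. This is only a sketch: the 4x factor for a 65nm-to-28nm shrink at a fixed 1GHz clock is the commenter's rule of thumb, not a measured figure, and the helper names are hypothetical. Splitting the result per core shows why a 16-core 28nm part could plausibly sit in roughly the same power envelope as a quad-core 65nm one, as another commenter claims further down the thread.

```python
# Back-of-envelope power scaling using the commenter's rule of thumb.
# ASSUMPTION: a 65nm -> 28nm shrink at the same 1GHz clock divides power by ~4.
# This is a heuristic from the discussion, not measured silicon data.
SHRINK_FACTOR_65_TO_28 = 4.0


def power_after_shrink(watts: float, factor: float = SHRINK_FACTOR_65_TO_28) -> float:
    """Estimate chip power after the process shrink (hypothetical helper)."""
    return watts / factor


def watts_per_core(watts: float, cores: int) -> float:
    """Split a chip-level power figure evenly across its cores."""
    return watts / cores


quad_core_65nm = 10.0                                # W, "low power" quad-core 65nm MIPS64 at 1GHz
quad_core_28nm = power_after_shrink(quad_core_65nm)  # 2.5 W for the same four cores
core_28nm = watts_per_core(quad_core_28nm, 4)        # 0.625 W per core
sixteen_core_28nm = 16 * core_28nm                   # 10.0 W: back at the 65nm quad-core budget

print(f"quad-core at 28nm: {quad_core_28nm} W")
print(f"16-core at 28nm:   {sixteen_core_28nm} W")
```

Real process shrinks rarely scale this cleanly, so treat the output as a rough estimate rather than a prediction.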

    • by spinkham ( 56603 )

      If you were as heavily involved in attacking the computer systems of other countries as China is, you would want to make sure that you control as much of your own systems as possible.

    • Wrong. Dead Wrong. (Score:5, Informative)

      by lkcl ( 517947 ) <lkcl@lkcl.net> on Saturday March 12, 2011 @08:56AM (#35463378) Homepage

      you are completely wrong. this processor has over 200 x86 emulation instructions, allowing it to run x86 code with only a 30% performance penalty, under qemu. it also has two 256-bit vector pipelines that provide SIMD floating-point operations so powerful that a single 1ghz core can do 1080p at over 100 frames a second. to claim that "it will never work" in the face of evidence that you simply haven't looked at is ridiculous. look up the specifications on the GS464V, please. also, you are not aware that the Chinese Government has purchased 25% of MIPS, and is working with the MIPS teams in the U.S. to create this processor. this processor *IS* MIPS's high-performance, low-power 64-bit MIPS chip.

      • it also has two 256-bit vector pipelines that provide SIMD floating-point operations so powerful that a single 1ghz core can do 1080p at over 100 frames a second.

        In these modern times, if you are going to be doing lots of SIMD on your HPC, you will replace the 10,000 CPUs with 500 GPUs + 500 CPUs to drive them.

        It's cheaper to buy, and cheaper to operate.

        • It's cheaper to buy, and cheaper to operate.

          And performance dies screaming at the first branch instruction. Yes, GPUs have great throughput, but they suck for large categories of algorithm. If they didn't, then CPUs would have the same performance. They generally lack any branch prediction, so a branch can stall the pipeline completely - if you've got more than one branch every hundred instructions, running it on the GPU won't give you anything like the theoretical maximum throughput. If your threads aren't exactly in lockstep (i.e. if two threa

          • And performance dies screaming at the first branch instruction.

            You can't do separate branching *at all* between the multiple scalars within a SIMD vector. All the scalars have the same operations performed on them.

            You seem to be confused about why the negative performance of branching matters on GPUs. It's not because it impacts their SIMD capabilities, because it doesn't; it's because it impacts their CPU-like "GPGPU" capabilities. Which means what I said is 100% correct:

            If you are doing heavy SIMD work, get a pile of GPUs.

            • You can't do separate branching *at all* between the multiple scalars within a SIMD vector. All the scalars have the same operations performed on them.

              No, but you can do branching between each set of operations. If you're doing a matrix operation, then you can do a couple of SIMD operations on a row, then a branch based on the result. This is pretty fast on most CPUs; it's painfully slow on a GPU.

              You seem to be confused about why the negative performance of branching matters on GPUs.

              No, I'm not. One of the things I work on is a GPGPU compiler for HPC, so I'm intimately familiar with their strengths and weaknesses and when it makes sense to offload work to them from the CPU.

              It's not because it impacts their SIMD capabilities, because it doesn't; it's because it impacts their CPU-like "GPGPU" capabilities. Which means what I said is 100% correct

              I never said it did. I said that it affects their ability to hand

              • No, but you can do branching between each set of operations. If you're doing a matrix operation, then you can do a couple of SIMD operations on a row, then a branch based on the result. This is pretty fast on most CPUs; it's painfully slow on a GPU.

                You are doing it wrong. The branching is only one of your issues. You are preventing coalesced reads, as well as causing bank conflicts in shared memory.

                What you are describing is effectively "gimped" from the start. You have a single matrix but want to leverage instructions which operate on multiple data. Sure, the matrix is made up of multiple data, but what you should be doing is operating on many matrices (hundreds, even thousands) at the same time... Certainly you know the difference between AoS (

                • Wow, you really have no clue. If your problems are that loosely coupled, then you don't need to do SIMD at all; just solve each matrix in a separate process on a separate CPU. For typical applications where supercomputers are used, the problem is to solve a single, huge problem, not a gazillion small ones. That is when parallelism becomes hard; otherwise you don't need a supercomputer at all.
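The branch-divergence argument in this sub-thread can be illustrated with a toy cost model. The sketch below assumes a simplified lockstep SIMD/SIMT group: when every lane takes the same side of a branch, only that side is paid for, but when lanes disagree both sides are executed with the inactive lanes masked off. The function names and cycle costs are hypothetical, and real GPUs add scheduling and memory effects this ignores.

```python
from typing import Callable, List, Tuple


def simt_if_else(
    lanes: List[float],
    cond: Callable[[float], bool],
    then_fn: Callable[[float], float],
    else_fn: Callable[[float], float],
    then_cycles: int,
    else_cycles: int,
) -> Tuple[List[float], int]:
    """Toy model of a lockstep SIMD/SIMT group executing an if/else.

    Returns the per-lane results and the cycles spent by the whole group.
    """
    mask = [cond(x) for x in lanes]          # per-lane branch outcome
    out = list(lanes)
    cycles = 0
    if any(mask):                            # at least one lane needs the "then" side
        cycles += then_cycles
        for i, x in enumerate(lanes):
            if mask[i]:
                out[i] = then_fn(x)          # only active lanes commit results
    if not all(mask):                        # at least one lane needs the "else" side
        cycles += else_cycles
        for i, x in enumerate(lanes):
            if not mask[i]:
                out[i] = else_fn(x)
    return out, cycles


def is_positive(x: float) -> bool:
    return x > 0


def double(x: float) -> float:
    return 2 * x


def negate(x: float) -> float:
    return -x


uniform, uniform_cycles = simt_if_else([1.0, 2.0, 3.0, 4.0], is_positive, double, negate, 10, 10)
mixed, mixed_cycles = simt_if_else([1.0, -2.0, 3.0, -4.0], is_positive, double, negate, 10, 10)
print(uniform, uniform_cycles)  # [2.0, 4.0, 6.0, 8.0] 10
print(mixed, mixed_cycles)      # [2.0, 2.0, 6.0, 4.0] 20
```

The divergent group spends twice the cycles even though each lane needed only one path, which is the throughput loss the commenters are arguing about.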

    • by Bert64 ( 520050 )

      Why not? SGI were building supercomputers from MIPS chips 10 years ago...

    • Was that supposed to be sarcasm? Before SGI was destroyed by Rick Belluzzo, it made plenty of high-performance clustered computers. Considering that most non-x86 CPU architecture development was cancelled as a result of "business" decisions, there is no reason to expect that MIPS-based computers will be somehow worse than other architectures, as long as development continues.

      After all x86, taken on its own, is a terrible architecture. However continuous development allowed Intel and AMD to implement it effi
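Returning to the AoS-versus-SoA point raised in the GPU sub-thread: the difference is which memory locations neighboring SIMD lanes touch when they all read the same field. The sketch below assumes hypothetical records with three float fields (x, y, z) stored in one flat buffer and only computes the element indices each lane would read. With SoA the lanes read consecutive elements (the pattern GPUs can coalesce); with AoS they read every third element.

```python
from typing import List

N_FIELDS = 3  # hypothetical record layout: (x, y, z)


def aos_indices(field: int, n_lanes: int) -> List[int]:
    """Array of Structures: record i occupies elements [3*i, 3*i + 3)."""
    return [lane * N_FIELDS + field for lane in range(n_lanes)]


def soa_indices(field: int, n_lanes: int, n_records: int) -> List[int]:
    """Structure of Arrays: field f occupies elements [f*n, f*n + n)."""
    return [field * n_records + lane for lane in range(n_lanes)]


def is_contiguous(indices: List[int]) -> bool:
    """True if neighboring lanes read neighboring elements (stride 1)."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))


# Four lanes each read the "y" field (index 1) of records 0..3, out of 4 records.
aos = aos_indices(field=1, n_lanes=4)
soa = soa_indices(field=1, n_lanes=4, n_records=4)
print("AoS:", aos, is_contiguous(aos))  # AoS: [1, 4, 7, 10] False
print("SoA:", soa, is_contiguous(soa))  # SoA: [4, 5, 6, 7] True
```

This layout choice is what makes batching many small matrices attractive on a GPU, as suggested in that sub-thread; the reply's counterpoint is that many supercomputer workloads are one large coupled problem instead.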

  • by slonik ( 108174 ) on Saturday March 12, 2011 @08:31AM (#35463224)
    The processor family is called Loongson [wikipedia.org] and not "LongSoon" as summary says. But the typo is funny in its own way.
    • There'll be a Beowulf Cluster of these along soon!

    • It's actually written with two Chinese characters. (If you have the right fonts installed, it's 龙芯. If you don't, they won't appear.) The literal translation is "dragon chip". (It sounds cool in Chinese because it is pronounced identically with "dragon heart". "Heart of dragon inside" sounds way cooler than "Intel inside".) There are multiple different systems for Romanizing Chinese characters, but in HanYu PinYin [wikipedia.org], which is now accepted as the most standard, and is taught throughout China, it would be lóng
    • by Thing 1 ( 178996 )

      The processor family is called Loongson [wikipedia.org] and not "LongSoon" as summary says. But the typo is funny in its own way.

      Yeah, your new CPU is just a spider bite away...

    • My apologies for two bad typos, though I typed both correctly in the tags; I also wrote Godsen in the article text instead of Godson.
  • by stox ( 131684 ) on Saturday March 12, 2011 @08:49AM (#35463340) Homepage

    I wonder how well these chips compare to the R16000s?

    • Modern commercial MIPS chips have relatively little in common with SGIMIPS. Apart from the totally dissimilar chipset, firmware, and peripherals, SGI used big endian chips (mipseb) whereas most current commercial implementations are little-endian (mipsel.) I doubt anyone will ever get IRIX running on non-SGI hardware unless SGI releases a massive amount of source code and documentation that they have so far shown no inclination to release.
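The mipseb/mipsel distinction above is a matter of byte order: the same 32-bit word is laid out in memory in opposite orders. A small sketch using Python's standard `struct` module (`">I"` for big-endian, `"<I"` for little-endian) shows both layouts and what happens if data written in one order is read naively in the other. The value 0x0A0B0C0D is arbitrary.

```python
import struct

word = 0x0A0B0C0D  # arbitrary 32-bit example value

big = struct.pack(">I", word)     # big-endian layout, as on SGI's mipseb systems
little = struct.pack("<I", word)  # little-endian layout, as on most mipsel parts

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a

# Decoding big-endian bytes as little-endian scrambles the value; likewise,
# binaries built for one byte order do not run on the other.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0xd0c0b0a
```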
  • The Japanese 10 petaflops-scale K computer in Kobe uses SPARC-compatible CPUs from Fujitsu. Sounds like a good idea if you want to build know-how, not just a machine.

    • It also allows you greater flexibility in chip design. The Japanese are still convinced that vector processors are the way to go. The Earth Simulator had a lot of Japanese-designed vector CPUs and the K computer is no different: it has 2x as many SIMD units per core as the Intel/AMD CPUs. There are lots of benefits to using vector CPUs in parallel computing, but the problem is that there is very little demand in the personal/corporate world for them (outside of a few specific applications). By
    • Although with Oracle deprecating Sun with all their might, it's hard to say what kind of future SPARC has. Clearly China has seen the same wall I have.

      • Why would Oracle's acquisition of Sun have anything to do with Fujitsu's SPARC development? Quite a few systems from Sun over the last few years have contained rebadged Fujitsu SPARC64 chips, but Sun is certainly not Fujitsu's only (or even largest) SPARC customer. Oh, and Oracle has extended the UltraSPARC Tx series roadmap beyond the last Sun one by quite a way. They killed Rock, but Rock was a processor with no market segment. The Tx series are still being sold by Oracle and (given their performance
        • Why would Oracle's acquisition of Sun have anything to do with Fujitsu's SPARC development? Quite a few systems from Sun over the last few years have contained rebadged Fujitsu SPARC64 chips, but Sun is certainly not Fujitsu's only (or even largest) SPARC customer.

          If Slowlaris is deprecated then the demand for SPARC drops sharply. Linux runs on cheaper chips and there's no magical SMP glue in SPARC architecture processors that hasn't been done elsewhere.

  • by lkcl ( 517947 ) <lkcl@lkcl.net> on Saturday March 12, 2011 @09:00AM (#35463394) Homepage

    the article has missed out some important information, which is that they are planning two versions of the CPU. the first is a Quad-Core 65nm, and the second is a 16-core 28nm, which will use the same amount of power (about 12-15 watts). hopefully they will also do a Single-Core 28nm which would be under 1 watt, because at 1ghz the SIMD units are so powerful they can do 1080p at 100 frames per second. really, this CPU design is a game-changer. i've been advocating their use for some time - http://lkcl.net/laptop.html [lkcl.net]

    • Where are the real industry benchmarks? If they're advertising it for technical computing, where's SPEC CPU2006? If they're pushing it for commercial workloads, why haven't we seen a TPC-C?
      • Two reasons - they haven't been built yet, and maybe the Chinese have a different set of benchmarks that they use.
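To put the "1080p at 100 frames per second on a single 1GHz core" claim in perspective, the budget works out to under five clock cycles per output pixel. The sketch below does that arithmetic. The lane count assumes 16-bit samples packed into each 256-bit vector across the two vector pipelines mentioned earlier in the thread; that packing is an illustrative assumption, not a published figure.

```python
# Cycle budget for the "1080p at 100 fps on one 1GHz core" claim.
CLOCK_HZ = 1_000_000_000        # 1GHz core, per the thread
WIDTH, HEIGHT = 1920, 1080      # 1080p frame
FPS = 100

pixels_per_second = WIDTH * HEIGHT * FPS             # 207,360,000
cycles_per_pixel = CLOCK_HZ / pixels_per_second      # about 4.82

# ASSUMPTION (illustrative): 16-bit samples, two 256-bit vector pipelines.
lanes_per_cycle = 2 * (256 // 16)                    # 32 lanes per cycle
lane_ops_per_pixel = lanes_per_cycle * cycles_per_pixel

print(f"{pixels_per_second:,} pixels/s")
print(f"{cycles_per_pixel:.2f} cycles per pixel")
print(f"~{lane_ops_per_pixel:.0f} vector-lane operations per pixel")
```

Whether that budget is enough depends heavily on the codec and on how much of the work the SIMD units absorb, which is why the benchmark question above matters.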

  • I would like to buy a small (perhaps 1U) server based on these chips if such a thing exists...

  • I'm sure these are very nice chips, but anyone can do similar, given funding. There are a number of cores available for licensing (like they did with MIPS), and adding vector units is the obvious way to boost your peak flops without blowing your power budget. I guess I don't really see why this merits all the coverage: for instance, what fraction of peak performance can it get on real code (say, a weather or MD simulation, not HPL)?

    the quoted peak gflops/watt for this project are decent, but not much bet
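The "fraction of peak" question can be made concrete with the usual peak formula: chips × cores per chip × clock × floating-point operations per cycle. The chip count, core count, and clock come from the story summary (10,000 eight-core 1GHz parts for the Dawning 6000). `FLOPS_PER_CYCLE` and the sustained figure are hypothetical placeholders, not published numbers; 8 would correspond to two 256-bit pipelines each doing four double-precision operations per cycle, if the hardware works that way. Only the method carries over.

```python
# Peak vs. sustained FLOPS: the "fraction of peak" question from the comment.
CHIPS = 10_000            # from the story summary (Dawning 6000)
CORES_PER_CHIP = 8        # 8-core Godson part, per the summary
CLOCK_HZ = 1.0e9          # 1GHz
FLOPS_PER_CYCLE = 8       # HYPOTHETICAL per-core figure; substitute the real spec


def peak_flops(chips: int, cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: every core retires its maximum FLOPs every cycle."""
    return chips * cores * clock_hz * flops_per_cycle


def fraction_of_peak(sustained_flops: float, peak: float) -> float:
    """Efficiency of a real application relative to theoretical peak."""
    return sustained_flops / peak


peak = peak_flops(CHIPS, CORES_PER_CHIP, CLOCK_HZ, FLOPS_PER_CYCLE)
sustained = 64.0e12       # HYPOTHETICAL measured result for some application
print(f"peak: {peak / 1e12:.0f} TFLOPS")
print(f"fraction of peak: {fraction_of_peak(sustained, peak):.0%}")
```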
