Supercomputing Hardware

ARM In Supercomputers — 'Get Ready For the Change' 238

Posted by Soulskill
from the you-and-what-ARMy dept.
An anonymous reader writes "Commodity ARM CPUs are poised to replace x86 CPUs in modern supercomputers just as commodity x86 CPUs replaced vector CPUs in early supercomputers. An analysis by the EU Mont-Blanc Project (PDF) (using Nvidia Tegra 2/3, Samsung Exynos 5 & Intel Core i7 CPUs) highlights the suitability and energy efficiency of ARM-based solutions. They finish off by saying, 'Current limitations [are] due to target market condition, not real technological challenges. ... A whole set of ARM server chips is coming, solving most of the limitations identified.'"
This discussion has been archived. No new comments can be posted.

  • IMHO - No thanks. (Score:2, Insightful)

    by Anonymous Coward

    PC user, hardcore gamer and programmer here; for me, energy efficiency is a lesser priority than speed in a CPU. Make an ARM CPU compete with an Intel Core i7 2600K, and show me it's overclockable with few issues, and you got my attention.

    • by Stoutlimb (143245) on Saturday May 25, 2013 @11:27PM (#43825231)

      No doubt your CPU would win. But when looking at power/price as well, you'd have to pit your CPU against 50 or so ARM chips in parallel. For some solutions, it may be a far better choice. One size doesn't fit all.

      • by gentryx (759438) * on Sunday May 26, 2013 @01:39AM (#43825661) Homepage Journal

        There is already one line of supercomputers built from embedded hardware: the IBM Blue Gene. Their CPUs are embedded PowerPC [wikipedia.org] cores. That's the reason why those systems typically have an order of magnitude more cores than their x86-based competition.

        Now, the problem with BG is that not all codes scale well with the number of cores. Especially when you're doing strong scaling (i.e. you fix the problem size, but throw more and more cores at the problem), Amdahl's law [wikipedia.org] tells you that it's beneficial to have fewer/faster cores.
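
As a rough illustration of the strong-scaling point above: Amdahl's law bounds the speedup on n cores by 1 / ((1 - p) + p / n), where p is the fraction of the run time that parallelizes. The sketch below is illustrative only. The value p = 0.95 and the core counts are assumptions picked for the example, not figures from the study or from any Blue Gene system.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup for a fixed problem size (strong scaling)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    p = 0.95  # hypothetical: 95% of the run time parallelizes
    for n in (16, 160, 1600):
        print(f"{n:5d} cores -> speedup bound {amdahl_speedup(p, n):6.2f}")
    # With any serial part left, the bound saturates at 1 / (1 - p).
    print(f"limit as cores grow: {1.0 / (1.0 - p):.2f}")
```

Under these assumptions, a tenfold increase in core count adds less and less speedup, which is why a smaller number of faster cores can win when the problem size is fixed.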

        Finally I consider the study to be fundamentally flawed as it compares the OEM prices of consumer-grade embedded chips with retail prices of high-end server chips. This is wrong for so many reasons... you might then throw in the 947 GFLOPS, $500 AMD Radeon 7970 [wikipedia.org], which beats even the ARM SoCs by a margin of 2x (ARM: ~1 GFLOPS/$, AMD Radeon: ~2 GFLOPS/$).

      • I may be wrong here, but I get the impression that the MIPS architecture is much more power efficient than that of the ARM architecture

        If they are going to talk about building up a big iron using CPUs which are of high power efficiency, I reckon the MIPS cpu might be more suitable for this task than one from the ARM camp

        • by julesh (229690) on Sunday May 26, 2013 @03:57AM (#43826015)

          I may be wrong here, but I get the impression that the MIPS architecture is much more power efficient than that of the ARM architecture

          If they are going to talk about building up a big iron using CPUs which are of high power efficiency, I reckon the MIPS cpu might be more suitable for this task than one from the ARM camp

          I don't think it is. Best figures (albeit somewhat out-of-date) I can find for a MIPS-based system is 2GFLOPS/W for a complete 6-core node including memory. ARM Cortex A15 power consumption is a little hard to track down, although it's suggested that a 4-core 1.8GHz configuration (eg Samsung Exynos 5) could run at full speed on 8W (if the power manager let it; the Exynos 5 throttles down when it consumes more than 4W). Performance per GHz/core is about 4GFLOPS, so this system should be able to pull in about 28.8GFLOPS (or twice that if using ARM's "NEON" SIMD system to full advantage). Add in ~2W for 1GB DDR3 SDRAM, and that's 2.9GFLOPS/W. Assuming that the MIPS system I found is not the best available (as the data was from 2009 it certainly seems likely better is available now), the two appear to be roughly comparable.
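
The back-of-the-envelope estimate above can be reproduced in a few lines. This is only a sketch of the comment's own arithmetic: every input (4 cores, 1.8 GHz, about 4 GFLOPS per GHz per core, 8 W for the SoC, about 2 W for DRAM) is the commenter's estimate rather than a measurement, and the function names are made up for the example.

```python
def peak_gflops(cores: int, clock_ghz: float, gflops_per_ghz_per_core: float) -> float:
    """Peak throughput as cores x clock x per-core throughput per GHz."""
    return cores * clock_ghz * gflops_per_ghz_per_core


def gflops_per_watt(gflops: float, cpu_watts: float, memory_watts: float) -> float:
    """Efficiency of CPU plus memory, ignoring every other node component."""
    return gflops / (cpu_watts + memory_watts)


if __name__ == "__main__":
    # All inputs are the estimates quoted in the comment, not measurements.
    scalar = peak_gflops(cores=4, clock_ghz=1.8, gflops_per_ghz_per_core=4.0)
    neon = 2 * scalar  # the comment's "twice that" with NEON fully used
    for label, gflops in (("scalar", scalar), ("NEON", neon)):
        eff = gflops_per_watt(gflops, cpu_watts=8.0, memory_watts=2.0)
        print(f"{label}: {gflops:.1f} GFLOPS, {eff:.2f} GFLOPS/W")
```

The scalar case gives 28.8 GFLOPS over 10 W, i.e. roughly the 2.9 GFLOPS/W quoted above, against the 2 GFLOPS/W figure for the 2009 MIPS node.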

        • I may be wrong here, but I get the impression that the MIPS architecture is much more power efficient than that of the ARM architecture

          If they are going to talk about building up a big iron using CPUs which are of high power efficiency, I reckon the MIPS cpu might be more suitable for this task than one from the ARM camp

          MIPS is an under-invested, older but great technology.
          Another historic winner was the DEC Alpha.

          As the folk at Transmeta (and others) demonstrated logic to decode any random ISA and drive a RISC core faster than the old VAX microcode days is very possible. This seems to be the way of modern processors. So ARM/x86/x86_64 ISA almost does not matter except to the compiler and API/ABI folk. If you want to go fast feed your compiler folk well.

          • As the folk at Transmeta (and others) demonstrated logic to decode any random ISA and drive a RISC core faster than the old VAX microcode days is very possible. This seems to be the way of modern processors. So ARM/x86/x86_64 ISA almost does not matter except to the compiler and API/ABI folk. If you want to go fast feed your compiler folk well.

            One of the best ways you can help the compiler folk is with an orthogonal and sensible architecture. Furthermore, consider that generating good code is a problem that must be solved for every language, so starting with a good ISA makes for a lot less work.

        • by Bert64 (520050)

          Another advantage of MIPS is that 64bit MIPS is already mature, having been around since the early 90s... 64bit ARM on the other hand is new and not widely supported yet.

      • The Core i7 might very well still win. Remember that Intel is more efficient in computing work per watt, and an Ivy Bridge Core i7 3770K uses 77 W. If your average ARM chip uses 2 watts, that means that ~30 ARM chips will still get beaten by the Core i7....

      • by gl4ss (559668)

        No doubt your CPU would win. But when looking at power/price as well, you'd have to pit your CPU against 50 or so ARM chips in parallel. For some solutions, it may be a far better choice. One size doesn't fit all.

        50 costs more in silicon than a single x86.

        basically you need a "new generation" of arm chips. but they'll have to compete against a new generation of x86 chips - and remember, x86 chips are priced as they are only because they're the fastest you can buy!

        the thing is, we have been listening to this for years, that in a few years arm will take over everything. yet it hasn't.

        instead of supercomputing, I would foresee the lowest tier of rent-a-webservers to move to arm.. what's a better business than renting a mach

        • by Bert64 (520050)

          Alpha used to be the fastest you can buy, and it used to be priced high too...
          ARM is doing what x86 did to the highend risc cpus of the 90s.

          • I for one am happy to see WinTel crumbling at both ends. Windows and X86, each as ugly as the other.

            • by Rockoon (1252108)
              ..and by ugly you mean the greatest (most versatile) addressing modes of any currently produced CPUs?

              The x86 addressing modes are so powerful that they even created an instruction to leverage the addressing generation logic without accessing memory...
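
For readers who have not met it, the instruction alluded to is LEA ("load effective address"). A general x86 memory operand has the form base + index * scale + displacement, with scale limited to 1, 2, 4 or 8, and LEA returns that sum without touching memory. The Python model below is only a sketch of that address arithmetic: the function name and the `bits` parameter are hypothetical, and it ignores segment bases and other details.

```python
VALID_SCALES = (1, 2, 4, 8)  # the only scale factors x86 addressing allows


def effective_address(base: int, index: int = 0, scale: int = 1,
                      displacement: int = 0, bits: int = 64) -> int:
    """Model of base + index * scale + displacement, wrapped to `bits` bits."""
    if scale not in VALID_SCALES:
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) % (1 << bits)


if __name__ == "__main__":
    # Address of element 3 in an array of 8-byte items starting at 0x1000,
    # skipping a hypothetical 16-byte header.
    print(hex(effective_address(0x1000, index=3, scale=8, displacement=16)))
    # The classic arithmetic trick: x * 5 computed as x + x * 4.
    x = 7
    print(effective_address(base=x, index=x, scale=4))
```

The second call shows why compilers like the instruction: it does a multiply-by-small-constant and an add in one step without going through the usual arithmetic instructions.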

              The fact is that neither RISC nor CISC is best, that a hybrid of the two is best. The problem with the RISC camp is that they can't make it hybrid while still being RISC, while the CISC camp hybridized long ago and even remained entirely compatible while do
          • by unixisc (2429386)
            Alpha's high price was due to DEC trying too hard to achieve prized speeds, and thereby having plenty of fallout, resulting in their need to jack up prices on those that did pass their tests. Had DEC gone for different speed bins, instead of just one, they could have priced it lower and sold it to markets which would have happily considered an Alpha, but where price was less critical.
          • by jedidiah (1196)

            No. Alpha anything was priced insanely.

            There have always been cheap x86. It's only the extreme high end that's been ridiculous. There has always been a sweet spot with x86 in terms of price and performance.

            Although Alpha does provide a nice example of how performance per core trumps anything else. There were some problems you simply could not solve by throwing lesser CPUs at it no matter how much you might have wanted.

    • Re: (Score:2, Interesting)

      by Anonymous Coward

      architecture is complicated. but in terms of ops per mm^2, or ops per watt, ops per $,
      cycles per useful op, the x86 architecture is a heinous pox on the face of the
      earth.

      worse yet, your beloved x86 doesn't even have any source implications, it's just
      a useless thing.

      • Re:IMHO - No thanks. (Score:5, Informative)

        by Colonel Korn (1258968) on Saturday May 25, 2013 @11:58PM (#43825327)

        architecture is complicated. but in terms of ops per mm^2, or ops per watt, ops per $,
        cycles per useful op, the x86 architecture is a heinous pox on the face of the
        earth.

        worse yet, your beloved x86 doesn't even have any source implications, it's just
        a useless thing.

        In TFA's slides 10 and 11, Intel i7 chips are shown to be more efficient in terms of performance per watt than ARM chips. However, they're close to each other and Intel's prices are significantly higher.

      • Useless for what you do. The second performance...not performance per watt...PERFORMANCE becomes an issue..ARM is a steaming pile of shit and you know it. If you're doing anything more than what the above AC said (keep playing sudoku, and Portal) it can't handle it. How about everyday consumers who need a tablet that can actually do work? A gimp version of windows is not going to get the job done. Some of the Samsung Slate tablets however come with an x86...and are actually fully functional! Can you point t

        • A gimp version of windows is not going to get the job done.

          On the other hand, a Windows version of GIMP does get a lot of jobs done that don't quite need Adobe Photoshop.

          But seriously, the reason Windows RT is "gimped" is because Microsoft has refused to endorse recompiling desktop applications. That's not a failing of ARM, as ARM ran RISC OS on Acorn computers, as much as a power grab by Microsoft.

          Some of the Samsung Slate tablets however come with an x86...and are actually fully functional! Can you point to an ARM tablet that can do everything it can?

          Some ARM tablets run Ubuntu [ubuntu.com]. Other Android tablets run Debian in a chroot, with video out through an X11 server app for Android. These can't run Windows application

    • Re:IMHO - No thanks. (Score:5, Informative)

      by king neckbeard (1801738) on Saturday May 25, 2013 @11:53PM (#43825311)
      You aren't operating in the supercomputing market. There, what matters is how much processing you can get for how much money. You can always buy more chips, and power usage and cooling are both significant factors. That's why x86 became dominant in that space. It was cheaper to buy a bunch of x86 chips than to buy fewer POWER chips. In terms of computing power, a POWER7 will eat your i7 for breakfast, but they are ungodly expensive.
      • by dbIII (701233)
        It was a two week process to attempt to buy a single low end machine with one of those things to see if it was viable for a particular task - two weeks getting my company's wallet weighed by a slimy bastard that made used car salesmen look like saints and a lot of veiled comments that may have been about kickbacks. In the end the price was more than that of four gold plated IBM Xeon systems of similar clockspeed or about double that in whitebox systems. Sounds like you need a black budget immune from the e
      • ...but also reliability (because supercomputers are really large and one failed node will generally crash the whole job, thereby wasting gazillions of core hours; that's one reason why SC centers buy expensive Nvidia Tesla hardware instead of the cheaper GeForce series) and IO and memory bandwidth and finally integration density. That one Intel chip can be more tightly integrated as it won't generate as much excess heat per GFLOPS (according to TFA...).
    • by XaXXon (202882)

      Why did you even say this? "PC users" aren't even mentioned in this article. This article is about supercomputers where the workloads are virtually by definition extremely parallel and the restrictions are around price and power consumption, not "FPS on a single game".

    • Re:IMHO - No thanks. (Score:5, Interesting)

      by symbolset (646467) * on Sunday May 26, 2013 @01:14AM (#43825595) Journal
      The problem you have is the software tools you use sap the power of the hardware. Windows is engineered to consume cycles to drive their need for recurrent license fees. Try a different OS that doesn't have this handicap and you'll find the full power of the equipment is available.
      • Re:IMHO - No thanks. (Score:4, Informative)

        by aztracker1 (702135) on Sunday May 26, 2013 @02:48AM (#43825861) Homepage
        The last two times I ran Linux on my desktop I ran into issues that weren't impossible to overcome, just a pain in the ass to deal with... I had a desktop with two graphics cards in sli, and two monitors.. getting them both working in 2006 was a pain, I know that was seven years ago, but still... far harder than it should have been.. in 2007, my laptop was running fine, upgraded to the latest ubuntu, nothing but problems.. In the first case, XP/Vista were less trouble, in the second, Win7 RC1 ran better... I also ran PC-BSD for a month, which was probably the nicest experience I've had with something outside win/osx on my main desktop, but still had issues with virtual machines that was a no-go.

        Given, my experiences are pretty dated, and things have gotten better... for me, linux is on the server(s) or in a virtual machine... every time I've tried to make it my primary OS has been met with heartache and pain. I replaced my main desktop a couple months ago, and tried a few Linux variants.. The first time, I installed on my SSD, then when I plugged in my other hard drives, it still booted, but an update to Grub screwed things up and it wouldn't boot any longer. This was after 3 hours of time to get my displays working properly.... I wasn't willing to spend another day on the issue, so back to Windows I went. I really like Linux.. and I want to make it my primary desktop, but I don't have extra hours and days to tinker with problems an over-the-wire update causes... let alone the initial setup time which I really felt was unreasonable.

        I've considered putting it as my primary on my macbook, but similar to windows, the environment pretty much works out of the box, and brew takes things a long way towards how I want it to work. Linux is close to 20 years old.. and still seems to be more crusty for desktop users than windows was a decade and a half ago in a lot of ways. In the end, I think Android may be a better desktop interface than what's currently on offer from most of the desktop bases in the Linux community, which is just plain sad... I really hope something good comes out of it all, I don't like being tethered to Windows or OSX... I don't like the constraints... but they work, with far fewer issues... the biggest ones being security related... I think that Windows is getting secure faster than Linux is getting friendlier, or at least easier to get up and running with.
        • by 0123456 (636235)

          I had a desktop with two graphics cards in sli, and two monitors

          Given SLI barely works in Windows, expecting it to work in Linux was optimistic. I recently booted up a Linux Mint DVD on my laptop to try it out and... everything just works. Even using the 'recovery partition' to reinstall Windows on there takes over three hours, reboots about thirty times and breaks with barely decipherable and completely misleading error messages if you installed a hard drive larger than the one that came with it.

          Linux is close to 20 years old..

          And the BSD core in MacOS is close to 40 years old.

          Android would m

          • Yeah yeah you had no problems therefore they don't exist. I wish Linux advocates would be more honest about its flaws. I think it's great but it's nowhere near perfect. I swapped a Mint hard drive from another machine into this one and it works flawlessly which Windows most certainly wouldn't, however when I put Ubuntu on that other machine it was a nightmare.

      • Got any evidence for that claim? Here [phoronix.com] are some benchmarks that suggest gaming performance is the same (which is what you would expect since the OS isn't participating much, except through the graphics drivers).
    • Far more games are played on ARM cpus than X86 CPUs these days. Of course the takeover started at the bottom end with Snake, and moved on through Angry Birds etc., it's only a matter of time before ARM takes over the hard core gamers too. It's more a matter of having a platform with big screen and interesting controllers. ARM CPUs are already up to the task of running such systems.

    • Your comment is off-topic. Nobody cares about your gaming machine and your desktop. Have you read the article? It is about HPC, you know these machines which are simulating global warming, nuclear weapons, etc. It is talking about entire rooms filled with dense compact racks of CPUs and memory and these are having a super high electricity bill to pay each month and they actually care about energy efficiency which may mean more processing power for the same price. Overclocking your gaming machine isn't HPC.
  • by gman003 (1693318) on Saturday May 25, 2013 @11:31PM (#43825251)

    Most of the actual processing power in current supercomputers comes from GPUs, not CPUs. There are exceptions (that all-SPARC Japanese one, or a few Cell-based ones), but they're just that, exceptions.

    So sure, replace the Xeons and Opterons with Cortex-A15s. Doesn't really change much.

    What might be interesting is a GPU-heavy SoC - some light CPU cores on the die of a supercomputer-class GPU. I have heard Nvidia is working on such (using Tegra CPUs and Tesla GPUs), and I would not be surprised if AMD is as well, although they'd be using one of their x86 cores for it (probably Bulldozer - damn thing was practically built for heavily-virtualized servers, not much different from supercomputers).

    • by Victor Liu (645343) on Saturday May 25, 2013 @11:52PM (#43825303) Homepage
      As someone who does heavy duty scientific computing, I wouldn't say that "most" of the actual processing power is in GPUs. They are certainly more powerful at certain tasks, but most applications being run are legacy code, and most algorithms require substantial reworking to get them to run with reasonable performance on a GPU. Simply put, GPU for supercomputing is not quite a mature technology yet. I am personally not too interested in coding for GPUs simply because the code is not portable enough yet, and by the time the technology might be mature, there might be a new wave of technology (like ARM) that could be easier to work with.
      • by KiloByte (825081) on Sunday May 26, 2013 @12:09AM (#43825373)

        Also, a lot of algorithms, perhaps even most, rely on branching, which is something GPUs suck at. And only some can be reasonably rewritten in a branchless way.
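
To make "rewritten in a branchless way" concrete, here is a minimal sketch, written in Python purely for readability (on a GPU the same idea would live in a kernel language). The per-element rule and the function names are hypothetical. The second version computes both outcomes and uses a 0/1 mask to keep one of them, so every element follows the same instruction sequence.

```python
def update_branching(values, threshold):
    """Reference version: a data-dependent branch per element."""
    out = []
    for v in values:
        if v > threshold:
            out.append(v * 2)
        else:
            out.append(v + 1)
    return out


def update_branchless(values, threshold):
    """Same result via arithmetic select: both sides are computed,
    then a 0/1 mask keeps exactly one of them."""
    out = []
    for v in values:
        mask = int(v > threshold)   # 1 if the condition holds, else 0
        taken = v * 2               # value for the "true" side
        not_taken = v + 1           # value for the "false" side
        out.append(mask * taken + (1 - mask) * not_taken)
    return out


if __name__ == "__main__":
    data = [-3, 0, 4, 10]
    print(update_branching(data, 3))
    print(update_branchless(data, 3))
```

The catch is visible in the code: both sides are always evaluated, so the trick only pays off when each side is cheap and free of side effects. Deeply nested or loop-carrying control flow does not reduce to a select like this, which is the "only some" in the comment above.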

        • by ThePeices (635180) on Sunday May 26, 2013 @02:36AM (#43825823)

          Also, a lot of algorithms, perhaps even most, rely on branching, which is something GPUs suck at. And only some can be reasonably rewritten in a branchless way.

          Nonsense, I play Far Cry 3 on my GPU, and it renders branches just fine, thank you very much.

        • by Zo0ok (209803)

          Isn't the ironic thing here, that ARM is also not very good at branching? No branch prediction - that at least used to be the case.

          • by tepples (727027)
            ARM has predication: execute or don't execute a particular instruction based on the result of a previous instruction. It's like branching past one instruction at a time, and it doesn't stall the pipeline.
            • That advantage goes away if your core is superscalar -- you still have issues with branching and not keeping the queue full. Some versions of x86 superscalar can execute both sides of branches, then discard the results of the branch not taken. There is no reason that an architecture with an ARM instruction set could not do this; but then some of the performance-per-watt benefits would be leveled out.
      • by XaXXon (202882)

        It really doesn't seem like portability should be a huge goal for writing code for top-100 supercomputers. The cost of the computer would dwarf (or at least be a significant portion of) the cost of developing the software for it. It seems like writing purpose-built software for this type of machine would be desirable.

        If you can cut the cost of the computer in half by doubling the speed of the software, it seems a valid fiscal tradeoff, and the way to do that would be to write it for purpose-built hardware

        • On the point of portability, there's then a distinction of your focus. If you do research on numerical methods, then yes, you would write highly optimized code for a particular machine, as an end in and of itself. I myself am merely a user, and our research group does not have the expertise to write such optimized code. We pay for time on supercomputing clusters, which constantly bring online new machines and retire old ones. Every year our subscription can change, and we are allowed to use resources on dif
        • by JanneM (7445)

          System and numerical libraries and compilers are of course written specifically for the machine. But user-level apps (and a lot of scientific computing uses finished apps) are ported across multiple systems.

          Portability is not as big an issue as it was a generation ago, as most supercomputers basically are Linux machines today, and made to more or less look like a typical Linux installation from a user-application level, with a POSIX API; pthreads, OpenMP and OpenMPI; a standard set of numerical libraries; a

      • It's the same for ARM. Java doesn't run properly yet because of the floating point limitations of ARM.

    • Re: (Score:2, Insightful)

      by Anonymous Coward

      False. According to the Top 500 computer survey from November, 2012 (Category: Accelerator/Co-Processor), 87% of systems are not using any type of GPU co-processor, and 77% of the processing power is coming from the CPU.

      This is, however, a decrease from the June 2012 survey, so GPU is certainly making inroads, but it is not yet the main source of computation.

      http://www.top500.org/statistics/list/

      I still remember when the IBM Blue architecture came out, using embedded PowerPC processors and it was a huge po

    • by Junta (36770) on Sunday May 26, 2013 @12:02AM (#43825353)

      Of the last published top500 list, 7 out of the top 10 had no GPUs. This is a clear indication that while GPU is definitely there, claiming 'Most of the actual processing power' is overstating it a touch. It's particularly telling that there are so few as overwhelming the specific hpl benchmark is one of the key benefits of GPUs. Other benchmarks in more well rounded test suites don't treat GPUs so kindly.

    • by symbolset (646467) * on Sunday May 26, 2013 @12:06AM (#43825365) Journal
      These ARM cores are halfway between the extremely limited GPU cores and the extremely flexible X86 cores. They may be the "happy medium".
      • These ARM cores are halfway between the extremely limited GPU cores and the extremely flexible X86 cores. They may be the "happy medium".

        Not at all. They are much more like slow x86 processors. They can branch just as well, but are much slower and don't have a narrow very high performance sweet spot like GPUs.

        I somewhat expect AMD's new unreleased APUs to be the happy medium. Not as much grunt or memory bandwidth as a discrete GPU, but still some stream processors and much easier to program.

        • by Rockoon (1252108)
          Perhaps. Pretty much any time I am doing some SSE coding I am thinking to myself "wouldn't it be nice if these registers were wider.. why doesn't someone in the x86 market just go ahead and make huge vector registers at least for addition, multiplication, and shifting" and then I realize that that is in fact where the APUs are at right now.. and think to myself "geez I should be doing OpenCL not this hand-crafted SSE shit"
      • by AmiMoJo (196126) *

        Not really. The main difference between ARM and x86 cores in this application is that ARM has an equally flexible but lower performance ALU. For scientific applications that is a good trade off because performance tends to be mostly dependent on the FPU and on things like network and memory latency.

        In other words it is hard to max out an x86 core constantly in a supercomputer so much of its performance is unused. ARM does away with the bits that are less critical which results in lower power consumption and

  • Questions... (Score:5, Interesting)

    by storkus (179708) on Sunday May 26, 2013 @12:26AM (#43825411)

    As I understand it, Intel still has the advantage in the performance per watt category for general processing and GPUs have better performance per watt IF you can optimize for that specific environment--both things which have been commented to death endlessly by people far more knowledgeable than I.

    However, to me there are at least 3 questions unanswered:

    1. ASICs (and possibly FPGAs): Bitcoin miners and DES breakers are the best known examples. Where is the dividing line between where your operations are specific enough to employ an ASIC vs not specific enough and needing a GPU (or even CPU)? Could further optimization move this line more toward the ASIC?

    2. Huge dies: This has been talked about before, but it seems that, for applications that are embarrassingly parallel, this is clearly where the next revolution will be, with hundreds of cores (at least, and of whatever kind of "core" you want). So when will this stop being vaporware?

    3. But what do we do about all the NON-parallel jobs? If you can't apply an ASIC and you can't break it down, you're still stuck at the basic wall we've been at for around a decade now: where's Moore's (performance) law here? It would seem the only hope is new algorithms: TRUE computer science!

    • by AmiMoJo (196126) *

      In ASICs ARM is an ideal choice because you can build it right into the chip from a reference design. A lot of ASICs feature an 8502 core for management and I/O tasks, but if you needed to execute a more complex application then a simple ARM core running THUMB or even a full 32 bit ARM core would be ideal.

      • by tepples (727027)

        A lot of ASICs feature an 8502 core for management and I/O tasks

        I thought only a Commodore 128-on-a-chip would have an 8502 core [wikipedia.org]. What am I missing?

  • Hopefully this means we should start seeing ARM-using motherboards in an ATX form-factor. The Pi and Beaglebone are nice, but I want something that's essentially just like a commodity x86 motherboard except it uses ARM.

    • by c0lo (1497653)

      Hopefully this means we should start seeing ARM-using motherboards in an ATX form-factor. The Pi and Beaglebone are nice, but I want something that's essentially just like a commodity x86 motherboard except it uses ARM.

      Why? Mini-ATX's not good for a commodity MB? 'cause you don't need a high google-fu to find heaps of them.

      • Mini-ATX or Mini-ITX will do fine. I just haven't seen any that have the kinds of things you take for granted on x86 boards. I want an ARM board with SATA ports, PCIe slots, and DIMM (or SODIMM) slots. Is that too hard to produce? I don't see anything like this anywhere.

        • by 0123456 (636235)

          Ditto. I went looking for an ARM board last time I built a home server, but found nothing that could compete in the slightest against a $90 Atom board.

        • by c0lo (1497653)
          Slowly, they [fanlesstech.com] start [fanlesstech.com] to appear [cadianetworks.com].
  • No, they won't. (Score:5, Informative)

    by Dputiger (561114) on Sunday May 26, 2013 @12:45AM (#43825493)

    Current ARM processors may indeed have a role to play in supercomputing, but the advantages this article implies don't exist.

    Go look at performance figures for the Cortex-A15. It's *much* faster than the Cortex-A9. It also draws far more power. There's a reason why ARM's own product literature identifies the Cortex-A15 as a smartphone chip at the high end, but suggests strategies like big.LITTLE for lowering total power consumption. Next year, ARM's Cortex-A57 will start to appear. That'll be a 64-bit chip, it'll be faster than the Cortex-A15, it'll incorporate some further power efficiency improvements, and it'll use more power at peak load.

    That doesn't mean ARM chips are bad -- it means that when it comes to semiconductors and the laws of physics, there are no magic bullets and no such thing as a free lunch.

    http://www.extremetech.com/computing/155941-supercomputing-director-bets-2000-that-we-wont-have-exascale-computing-by-2020 [extremetech.com]

    I'm the author of that story, but I'm discussing a presentation given by one of the US's top supercomputing people. Pay particular attention to this graph:

    http://www.extremetech.com/wp-content/uploads/2013/05/CostPerFlop.png [extremetech.com]

    What it shows is the cost, in energy, of moving data. Keeping data local is essential to keeping power consumption down in a supercomputing environment. That means that smaller, less-efficient cores are a bad fit for environments in which data has to be synchronized across tens of thousands of cores and hundreds of nodes. Now, can you build ARM cores that have higher single-threaded efficiency? Absolutely, yes. But they use more power.

    ARM is going to go into datacenters and supercomputers, but it has no magic powers that guarantee it better outcomes.

    • by Lennie (16154)

      Didn't Intel say that bringing down the cost and improving the performance of the interconnect was the goal of silicon photonics and they are now very close to mass production.

      However I don't know how power efficient it is.

      Could silicon photonics help close that gap?

    • by Sycraft-fu (314770) on Sunday May 26, 2013 @02:08AM (#43825735)

      Slashdot seems to have lots of ARM fanboys that look at ARM's low power processors and assume that ARM could make processors on par with Intel chips but much more efficient. They seem to think Intel does things poorly, as though they don't spend billions on R&D.

      Of course that would raise the question as to why ARM doesn't and the answer is they can't. The more features you bolt onto a chip, the higher the clock speed, and so on, the more power it needs. So you want 64-bit? More power. Bigger memory controller? More power. Heavy hitting vector unit? More power. And so on.

      There's no magic ju ju in ARM designs. They are low power designs, in both senses of the word. Now that's wonderful, we need that for cellphones. You can't be slogging around with a 100 watt chip in a phone or the like. However don't mistake that for meaning that they can keep that low consumption and offer performance equal to the 100 watt chip.

      • by AmiMoJo (196126) *

        The point is that an ARM processor can provide, say, 75% of the performance for 25% of the power compared to x86. You can see it in tablet computers, particularly those running Windows RT or Ubuntu where a direct comparison is possible. Since most of the bottlenecks are not due to processing power but rather disk, RAM, graphics rendering, network etc. you very quickly reach the point of diminishing returns with increasing CPU performance.

        In the case of supercomputers the same thing applies. You might want
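The "75% of the performance for 25% of the power" figure is the poster's claim, not a measurement. Taken at face value, the arithmetic works out as in this sketch. The helper names are made up for illustration, and the chip-count step assumes the workload scales perfectly across chips, which is exactly the assumption other posters dispute.

```python
import math


def perf_per_watt_ratio(relative_perf, relative_power):
    """Performance per watt relative to the baseline chip (baseline = 1.0)."""
    return relative_perf / relative_power


def chips_to_match_baseline(relative_perf):
    """Whole chips needed to reach one baseline chip's throughput,
    assuming perfect scaling (an idealisation)."""
    return math.ceil(1 / relative_perf)


ratio = perf_per_watt_ratio(0.75, 0.25)
chips = chips_to_match_baseline(0.75)
total_power = chips * 0.25  # relative to one baseline chip

print(ratio, chips, total_power)  # 3.0 2 0.5
```

So if the claimed figures held and the job parallelised cleanly, two such chips would match one baseline chip at half the power. If either assumption fails, the advantage shrinks accordingly.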

        • by gl4ss (559668)

          yeah well, we'll see when it does 75% performance for 25% of power. it doesn't. you can't see it in tablets right now. that's what next gen is supposed to fix. but the next gen arm design is going to use more power to get there.

          (incidentally memory access, network etc are all slower on arm and for most supercomputing they do matter)

          it is a bit boring to read these articles now for a decade though. "intel is dead due to arm in two years!! yeehaw!!". they were even more boring back in the day when intel was m

      • by TopSpin (753) on Sunday May 26, 2013 @09:31AM (#43826949) Journal

        There's no magic ju ju in ARM designs.

        The magic ju ju is the ARM business model. There is one trump card ARM holds that precludes Intel from many portable devices: chip makers can build custom SOCs in-house with whatever special circuits they want on the same die. Intel doesn't do that and they don't want to do it; it would mean licencing masks to other manufacturers like ARM does. For example, the Apple A5, manufactured by Samsung, includes third party circuits like the Audience EarSmart noise-cancellation processor, among others. It is presently not feasible to imagine Intel handing over masks such that Apple could then contract with some foundry to manufacture custom x86 SOCs. This excludes Intel from many portable use cases.

        That feature of the ARM business model might be very useful to large scale computing. One can imagine integrating a custom high-performance crossbar with an ARM core. Cores on separate dies could then communicate with the lowest possible latency. Using a general purpose ARM core to marshal data to and from high-performance SIMD circuits on the same die is another obvious possibility. A custom cryptography circuit might be hosted the same way.

        Contemporary supercomputers are great aggregations of near-commodity components. However, supercomputing has a long history of custom circuit design and if the need arises for a highly specialized circuit then a designer may decide that integrating with ARM to do the less exotic leg work computing that is always necessary is a good choice.

  • by EmperorOfCanada (1332175) on Sunday May 26, 2013 @01:08AM (#43825581)
    I have long pined for a server with maybe ten 4-core ARM CPUs. Basically my server spends its time serving up web stuff from memory. Each web request needs to do a bit of thinking and then fire the data out the port. Disk IO is not an issue nor is server bandwidth. Quite simply I don't need much CPU but I need many CPUs. A big powerful Intel is of less interest.

    Also by breaking up the system into physically separate CPUs I suspect that an interesting memory accessing architecture could be conjured up preventing another potential choke point.
    • by 0123456 (636235)

      Also by breaking up the system into physically separate CPUs I suspect that an interesting memory accessing architecture could be conjured up preventing another potential choke point.

      I suspect you mean it would have to be conjured up, or you'd spend all the time waiting to access RAM on other cores rather than doing anything useful.

    • It's called NUMA [wikipedia.org], and we already have it in the Linux kernel. By the way, it is very cheap these days to pick up a server with 64 or more cores that fits in a 1U / 2 processor server.
      • I would love to know where to get a cheap 64 core 1U server. And I don't mean that in the usual snarky slashdot (I think you're wrong) way but I truly would love to know.
        • Supermicro 1u 64 cores [supermicro.com]. Bunch of other Mobos (some more than 1u) on this page [supermicro.com]. Cheap is relative to the buyer I suppose, but to my (admittedly very large) company these things are rather cheap unless you start stacking them with lots of dense memory.
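For anyone wanting to check the NUMA layout of a Linux box like the ones mentioned above, here is a minimal sketch that reads the node list from sysfs. The function name is made up for illustration. It assumes the usual `/sys/devices/system/node` layout and returns an empty dict on systems that don't expose it.

```python
import glob
import os


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return {node_name: cpulist_string} for each NUMA node found, or {} if none."""
    nodes = {}
    for path in sorted(glob.glob(os.path.join(sysfs_root, "node[0-9]*"))):
        try:
            # cpulist holds the CPUs local to this node, e.g. "0-7".
            with open(os.path.join(path, "cpulist")) as f:
                nodes[os.path.basename(path)] = f.read().strip()
        except OSError:
            # Skip entries that cannot be read instead of failing outright.
            continue
    return nodes


print(numa_nodes())
```

More than one entry means memory is split across nodes, and a core reaching into another node's RAM pays extra latency. That is the waiting-on-remote-RAM concern raised earlier in this thread.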
    • by gl4ss (559668)

      your description sounds like you would benefit more from them having separate memories as well. otherwise a "big powerful intel" would fit the bill, getting higher throughput of requests.

  • Xilinx Zynq anybody? (Score:5, Informative)

    by Z00L00K (682162) on Sunday May 26, 2013 @01:18AM (#43825607) Homepage

    Has anybody else seen/considered the Xilinx Zynq [xilinx.com]? It's a mix of ARM cores and FPGA, which could be interesting in supercomputing solutions.

    For anyone willing to tweak around with it there are development boards around, such as the ZedBoard [zedboard.org], which is priced at US$395. Not the cheapest device around, but for anyone willing to learn more about this interesting chip it is at least not an impossible sum. Xilinx also have the Zynq®-7000 AP SoC ZC702 Evaluation Kit [xilinx.com], priced at US$895, which is quite a bit more expensive and not as interesting for hobbyists.

    Done right, you may be able to do a lot of interesting stuff with an FPGA a lot faster than an ordinary processor can and then let the processor take care of stuff where performance isn't a critical part.

    Those chips are right now starting to find their way into vehicle ECUs [xilinx.com], but it's still in an early phase so there aren't many mass produced cars yet with it.

    As I see it - supercomputers will have to look at every avenue to get maximum performance for the lowest possible power consumption - and avoid solutions with high power consumption in standby situations.

  • Not this week....
    I am a fanboy of the small ARM boards... I have built an MPI cluster out of Raspberry Pi boards, and it is not even close, except as a teaching exercise, where it excels.

    However, many site services could be dedicated to these little boards where corp IT now seems to dedicate virtual machines.

    Department Web Servers... with mostly static content... via NFS or a revision control system like hg.
    Department and internal caching name servers... NTP servers and managed central storage for each bu
