Supercomputing Hardware

ARM In Supercomputers — 'Get Ready For the Change'

An anonymous reader writes "Commodity ARM CPUs are poised to replace x86 CPUs in modern supercomputers just as commodity x86 CPUs replaced vector CPUs in early supercomputers. An analysis by the EU Mont-Blanc Project (PDF) (using Nvidia Tegra 2/3, Samsung Exynos 5 & Intel Core i7 CPUs) highlights the suitability and energy efficiency of ARM-based solutions. They finish off by saying, 'Current limitations [are] due to target market condition — not real technological challenges. ... A whole set of ARM server chips is coming — solving most of the limitations identified.'"
This discussion has been archived. No new comments can be posted.


  • by gman003 ( 1693318 ) on Saturday May 25, 2013 @11:31PM (#43825251)

    Most of the actual processing power in current supercomputers comes from GPUs, not CPUs. There are exceptions (that all-SPARC Japanese one, or a few Cell-based ones), but they're just that, exceptions.

    So sure, replace the Xeons and Opterons with Cortex-A15s. Doesn't really change much.

    What might be interesting is a GPU-heavy SoC - some light CPU cores on the die of a supercomputer-class GPU. I have heard Nvidia is working on such (using Tegra CPUs and Tesla GPUs), and I would not be surprised if AMD is as well, although they'd be using one of their x86 cores for it (probably Bulldozer - damn thing was practically built for heavily-virtualized servers, not much different from supercomputers).

  • Re:IMHO - No thanks. (Score:2, Interesting)

    by Anonymous Coward on Saturday May 25, 2013 @11:33PM (#43825255)

    Architecture is complicated, but in terms of ops per mm^2, ops per watt, ops per $, or cycles per useful op, the x86 architecture is a heinous pox on the face of the earth.

    Worse yet, your beloved x86 doesn't even have any source implications; it's just a useless thing.

  • Exactly. (Score:2, Interesting)

    by Junta ( 36770 ) on Saturday May 25, 2013 @11:58PM (#43825333)

    This isn't to say that ARM *can't* be there, but thus far all of the implementations have focused on 'good enough' performance within a tightly constrained power envelope. Intel's designs have traditionally been highly inefficient in that power band, but at peak conditions they are still compelling.

    I recall one 'study' which claimed to demonstrate ARM as inarguably better. It got far more attention than it should have, because they measured the ARM test directly but just *assumed* TDP was the accurate number for x86. There are very few workloads that would cause a processor to *average* TDP over the course of a benchmark.
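    To see why that matters, here is a rough sketch (all numbers are invented for illustration, not taken from any study) of how assuming TDP inflates the x86 energy figure compared with measuring the average draw:

        # Toy comparison: energy per benchmark run if you *measure* average power
        # versus if you simply *assume* the chip sits at TDP the whole time.
        # Every number here is made up for illustration.

        runtime_s = 600           # benchmark wall-clock time, seconds
        tdp_w = 95                # hypothetical x86 rated TDP, watts
        measured_avg_w = 55       # hypothetical measured average draw, watts

        energy_assumed_j = tdp_w * runtime_s            # 57,000 J if you assume TDP
        energy_measured_j = measured_avg_w * runtime_s  # 33,000 J if you measure

        print(f"assuming TDP overstates energy by "
              f"{energy_assumed_j / energy_measured_j:.2f}x")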

    The thing that really *is* stealing x86 thunder is the GPU world. Intel's Phi strives to answer it, but thus far falls short in performance. There continue to be areas where GPU architecture is an ill fit, and ultimately I think Phi may end up being a pretty good solution.

  • by symbolset ( 646467 ) * on Sunday May 26, 2013 @12:06AM (#43825365) Journal
    These ARM cores are halfway between the extremely limited GPU cores and the extremely flexible x86 cores. They may be the "happy medium".
  • Re:IMHO - No thanks. (Score:4, Interesting)

    by dbIII ( 701233 ) on Sunday May 26, 2013 @12:09AM (#43825375)
    Then you use something else as well. High-performance computing server rooms already have a mix of kit, especially since the AMD chips can give you a 64-core machine with half a terabyte of memory for $14K, though it's not as fast per core as the two-way Xeons. The parallel stuff is done on the plentiful, slower cores while the single-threaded stuff is done on the faster cores; GPUs then do whatever parallel stuff you can feed them (memory and bandwidth limits keep them from doing some tasks).
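    A toy sketch of that kind of job routing (the node classes and thresholds are invented for illustration; in practice a scheduler such as Slurm handles this with partitions):

        # Toy router for a mixed HPC room: many-core AMD boxes for wide parallel
        # jobs, faster-per-core Xeons for single-threaded work, GPUs for
        # data-parallel kernels that fit in device memory. Thresholds invented.

        def pick_partition(threads, gpu_friendly, working_set_gb):
            if gpu_friendly and working_set_gb <= 16:  # assumed GPU memory limit
                return "gpu"
            if threads >= 32:                          # wide job -> many slow cores
                return "manycore-amd"
            return "fast-xeon"                         # serial / low-thread work

        print(pick_partition(threads=64, gpu_friendly=False, working_set_gb=200))  # manycore-amd
        print(pick_partition(threads=1,  gpu_friendly=False, working_set_gb=4))    # fast-xeon
        print(pick_partition(threads=8,  gpu_friendly=True,  working_set_gb=8))    # gpu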
  • Re:IMHO - No thanks. (Score:5, Interesting)

    by KiloByte ( 825081 ) on Sunday May 26, 2013 @12:25AM (#43825409)

    Damage or a winner? I feel so bad about having a cheap, efficient, and above all, quiet box.

    I bought this [hardkernel.com] 4*2GHz baby, and the only reason it's not my main desktop yet is a weird and asinine requirement for the monitor resolution to be exactly 720 or 1080 (WTF?!?). I think I'll replace my old but perfectly working pair of 1280x1024 monitors (I hate 16x9!) and put the big loud clunker in the cellar. I just hate the noise so much. x86 machines with no moving parts are extremely hard to get and have terrible performance/price. Anything that requires lots of processing power (compilation, running Windows VMs, etc.) can be done remotely from the cellar just as well, while a 2 GHz ARM is fast enough for client stuff, running a browser being the most demanding part.

    And what else do you need to reside directly on the machine you plop your butt at?

  • Questions... (Score:5, Interesting)

    by storkus ( 179708 ) on Sunday May 26, 2013 @12:26AM (#43825411)

    As I understand it, Intel still has the advantage in the performance-per-watt category for general processing, and GPUs have better performance per watt IF you can optimize for that specific environment -- both of which have been debated to death by people far more knowledgeable than I.

    However, to me there are at least 3 questions unanswered:

    1. ASICs (and possibly FPGAs): Bitcoin miners and DES breakers are the best-known examples. Where is the dividing line between operations specific enough to employ an ASIC and those general enough to need a GPU (or even a CPU)? Could further optimization move this line more toward the ASIC?

    2. Huge dies: This has been talked about before, but it seems that, for applications that are embarrassingly parallel, this is clearly where the next revolution will be, with hundreds of cores (at least, and of whatever kind of "core" you want). So when will this stop being vaporware?

    3. But what do we do about all the NON-parallel jobs? If you can't apply an ASIC and you can't break it down, you're still stuck at the basic wall we've been at for around a decade now: where's Moore's (performance) law here? It would seem the only hope is new algorithms: TRUE computer science!
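    Amdahl's law puts a number on that wall. A minimal sketch (the parallel fractions below are arbitrary examples):

        # Amdahl's law: speedup is capped by the serial fraction of a job,
        # no matter how many cores (or GPUs, or ASICs) you throw at it.

        def amdahl_speedup(parallel_fraction, n_cores):
            serial = 1.0 - parallel_fraction
            return 1.0 / (serial + parallel_fraction / n_cores)

        for p in (0.50, 0.90, 0.99):
            print(f"parallel fraction {p:.0%}: "
                  f"{amdahl_speedup(p, 1024):.1f}x on 1024 cores, "
                  f"{1 / (1 - p):.0f}x ceiling with infinite cores")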

  • Re:IMHO - No thanks. (Score:5, Interesting)

    by symbolset ( 646467 ) * on Sunday May 26, 2013 @01:14AM (#43825595) Journal
    The problem you have is that the software tools you use sap the power of the hardware. Windows is engineered to consume cycles to drive its vendor's need for recurrent license fees. Try a different OS that doesn't have this handicap and you'll find the full power of the equipment is available.
  • Re:IMHO - No thanks. (Score:4, Interesting)

    by BasilBrush ( 643681 ) on Sunday May 26, 2013 @06:31AM (#43826429)

    Why would an ARM chip use 2 Watts?

    - ARM Cortex-A9
      - 1 op / cycle @ 800 MHz - 2 GHz
      - 0.25 - 1 Watt

    - ARM Cortex-A15
      - 4 ops / cycle @ 1 - 2.5 GHz*
      - 0.35 Watt
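    Taking those quoted figures at face value (peak, theoretical numbers; a rough back-of-the-envelope sketch, not a measurement):

        # Peak throughput-per-watt from the figures quoted above.
        # Real workloads will land well below these theoretical numbers.

        def gops_per_watt(ops_per_cycle, ghz, watts):
            """Peak giga-ops per second divided by power draw."""
            return ops_per_cycle * ghz / watts

        a9  = gops_per_watt(1, 2.0, 1.0)    # Cortex-A9 at 2 GHz, 1 W (top of range)
        a15 = gops_per_watt(4, 2.5, 0.35)   # Cortex-A15 figures as quoted
        print(f"A9:  {a9:.1f} GOPS/W")      # ~2 GOPS/W
        print(f"A15: {a15:.1f} GOPS/W")     # ~28.6 GOPS/W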

  • by TopSpin ( 753 ) on Sunday May 26, 2013 @09:31AM (#43826949) Journal

    There's no magic juju in ARM designs.

    The magic juju is the ARM business model. There is one trump card ARM holds that keeps Intel out of many portable devices: chip makers can build custom SoCs in-house with whatever special circuits they want on the same die. Intel doesn't do that and doesn't want to; it would mean licensing masks to other manufacturers the way ARM does. For example, the Apple A5, manufactured by Samsung, includes third-party circuits such as the Audience EarSmart noise-cancellation processor, among others. It is presently not feasible to imagine Intel handing over masks so that Apple could contract with some foundry to manufacture custom x86 SoCs. That shuts Intel out of many portable use cases.

    That feature of the ARM business model might be very useful to large scale computing. One can imagine integrating a custom high-performance crossbar with an ARM core. Cores on separate dies could then communicate with the lowest possible latency. Using a general purpose ARM core to marshal data to and from high-performance SIMD circuits on the same die is another obvious possibility. A custom cryptography circuit might be hosted the same way.

    Contemporary supercomputers are great aggregations of near-commodity components. However, supercomputing has a long history of custom circuit design, and if the need arises for a highly specialized circuit, a designer may decide that an ARM core integrated on the same die, handling the less exotic legwork that is always necessary, is a good choice.

"I've seen it. It's rubbish." -- Marvin the Paranoid Android

Working...