ARM In Supercomputers — 'Get Ready For the Change'
An anonymous reader writes "Commodity ARM CPUs are poised to replace x86 CPUs in modern supercomputers, just as commodity x86 CPUs replaced vector CPUs in early supercomputers. An analysis by the EU Mont-Blanc Project (PDF) (using Nvidia Tegra 2/3, Samsung Exynos 5 & Intel Core i7 CPUs) highlights the suitability and energy efficiency of ARM-based solutions. They finish off by saying, 'Current limitations [are] due to target market condition — not real technological challenges. ... A whole set of ARM server chips is coming — solving most of the limitations identified.'"
Re:Does it really matter? (Score:5, Informative)
Re:IMHO - No thanks. (Score:5, Informative)
Re:IMHO - No thanks. (Score:5, Informative)
The x86 architecture is complicated, but in terms of ops per mm^2, ops per watt, ops per $, or cycles per useful op, it is a heinous pox on the face of the earth.
Worse yet, your beloved x86 doesn't even have any source implications; it's just a useless thing.
In TFA's slides 10 and 11, Intel i7 chips are shown to be more efficient in terms of performance per watt than ARM chips. However, the two are close, and Intel's prices are significantly higher.
Re:Does it really matter? (Score:5, Informative)
Of the last published Top500 list, 7 out of the top 10 had no GPUs. This is a clear indication that while GPU acceleration is definitely there, claiming 'most of the actual processing power' is overstating it a touch. It's particularly telling that there are so few, since overwhelming the specific HPL benchmark is one of the key strengths of GPUs. Other benchmarks in more well-rounded test suites don't treat GPUs so kindly.
Re:Does it really matter? (Score:5, Informative)
Also, a lot of algorithms, perhaps even most, rely on branching, which is something GPUs suck at. And only some can be reasonably rewritten in a branchless way.
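As a rough illustration of the branchless rewrite the comment mentions (plain NumPy here, not actual GPU code — a sketch of the transformation, not a GPU benchmark):

```python
import numpy as np

x = np.array([-2.0, 1.0, -0.5, 3.0])

# Branchy formulation: a per-element if/else, the pattern that
# causes divergence when threads in a GPU warp disagree on the branch.
branchy = np.array([v * 2.0 if v > 0 else v * 0.5 for v in x])

# Branchless formulation: every lane evaluates both sides, then a
# select picks the result -- no data-dependent control flow at all.
branchless = np.where(x > 0, x * 2.0, x * 0.5)

assert np.allclose(branchy, branchless)
```

Only some computations admit this trick cheaply: evaluating both sides is fine for a pair of multiplies, but wasteful when each branch is expensive, which is the commenter's point.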
Re:Not buying it. (Score:4, Informative)
I don't buy your response: http://top500.org/statistics/list/ [top500.org] ... click accelerator and hit submit.
87.6% of the top 500 supercomputers have no NVIDIA or similar coprocessors.
No, they won't. (Score:5, Informative)
Current ARM processors may indeed have a role to play in supercomputing, but the advantages this article implies don't exist.
Go look at performance figures for the Cortex-A15. It's *much* faster than the Cortex-A9. It also draws far more power. There's a reason why ARM's own product literature identifies the Cortex-A15 as a smartphone chip at the high end, but suggests strategies like big.LITTLE for lowering total power consumption. Next year, ARM's Cortex-A57 will start to appear. That'll be a 64-bit chip, it'll be faster than the Cortex-A15, it'll incorporate some further power efficiency improvements, and it'll use more power at peak load.
That doesn't mean ARM chips are bad -- it means that when it comes to semiconductors and the laws of physics, there are no magic bullets and no such thing as a free lunch.
http://www.extremetech.com/computing/155941-supercomputing-director-bets-2000-that-we-wont-have-exascale-computing-by-2020 [extremetech.com]
I'm the author of that story, but I'm discussing a presentation given by one of the US's top supercomputing people. Pay particular attention to this graph:
http://www.extremetech.com/wp-content/uploads/2013/05/CostPerFlop.png [extremetech.com]
What it shows is the cost, in energy, of moving data. Keeping data local is essential to keeping power consumption down in a supercomputing environment. That means that smaller, less-efficient cores are a bad fit for environments in which data has to be synchronized across tens of thousands of cores and hundreds of nodes. Now, can you build ARM cores that have higher single-threaded efficiency? Absolutely, yes. But they use more power.
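A back-of-the-envelope sketch of that trade-off (the energy constants below are made-up placeholders chosen only to show the shape of the argument — they are not taken from the linked graph):

```python
# Toy model: total energy = flops * E_flop + bytes_moved * E_byte.
# All three constants are illustrative assumptions, not measurements.
E_FLOP_PJ = 10.0          # assumed energy per floating-point op (pJ)
E_BYTE_LOCAL_PJ = 5.0     # assumed energy to move a byte on-chip (pJ)
E_BYTE_REMOTE_PJ = 500.0  # assumed energy to move a byte off-node (pJ)

def total_energy_pj(flops, local_bytes, remote_bytes):
    """Energy bill for a workload under the toy constants above."""
    return (flops * E_FLOP_PJ
            + local_bytes * E_BYTE_LOCAL_PJ
            + remote_bytes * E_BYTE_REMOTE_PJ)

# Same amount of work, but spread over many weaker cores: more data
# has to cross node boundaries, and movement dominates the bill.
few_fast = total_energy_pj(flops=1e6, local_bytes=1e5, remote_bytes=1e3)
many_slow = total_energy_pj(flops=1e6, local_bytes=1e5, remote_bytes=1e5)
```

With any remote-transfer cost much larger than the local one, the many-small-cores configuration loses even though it performs the identical number of flops — which is the poster's point about keeping data local.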
ARM is going to go into datacenters and supercomputers, but it has no magic powers that guarantee it better outcomes.
Xilinx Zynq anybody? (Score:5, Informative)
Has anybody else seen/considered the Xilinx Zynq [xilinx.com]? It's a mix of ARM cores and FPGA, which could be interesting in supercomputing solutions.
For anyone who wants to tinker with it, there are development boards like the ZedBoard [zedboard.org], priced at US$395. Not the cheapest device around, but for anyone willing to learn more about this interesting chip it is at least not an impossible sum. Xilinx also has the Zynq®-7000 AP SoC ZC702 Evaluation Kit [xilinx.com], priced at US$895, which is quite a bit more expensive and not as interesting for hobbyists.
Done right, you may be able to do a lot of interesting stuff on the FPGA far faster than an ordinary processor could, and then let the processor handle the parts where performance isn't critical.
Those chips are right now starting to find their way into vehicle ECUs [xilinx.com], but it's still early days, so there aren't many mass-produced cars with them yet.
As I see it, supercomputers will have to look at every avenue to get maximum performance for the lowest possible power consumption, and avoid solutions with high power consumption in standby.
One Size Doesn't Fit All -- Same in Supercomputing (Score:5, Informative)
There is already one line of supercomputers built from embedded hardware: the IBM Blue Gene. Their CPUs are embedded PowerPC [wikipedia.org] cores. That's the reason why those systems typically have an order of magnitude more cores than their x86-based competition.
Now, the problem with BG is that not all codes scale well with the number of cores. Especially when you're doing strong scaling (i.e. you fix the problem size but throw more and more cores at the problem), Amdahl's law [wikipedia.org] tells you that it's beneficial to have fewer, faster cores.
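The strong-scaling point is easy to make concrete with Amdahl's law, S(N) = 1 / ((1 - p) + p/N), where p is the parallel fraction of the code:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n cores for parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n)

# With just 5% serial work, piling on cores hits a hard ceiling of
# 1/0.05 = 20x -- which is why fewer, faster cores can beat a sea
# of slow ones on strongly-scaled problems.
print(amdahl_speedup(0.95, 16))     # ~9.1x on 16 cores
print(amdahl_speedup(0.95, 1024))   # ~19.6x on 1024 cores
```

Going from 16 cores to 1024 barely doubles the speedup here, so an order of magnitude more (slower) cores, as on BG, only pays off for codes whose parallel fraction is very close to 1.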
Finally, I consider the study to be fundamentally flawed, as it compares the OEM prices of consumer-grade embedded chips with the retail prices of high-end server chips. This is wrong for so many reasons... you might as well throw in the 947 GFLOPS, $500 AMD Radeon 7970 [wikipedia.org], which beats even the ARM SoCs by a margin of 2x (ARM: ~1 GFLOPS/$, AMD Radeon: ~2 GFLOPS/$).
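The price/performance figures above work out roughly as follows (GFLOPS and price taken from the comment itself — peak numbers, not sustained):

```python
# Numbers as stated in the comment: Radeon 7970 peak throughput and
# list price, versus a rough ~1 GFLOPS/$ figure for the ARM SoCs.
radeon_gflops = 947.0
radeon_price_usd = 500.0
radeon_perf_per_dollar = radeon_gflops / radeon_price_usd  # ~1.9 GFLOPS/$

arm_perf_per_dollar = 1.0  # ~1 GFLOPS/$, as stated in the comment
margin = radeon_perf_per_dollar / arm_perf_per_dollar      # ~1.9x, i.e. "2x"
```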
Re:IMHO - No thanks. (Score:4, Informative)
Granted, my experiences are pretty dated, and things have gotten better... for me, Linux is on the server(s) or in a virtual machine... every time I've tried to make it my primary OS, it's been met with heartache and pain. I replaced my main desktop a couple of months ago and tried a few Linux variants. The first time, I installed on my SSD; then, when I plugged in my other hard drives, it still booted, but an update to GRUB screwed things up and it wouldn't boot any longer. This was after 3 hours spent getting my displays working properly... I wasn't willing to spend another day on the issue, so back to Windows I went. I really like Linux, and I want to make it my primary desktop, but I don't have extra hours and days to tinker with the problems an over-the-wire update causes... let alone the initial setup time, which I really felt was unreasonable.
I've considered putting it as my primary OS on my MacBook, but similar to Windows, that environment pretty much works out of the box, and brew takes things a long way toward how I want it to work. Linux is close to 20 years old... and in a lot of ways it still seems crustier for desktop users than Windows was a decade and a half ago. In the end, I think Android may be a better desktop interface than what's currently on offer from most of the desktop bases in the Linux community, which is just plain sad... I really hope something good comes out of it all. I don't like being tethered to Windows or OS X... I don't like the constraints... but they work, with far fewer issues, the biggest ones being security related... I think Windows is getting secure faster than Linux is getting friendlier, or at least easier to get up and running with.
Re:IMHO - No thanks. (Score:4, Informative)
what do you think goes on at the other end of the copper/fibre cable?
No supercomputing whatsoever. I'm not a physicist, a mathematician, a code breaker, nor anyone else with supercomputing needs. My HTTP request for a web page is quite likely served by a single core. Maybe two.