Supercomputer Built With 8 GPUs
FnH writes "Researchers at the University of Antwerp in Belgium have created a new supercomputer with standard gaming hardware. The system uses four NVIDIA GeForce 9800 GX2 graphics cards, costs less than €4,000 to build, and delivers roughly the same performance as a supercomputer cluster consisting of hundreds of PCs. This new system is used by the ASTRA research group, part of the Vision Lab of the University of Antwerp, to develop new computational methods for tomography. The guys explain the eight NVIDIA GPUs deliver the same performance for their work as more than 300 Intel Core 2 Duo 2.4GHz processors. On a normal desktop PC their tomography tasks would take several weeks but on this NVIDIA-based supercomputer it only takes a couple of hours. The NVIDIA graphics cards do the job very efficiently and consume a lot less power than a supercomputer cluster."
I guess... (Score:4, Funny)
Re: (Score:2, Informative)
Re-birth of Amiga? (Score:5, Interesting)
Re:Re-birth of Amiga? (Score:5, Informative)
nVidia's CUDA framework for performing general purpose operations on a GPU is something totally different. I don't think the Amiga custom chips could be repurposed in such a fashion.
Re:Re-birth of Amiga? (Score:4, Interesting)
Re: (Score:2, Interesting)
Not really. The Amiga also had perfect synchronization between the different components. When you configured soundchip and graphics chip for a particular sample rate and screen resolution, you would know exactly how many samples would be played for the duration of one frame. And you had synchronization to the point where you could know which of the samples were played while a particular line was being sent through the
Re: (Score:2)
A funny twist on 3DFx' marketing campaign (Score:2)
Most of the TV spots started by explaining how scientists could save humanity with GFLOPS-grade chips. But then, humorously, the TV spot announces that they decided to play games instead (often with hilarious effect on the various "dreams of a better humanity" that the first half of the spot showed).
In a funny twist of things, it's the exact opposite that happened
Why haven't they started releasing GPU CPUs yet? (Score:3, Interesting)
Re:Why haven't they started releasing GPU CPUs yet (Score:5, Insightful)
Get the performance where it's most needed (Score:4, Insightful)
Precisely. But that happens to be one of the areas where more performance is still needed.
You don't need a super-duper CPU for text editing, that's for sure. For most of the tasks people do on computers, we have had CPU enough for the last 15 years or more. But where we still need more CPU happens to be mostly in tasks that ARE massively parallel, for instance, physics simulations, of which you will find several examples in the nVidia site [nvidia.com].
I'm following this technology with much interest, and I think I will have a major upgrade in my home computer soon. My old FX-5200 card has been more than enough for my gaming needs, but now I have a new reason for upgrading.
Re:Get the performance where it's most needed (Score:5, Funny)
You don't need a super-duper CPU for text editing
Re:Get the performance where it's most needed (Score:5, Funny)
There, I corrected that for you.
Re: (Score:3, Funny)
There, I corrected that for you.
Clearly YOU have never used EMACS!
Re:Get the performance where it's most needed (Score:2)
We're using core2 duos at my work to run Office 97 and a foxpro database app that's been warmed over once since 1996. I use the remaining 99% of CPU power for F@H and World Community Grid clients.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re:Why haven't they started releasing GPU CPUs yet (Score:5, Informative)
Also, this stuff isn't as mature as normal C programming, so issues that don't always exist in software that's distributed to the general public will crop up because not everyone's video card will support everything that's going on in the program.
Re: (Score:2)
Simple: This is not a supercomputer at all, just special-purpose hardware running a very special problem. For general computa
Re:Why haven't they started releasing GPU CPUs yet (Score:2)
In all likelihood, if they tried too hard to advertise their speed advantage, they w
Re:Why haven't they started releasing GPU CPUs yet (Score:2)
This article makes it seem like it is possible to use the GPUs as general purpose CPUs. Is that the case?
As well as the issues that others have mentioned, there's also the problem of accuracy with GPUs.
AFAIK, in many (all?) ordinary consumer graphics cards, minor mistakes by the GPU are tolerated because they'll typically result in (at worst) minor or unnoticeable glitches in the display. I assume that this is because, to get the best performance, designers push the hardware beyond levels that would be acceptable otherwise.
Clearly if you're using them for other mathematical operations, or to partly replace
Intel is planning it (Score:2)
Intel on the other hand are touting their future Larrabee as being completely compatible with the x86 instruction set. The whole thing, according to them, should behave like a big many
Re:Why haven't they started releasing GPU CPUs yet (Score:2)
Re:The idea is to use the CPU as the CPU (Score:5, Insightful)
This is awesome! (Score:5, Funny)
I like this too:
The medical researchers ran some benchmarks and found that in some cases their 4000EUR desktop superPC outperforms CalcUA, a 256-node supercomputer with dual AMD Opteron 250 2.4GHz chips that cost the University of Antwerp 3.5 million euro in March 2005...
Re: (Score:2, Informative)
Re: (Score:3, Funny)
Re: (Score:2, Informative)
$6.218.
Re: (Score:2)
The FASTRA uses aircooling and with the sidepanel removed the GPUs run at 55 degrees Celsius in idle, 86 degrees Celsius under full load and 100 degrees Celsius under full load with the shaders 20% overclocked. They have to run the system with the left side panel removed as the graphics cards would otherwise overheat but they're looking for a solution for their heat problem.
Looking for a solution?
Geeks everywhere have used the old "box fan aimed at the case" solution since time immemorial.
If you wanna get real fancy, you can pull/push air through a water cooled radiator.
Example: http://www.gmilburn.ca/ac/geoff_ac.html [gmilburn.ca]
Re: (Score:2)
Re: (Score:2, Insightful)
(the big shift over the last 6 years is mostly due to wanton printing of money in the US and rather tight central banking in Europe [with a healthy dose of Chinese currency rate fixing thrown in]. The trend isn't all that likely to continue, as a weakening dollar is great for American businesses operating in Europe and horrible for European businesses operating in America, which creates [increasing amounts of] counter-pressure to the relatively loose government policy in the US, or saying it t
Re: (Score:2, Insightful)
20th century thinking. Welcome to globalization. The product was designed, manufactured, and purchased on Earth.
Re: (Score:2)
Tomography (Score:5, Informative)
In other news Graphics cards are good at . . . graphics.
Re:Tomography (Score:5, Insightful)
And BTW even rendering the reconstructed results is not that simple, as current graphics cards are optimized for geometry, not volumetric data.
Re: (Score:2, Informative)
Re: (Score:2, Informative)
In other news Graphics cards are good at . . . graphics.
It's not the graphics part that makes it so computer-intensive.... All the mathematics behind it, once that's done, the presentation could be done on any ol' computer....
So, if you mean by "graphics" that they are good at difficult geometrical calculations (like in games, for example), then you are right.... because that's what it is, a truck-load of geometry...
From Wikipedia:
Tomography: "[...] Digital geometry processing is used to generate a three-dimensional image of the inside of an object from a l
Re: (Score:2)
Cheers ;)
coincidence (Score:2, Insightful)
I can't imagine that it is a coincidence that this comes along just as Nvidia are crowing about CUDA, or that the resulting machine looks like a gamer's dream rig.
While there is ample crossover between hardware enthusiasts and academia, anyone solely interested in the computation probably wouldn't be selecting neon fans and aftermarket coolers, or spending that much time on presentable wiring.
In other news... (Score:5, Funny)
Finally... (Score:5, Funny)
Re:Finally... (Score:5, Funny)
Re: (Score:3, Funny)
This is not a supercomputer (Score:4, Insightful)
Killer Slant (Score:2, Insightful)
Pardon the italics, but I was impacted by the killer slant of this posting.
For specific kinds of calculations, sure, GPGPU supercomputing is superior. I would question what software optimization they had applied to the 300 CPU system. Apparently, none. Let's not sensationalize quite so much, shall we?
Not a Supercomputer -- Special purpose hardware (Score:3, Informative)
Re: (Score:3, Interesting)
Brick of GPUs (Score:5, Interesting)
Between the massive brick of GPUs and the massive CPU heatsink/fan, you can't see the mobo at all.
Re: (Score:3, Funny)
Monitor Height & Ergonomics (Score:2)
The way our neck bones are structured makes looking up more strenuous than looking down. Hence, it is more comfortable to look downwards than upwards.
Wave of the Future? Yes (Score:5, Informative)
The GPGPU scheme is, after all, a re-invention of the vector processing of old. Vector processors died out, however, because there were too few users to support. Now that there's a commercially viable reason to make these processors (PS3 and video games), they are interesting again.
The researchers took a specialized piece of hardware, rewrote their code for it, and found it was faster than their original code on generic hardware. The problems here are that you have to rewrite your code (High Energy Physics codebases are about a GB, compiled... other sciences are similar) and you have to have a problem which will run well on this scheme. Have a discrete problem? Too bad. Have a gigantic, tightly coupled problem which requires lots of inter-GPU communication? Too bad.
Have a tomography problem which requires only 1GB of RAM? Here you go...
The standard supercomputer isn't going away for a long, long time. Now, as before, a one-size-fits-all approach is silly. You'll start to see sites complement their clusters and large-SMP machines with GPU power as scientists start to understand and take advantage of them. Just remember, there are 10-20 years of legacy code which will need to be ported... it's going to be a slow process.
Re:Wave of the Future? Yes (Score:5, Informative)
Re: (Score:3, Interesting)
Since when have "vector processors died out"? The "Earth Simulator" for example used the NEC SX-6 CPU, currently the SX-9 is sold. Vector processors never died out and were in use for what they are best at. The GPU and the Cell
Vector Computing (Score:2, Interesting)
Re: (Score:2)
Re: (Score:2)
The price ! (Score:3, Funny)
What's more crazy: calling something this inexpensive a supercomputer, or 4 video cards costing a freaking 4,000 EUR.
I wonder how that compares to the D870 (Score:2)
Sooo... using GPU for graphics? (Score:2)
Re: (Score:2)
Thanks a lot ATI
(PS: This is in Enlightenment not Compiz)
Have they profiled it? (Score:4, Interesting)
Has been done before with PS3s (Score:2, Interesting)
Re:By what benchmark? (Score:5, Informative)
Re:By what benchmark? (Score:5, Interesting)
Unfortunately, this setup won't work ideally for a lot of other CUDA based applications. For the past 6 months, I had a system with 6 GPUs (actual physical GPUs). This is the system that I showed at CES [ocia.net]. We are easily able to do 8 physical GPUs, and now I've been solely focused on utilizing Tesla.
Given that NVIDIA released the GX2 series, I was not surprised that someone would announce an 8GPU system. I'm surprised it took this long for someone to do it, and almost equally surprised that slashdot took this long to publish any news that is decent in the realm of GPU super computing. I've been cranking out close to 228 billion atom evals. per second in VMD [uiuc.edu] for months now, versus about 4 billion on dual quad core 3.0GHz Xeons.
Re: (Score:2)
I'm looking to CUDA-ize some algorithms (for computation as opposed to real-time graphics). Are there any books or sites that you found really helpful? (I have GPU Gems 3 on order.)
Re: (Score:3, Informative)
For VMD, it was necessary to have 1 CPU core per GPU. We tested 6 GPUs with 4 cores and we could only spawn 4 threads for GPU processing. The guys at Evolved Machines told me they can use multi GPU off of a single core. If so, I have no idea how. N
Re: (Score:2)
Re:By what benchmark? (Score:5, Insightful)
Re:By what benchmark? (Score:5, Informative)
Also, the "multiply" and "add" instructions exist in a "madd" opcode which essentially doubles the theoretical floating point performance, even if you don't use "madd" very often.
Re:By what benchmark? (Score:5, Informative)
Also, it's possible that loading floating points operands and storing results in actual code can be pipelined, while integer operations are not pipelined.
(and yes, I don't know what I'm talking about)
Re: (Score:3, Funny)
Re:By what benchmark? (Score:5, Informative)
While they can do integer, these machines are not very happy with it, and I found it much easier to do everything in floating point, even if you are talking about 8-bit colour data. It goes no slower, and everything is much better adapted to floating point. Then there are special instructions to get back to integer at the output.
While each operation takes 4 cycles, the pipes are fully pipelined, so each one launches a new instruction every cycle, times 32 pipes per unit, times 8 units per GPU.
And madd is very useful for the sort of tasks for which supercomputers are traditionally used.
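To make the pipelining point above concrete, here is a minimal sketch in Python. It assumes the figures quoted in the comment (4-cycle latency, one issue per cycle, 32 pipes per unit, 8 units per GPU); the function names are hypothetical and the model ignores stalls and memory traffic.

```python
def cycles_unpipelined(n_instructions, latency=4):
    """Each instruction must finish before the next one starts."""
    return n_instructions * latency


def cycles_pipelined(n_instructions, latency=4):
    """A new instruction is issued every cycle; only the first pays full latency."""
    if n_instructions == 0:
        return 0
    return latency + (n_instructions - 1)


# Figures taken from the comment above, not from vendor documentation.
PIPES_PER_UNIT = 32
UNITS_PER_GPU = 8

# Steady-state results per clock cycle for one GPU under this simple model.
results_per_cycle = PIPES_PER_UNIT * UNITS_PER_GPU

if __name__ == "__main__":
    n = 1000
    print("unpipelined cycles:", cycles_unpipelined(n))
    print("pipelined cycles:  ", cycles_pipelined(n))
    print("results per cycle per GPU:", results_per_cycle)
```

For long instruction streams the 4-cycle latency almost disappears, which is why the per-cycle issue rate matters more than the latency of a single operation.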
Re: (Score:3, Informative)
1. Aligning the two mantissas so the exponents match
2. Performing the operation
3. Renormalizing the mantissa of new value so that it is in the range 1.0 to less than 2.0
4. Saving the result to the destination register
Each of these stages would probably take one read/write cycle.
Performing an in
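The four stages listed above can be sketched in Python. This is only an illustration for positive, normal floats with exponents that are not wildly different; `decompose` and `staged_add` are hypothetical helper names, and real hardware works on bit fields rather than Python floats.

```python
import math


def decompose(x):
    """Split a positive float into (mantissa, exponent) with 1.0 <= mantissa < 2.0."""
    m, e = math.frexp(x)          # x == m * 2**e with 0.5 <= m < 1.0
    return m * 2.0, e - 1


def staged_add(a, b):
    """Add two positive floats by following the four stages one at a time."""
    ma, ea = decompose(a)
    mb, eb = decompose(b)

    # Stage 1: align the mantissas so both use the larger exponent.
    if ea < eb:
        ma, ea, mb, eb = mb, eb, ma, ea
    mb = mb / 2.0 ** (ea - eb)

    # Stage 2: perform the operation on the aligned mantissas.
    m = ma + mb
    e = ea

    # Stage 3: renormalize so the mantissa is back in [1.0, 2.0).
    while m >= 2.0:
        m /= 2.0
        e += 1

    # Stage 4: save the result (here: rebuild an ordinary float).
    return math.ldexp(m, e)


if __name__ == "__main__":
    print(decompose(6.0))
    print(staged_add(1.5, 2.25))
```

In a pipelined unit each of these stages works on a different instruction at the same time, which is how a multi-cycle operation can still complete one result per cycle.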
Re: (Score:3, Insightful)
Re:By what benchmark? (Score:5, Insightful)
Why general purpose? (Score:2)
Re:Limited Application (Score:5, Informative)
The idea is: the original code would run faster on a 8 Core2Duo machine than on the 8 GPUs. Even more optimising of the code will do little for the Core2Duos, due to limited memory bandwidth, FSB bandwidth, and so on.
Meanwhile, optimising a pipelined system (load, compute, store) on the GPU pays off far more, thanks to its huge bandwidth (50GB/s on current systems), huge number of computation units (128 or more) and so on.
Re:Limited Application (Score:5, Insightful)
We Need a Universal Multicore Processor (Score:3, Interesting)
No doubt about it. In spite of my admittedly negative criticism, I applaud these guys because I think this shows the amazing potential of multicore parallel computing to bring supercomputing power to the desktop and even to the laptop and the cellphone. However, this potential will not arrive unless we can find a w
Re:By what benchmark? (Score:5, Insightful)
By the benchmark that they solve the particular problem of this specific application in 1/300th of the time?
Re:By what benchmark? (Score:5, Insightful)
Re: (Score:2)
You may generalize that, like, e.g., in - 'for running VISTA', but (ymmv) of course you can come up with a more serious example.
CC.
Re:By what benchmark? (Score:5, Insightful)
And... a screwdriver is not always a prybar. A tool's a tool - they have preferred usage but if your requirement is specific and you're creative enough, you can do some fine work outside of the tool's intended purpose. Like this guy. Kudos to him.
Perhaps some more creative people finding this information will now discover if their specific requirements can be met by this interesting configuration. That will save them large quantities of cash or possibly enable some facility that was not previously available because supercomputers cost a grip-o-cash.
Of course for general purpose supercomputing you would want to use modified PS3s [wired.com].
Define: which is better? (Score:5, Informative)
Depends on what you're after! If you are trying to get yourself from point A to point B, the Lamborghini is the obvious choice. But if you need to move 4.5 tons of stuff from point A to point B, the Lamborghini would suck ass when compared to the flatbed truck.
It's just a question of what you are trying to accomplish. There is no absolute framework for "power" to solve problems, even if you define it fairly narrowly. For example, let's talk about 'pattern matching': A free database (like PostgreSQL) on cheap hardware can search through millions of records to deliver a query result in a tenth of a second. In that respect, Postgres is WAY faster than, say, the human brain. But the human brain will KICK ASS over just about any other technology out there in deciding whether or not a particular image contains a cat.
Use the right tool for the job, and you'll be amazed at the results. That 8 GPUs handily outperform 512 CPU cores at a specific task is not surprising - the GPUs are designed from the beginning to solve the kind of problem that's needed!
Personally, I'm surprised as to why there hasn't been more development behind the FPGA: are they just expensive?
Re: (Score:2)
Re: (Score:2)
I disagree -- I have about a 4 in 10 success rate on Rapidshare's new "type the letters that have a cat behind them" captcha. Surely someone could write an app to do better.
Re: (Score:3, Insightful)
8 GPUs are being compared to 300 CPUs. So a single GPU for this purpose isn't 300 times as powerful as a CPU.
It is doing the operation in approximately 1/37th the time. This isn't news or unbelievable. GPUs are dedicated to performing certain types of tasks far better than a CPU.
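The per-GPU figure follows directly from the numbers in the summary (8 GPUs matching 300 CPUs). A small Python check, with a hypothetical helper name:

```python
def cpus_per_gpu(n_cpus, n_gpus):
    """How many CPUs one GPU stands in for, if the two systems perform equally."""
    return n_cpus / n_gpus


ratio = cpus_per_gpu(300, 8)
print(ratio)  # 37.5
```

So each GPU replaces roughly 37 CPUs on this particular workload, assuming the claimed equivalence holds.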
Re:By what benchmark? (Score:5, Informative)
Re: (Score:3, Interesting)
The other critical 40% of my project would have gained absolutely nothing from SIMD and on the Cell would have lost time due to branches. In this case 300
Re:By what benchmark? (Score:4, Interesting)
In order to utilize this "super computer", your problem has to be refactored in such a way that it can utilize the hardware efficiently. This can be either fairly easy or incredibly difficult depending on the problem, the tool-set available, etc.
Their benchmark is good for them, but it is most likely meaningless to the general super-computing community. Porting something like LINPACK over and running that as a benchmark however would give a whole lot more insight into what kind of performance boost a typical scientific app might gain from said hardware.
Nice to see someone utilizing this functionality though.
~X~
Re:By what benchmark? (Score:5, Informative)
If the application requires solving a small task many times over and over and all of these tasks can be done in parallel then using a GPU works great because a GPU has many cores each of which can handle a simple routine. Also the GPU is designed to spend very little time on the way code is handled (load, switch etc) and spend more time actually running the code (hence the requirement of only very simple functions).
Such problems frequently arise in tomography, physics, astronomy etc and I hear GPUs are a great success in these areas. But don't hold your breath for running your favorite distro blazingly fast using GPUs.
Re:By what benchmark? (Score:4, Informative)
I could easily believe that it performed comparably to 300 2.4GHz Core 2 Duos (aka 600 "over 1.5x faster but not vector-specialised" cores).
Theoretical performance is 576 GFLOPS per 9800 GX2 GPU (4.608 TFLOPS total) vs 19.2 GFLOPS per Core 2 CPU (5.760 TFLOPS total). However in tests the Core 2 gets as low as 6 GFLOPS instead of its 19 theoretical, and the 9800 GPU gets a lot closer to its full power.
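The totals quoted above can be reproduced with a few lines of Python. The per-device figures are the commenter's, not measured here, and the last line is only an extrapolation assuming all 300 CPUs hit the "as low as 6 GFLOPS" case.

```python
# Per-device figures quoted in the comment above, in GFLOPS.
GPU_PEAK_GFLOPS = 576.0
CPU_PEAK_GFLOPS = 19.2
CPU_MEASURED_GFLOPS = 6.0   # the "as low as" figure for the Core 2
N_GPUS = 8
N_CPUS = 300


def total_tflops(per_device_gflops, n_devices):
    """Aggregate throughput in TFLOPS for n identical devices."""
    return per_device_gflops * n_devices / 1000.0


gpu_peak = total_tflops(GPU_PEAK_GFLOPS, N_GPUS)
cpu_peak = total_tflops(CPU_PEAK_GFLOPS, N_CPUS)
cpu_measured = total_tflops(CPU_MEASURED_GFLOPS, N_CPUS)

print(f"8 GPUs, theoretical:   {gpu_peak:.3f} TFLOPS")
print(f"300 CPUs, theoretical: {cpu_peak:.3f} TFLOPS")
print(f"300 CPUs, at 6 GFLOPS: {cpu_measured:.3f} TFLOPS")
```

On paper the CPU cluster is ahead, but at the lower measured CPU rate the eight GPUs would come out well in front if they stay near their peak.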
Re: (Score:2)
Since they are using eight GPUs, the total memory of the system must be in the range of 8 Gigabytes. They would need half the memory for the raw image data, and the other half for the final cube volume (1024^3 x 16 bits).
From the video, a calculation which would normally take an
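As a rough check on the cube volume mentioned above (1024^3 voxels at 16 bits each), here is a short Python calculation. The helper name is hypothetical, and the 8 GB total and the half-and-half split are the commenter's estimate and are not verified here.

```python
def volume_bytes(side_voxels, bits_per_voxel):
    """Size in bytes of a cubic volume with the given side length and voxel depth."""
    return side_voxels ** 3 * bits_per_voxel // 8


cube = volume_bytes(1024, 16)
print(cube, "bytes")
print(cube / 2**30, "GiB")
```

A single 1024^3 volume at 16 bits per voxel therefore occupies 2 GiB before any raw projection data is counted.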
Re: (Score:3, Interesting)
GPUs on the other hand are far more parallel. The thousands of individual subprocessors ca
Re: (Score:2, Informative)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)