Supercomputing Hardware

FASTRA II Puts 13 GPUs In a Desktop Supercomputer

An anonymous reader writes "Last year tomography researchers of the ASTRA group at the University of Antwerp developed a desktop supercomputer with four NVIDIA GeForce 9800 GX2 graphics cards. The performance of the FASTRA GPGPU system was amazing; it was slightly faster than the university's 512-core supercomputer and cost less than 4000EUR. Today the researchers announce FASTRA II, a new 6000EUR GPGPU computing beast with six dual-GPU NVIDIA GeForce GTX 295 graphics cards and one GeForce GTX 275. The development of the new system was more complicated and there are still some stability issues, but tests reveal the 13 GPUs deliver 3.75x more performance than the old system. For the tomography reconstruction calculations these researchers need to do, the compact FASTRA II is four times faster than the university's supercomputer cluster, while being roughly 300 times more energy efficient."
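
The figures in the summary can be combined into a rough price/performance comparison. The sketch below uses only the numbers quoted above and assumes, for simplicity, that both systems cost exactly their stated amounts (the summary only says the first one cost "less than 4000EUR"). The variable names are illustrative.

```python
# Figures quoted in the summary; treating both prices as exact is an assumption.
old_cost_eur = 4000      # original FASTRA ("less than 4000EUR")
new_cost_eur = 6000      # FASTRA II
speedup_vs_old = 3.75    # FASTRA II performance relative to FASTRA

# Six dual-GPU GTX 295 cards plus one single-GPU GTX 275.
gpu_count = 6 * 2 + 1

cost_ratio = new_cost_eur / old_cost_eur
perf_per_euro_gain = speedup_vs_old / cost_ratio

print(f"GPUs: {gpu_count}")
print(f"Cost ratio (new/old): {cost_ratio:.2f}x")
print(f"Performance per euro relative to FASTRA: {perf_per_euro_gain:.2f}x")
```

Under these assumptions the new system costs 1.5x as much and delivers about 2.5x the performance per euro of the original.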
This discussion has been archived. No new comments can be posted.

  • by Chirs (87576) on Wednesday December 16, 2009 @07:26PM (#30466356) the article?

    The motherboard is an ASUS P6T7 WS Supercomputer.

  • by jandrese (485) on Wednesday December 16, 2009 @07:37PM (#30466500) Homepage Journal
    If you read the article it tells you that the supercomputer has 256 Opteron 250s (2.4GHz) and was built 3 years ago. If you have a parallelizable problem that can be solved with CUDA, you can get absolutely incredible performance out of off-of-the-shelf GPUs these days.
  • Re:GPU accuracy (Score:5, Informative)

    by kpesler (982707) on Wednesday December 16, 2009 @07:48PM (#30466606)
    Presently the G200 GPUs in this machine support double precision, but at about 1/8 the peak rate of single precision. In practice, since most codes tend to be bandwidth limited, and pointer arithmetic is the same for single and double precision, double-precision performance is usually closer to 1/2 that of single-precision performance, but not always. With the Fermi GPUs to be released early next year, double-precision peak FLOPS will be 1/2 of single-precision peak, just like on present x86 processors. Also note that many scientific research groups, such as my own, have found that contrary to dogma, single precision is good enough for most of the computation, and that a judicious mix of single- and double-precision arithmetic gives high performance with sufficient accuracy. This is true for some, but not all, computational methods.
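
One common way to mix precisions, shown here as a minimal CPU-side sketch and not as the commenter's actual method, is to keep the data in single precision but accumulate sums in double precision. The helper names (`to_f32`, `sum_single`, `sum_mixed`) are made up for this illustration, and single precision is emulated with the standard `struct` module instead of running on a GPU.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_single(values):
    """Accumulate in single precision: every partial sum is rounded to float32."""
    total = 0.0
    for v in values:
        total = to_f32(total + v)
    return total


def sum_mixed(values):
    """Single-precision inputs, but the running total is kept in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


# One million copies of 0.1 stored as float32 values.
values = [to_f32(0.1)] * 1_000_000

reference = math.fsum(values)  # correctly rounded sum of the stored values
err_single = abs(sum_single(values) - reference)
err_mixed = abs(sum_mixed(values) - reference)

print(f"error with single-precision accumulator: {err_single:.6g}")
print(f"error with double-precision accumulator: {err_mixed:.6g}")
```

The inputs are identical in both runs; only the accumulator changes, and that is where most of the rounding error in a long sum builds up.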
  • by jstults (1406161) on Wednesday December 16, 2009 @08:08PM (#30466846) Homepage

    you can get absolutely incredible performance out of off-of-the-shelf GPUs these days.

    I had heard this from folks, but didn't really buy it until I read this paper today. They get a speed-up (wall clock) using the GPU even though they have to go to a worse algorithm (Jacobi instead of SSOR). Pretty amazing.
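
A minimal sketch of why Jacobi suits a GPU, written in plain Python and not taken from the paper: every component of the new iterate is computed from the previous iterate only, so all updates in a sweep are independent and could run in parallel. SSOR-style sweeps reuse values updated earlier in the same sweep, which serializes the work. The matrix and right-hand side below are made-up test data.

```python
def jacobi(A, b, iterations=50):
    """Solve A x = b with Jacobi iteration (A should be diagonally dominant)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Each entry of the new vector reads only the old vector x,
        # so the n updates are independent of one another.
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


# Small diagonally dominant system whose exact solution is (1, 2, 3).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]

x = jacobi(A, b)
print(x)
```

Jacobi typically needs more sweeps than SSOR to reach the same accuracy, which is the sense in which it is the "worse" algorithm, but each sweep maps cleanly onto many parallel threads.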

  • Re:Silly (Score:3, Informative)

    by modemboy (233342) on Wednesday December 16, 2009 @08:22PM (#30466998)

    The difference between GeForce and Quadro cards is almost always completely driver based; it is the exact same hw, different sw.
    This is basically a roll-your-own Tesla, and considering the Teslas connect to the host system via an 8x or 16x PCI-e add-in card, I'm gonna say you are wrong when it comes to the bandwidth issue as well...

  • by Sycraft-fu (314770) on Wednesday December 16, 2009 @09:08PM (#30467434)

    Because it only applies to the kind of problems that CUDA is good at solving. Now while there are plenty of those, there are plenty that it isn't good for. Take a problem that is all 64-bit integer math and has a branch every couple hundred instructions and GPUs will do for crap on it. However a supercomputer with general purpose CPUs will do as well on it as basically anything else.

    That's why I find these comparisons stupid. "Oh this is so much faster than our supercomputer!" No it isn't. It is so much faster for some things. Now if you are doing those things, wonderful, please use GPUs. However, don't then try to pretend you have a "supercomputer in a desktop." You don't. You have a specialized computer with a bunch of single-precision stream processors. That's great so long as your problem is 32-bit fp, highly parallel, doesn't branch much, and fits within the memory on a GPU. However, not all problems are, hence GPUs are NOT a general replacement for a supercomputer.

  • by cheesybagel (670288) on Wednesday December 16, 2009 @09:15PM (#30467514)

    Really? Care to share any results that support that? I'm quite sure the peak flops you can achieve on the GPU are much higher than the limited SIMD capability of the CPU.

    IIRC they claim 2.5-3x more performance using a Tesla than using the CPUs in their workstation. Ignoring load time.

    SSE enables a theoretical peak performance enhancement of 4x for SIMD-amenable codes (e.g. you can do 4 parallel adds using vector SSE in the time it takes to do 1 add using scalar SSE). In practice, however, you usually get about 3x more performance.

    Theoretical SIMD performance for the GPU is very fine and nice, but in practice the small caches in current GPUs limit performance. CPUs also often have out-of-order execution support and other hardware which is too expensive in terms of transistors to implement in a GPU.

    IMO the main problem here is that the programming model for the CPU is too complex since you need to use several different ways to express parallelism (SIMD/Multicore/Cluster) to get top performance.
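
The 4x peak versus roughly 3x observed SSE figures in the comment above can be related with a simple Amdahl's-law model. This is an illustrative assumption, not the commenter's explanation: it attributes the whole gap to a portion of the run time that is not vectorized, and the function name `simd_speedup` is made up for the sketch.

```python
def simd_speedup(vector_fraction: float, lanes: int = 4) -> float:
    """Amdahl's-law speedup when `vector_fraction` of the run time is vectorized."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / lanes)


# A 128-bit SSE register holds four 32-bit floats.
lanes = 128 // 32

# Fully vectorized code reaches the theoretical peak.
peak = simd_speedup(1.0, lanes)

# Solve 1/S = 1 - f * (1 - 1/lanes) for f with a target speedup S = 3.
target = 3.0
required_fraction = (1.0 - 1.0 / target) / (1.0 - 1.0 / lanes)

print(f"lanes: {lanes}, peak speedup: {peak:.1f}x")
print(f"vectorized fraction needed for {target:.0f}x: {required_fraction:.3f}")
```

Under this model, about 8/9 of the original run time has to be vectorized to see 3x, so even a small scalar remainder pulls the result well below the 4x peak.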

  • Re:Silly (Score:3, Informative)

    by jpmorgan (517966) on Wednesday December 16, 2009 @09:16PM (#30467536) Homepage

    The hardware is the same, but the quality control is different. Teslas and Quadros are held to rigorous standards. GeForces have an acceptable error rate. That's fine for gaming, but falls flat in scientific computing.
