Supercomputing Hardware

FASTRA II Puts 13 GPUs In a Desktop Supercomputer

An anonymous reader writes "Last year, tomography researchers from the ASTRA group at the University of Antwerp developed a desktop supercomputer with four NVIDIA GeForce 9800 GX2 graphics cards. The performance of the FASTRA GPGPU system was amazing; it was slightly faster than the university's 512-core supercomputer and cost less than 4000EUR. Today the researchers announce FASTRA II, a new 6000EUR GPGPU computing beast with six dual-GPU NVIDIA GeForce GTX 295 graphics cards and one GeForce GTX 275. The development of the new system was more complicated and there are still some stability issues, but tests reveal the 13 GPUs deliver 3.75x more performance than the old system. For the tomography reconstruction calculations these researchers need to do, the compact FASTRA II is four times faster than the university's supercomputer cluster, while being roughly 300 times more energy efficient."
This discussion has been archived. No new comments can be posted.

  • Awesome (Score:5, Funny)

    by enderjsv ( 1128541 ) on Wednesday December 16, 2009 @06:13PM (#30466190)
    Almost meets the minimum requirements for Crysis 2
    • by copponex ( 13876 )

      This was post #2 and already modded -1, Redundant.

      • Must be redundant in the long run.
        This is sad, since this one was clever.
        • Re: (Score:3, Funny)

          by joocemann ( 1273720 )

          Slashdot mods are often, as I observe, sour and pissy skeptics. Even if it is humorous to them, they will knock it for lack of something else to bash.

          • Re: (Score:3, Funny)

            by joocemann ( 1273720 )

            Slashdot mods are often, as I observe, sour and pissy skeptics. Even if it is humorous to them, they will knock it for lack of something else to bash.

            -1 troll

            lol. exactly

            • I've found that lately on Slashdot, I agree with them that highly moderated humorous posts seem to far outnumber the interesting ones. I've actually ratcheted down all funny comments to -4 or -5, and browse at 2, to catch the more interesting discussions which get passed over. But I've never seen any reason to moderate them down now that we have control when logged in... I dunno, maybe others think that people who come here looking for facetious comments should have to browse at funny +5 instead of us sourp
            • by mgblst ( 80109 )

              Oh yeah, the fact that we get exactly the same comment everytime a fast computer + GPU is mentioned shouldn't stop the next moron from posting it.

              • Oh yeah, the fact that we get exactly the same comment everytime a fast computer + GPU is mentioned shouldn't stop the next moron from posting it.

                It's so horrible. Oh god, they must pay. MOD THEM DOWN! MOD THEM DOWN!

                How's those lemons?

      • It's redundant because some smartass mentions Crysis in response to *every fucking article* about someone doing something using powerful GPUs*.

        Of course, if it was about CPUs, the post would be about what will be needed to run Windows 8, or 'finally meeting the minimum system requirements for Vista'.

        Mostly, you can predict these posts from the title of the article. Doesn't stop crotchety people like me from coming to complain about it though...

        * Footnote: When someone equally-crotchety complained about t

        • Sorry to reply to myself, but I've just noticed two comments:

          mgblst's [slashdot.org], which says the same thing as mine more succinctly and bluntly. Embarrassingly, it was in the same god-damn thread.

          Further down, we have this comment [slashdot.org] by RandomUsr, who actually does mention Vista. Woo! In fact, he (and the person who responded to him) also mentions antivirus software. Never mind that this is a GPGPU system, just post crap about *something* bloated and wait for the '+1 Funny' mods to roll in.

          Gods, reading these two posts made me realise that I need to stop reading and posting to Slashdot when it's late and I'm in a bad mood and feeling misanthropic. *grumbles*

          • Gods, reading these two posts made me realise that I need to stop reading and posting to Slashdot when it's late and I'm in a bad mood and feeling misanthropic. *grumbles*

            Ya think? Honestly I'm not on Slashdot very often. Maybe it is an overused joke, but I don't post a lot in hardware related forums so how would I know. Check my history if you don't believe me. If you think it's overused, not funny or redundant, do what I do when I come across such posts. Roll your eyes, then move onto the next post. Don't get your panties in a twist over it. I'm sorry if not every post on Slashdot conforms to your standards, but if you're that worried about it, go start your own f

            • Whoa, whoa. Chill, we're all friends. Don't take it personally, my comments weren't directed solely at you. It isn't a big deal (frankly, are any posts on Slashdot a 'big deal'?), so there's no need to make a mountain out of it. I don't expect it's you that malevolently thinks "Aha! Another hardware article! Just what I need to get a sardonic, lambasting response from BertieBaggio." Equally, I don't look for these jokes just so I can grumble about how people always post them. Generally, the only systematic

    • Re:Awesome (Score:4, Funny)

      by sadness203 ( 1539377 ) on Wednesday December 16, 2009 @06:29PM (#30466390)
      Only if you imagine a Beowulf cluster of these
      Here goes the redundant and offtopic mod.
      • by toby ( 759 ) *

        Au contraire, I clicked the article link JUST to find this comment. Thank you for maintaining a cherished /. tradition!

      • by Firehed ( 942385 )

        And in Soviet Russia, 13 GPUs supercompute using you!

        (Is that the smell of burning Karma?)

    • by TejWC ( 758299 )

      Sadly it doesn't. Why? Because it appears to be running Linux [dvhardware.net].

  • News Flash (Score:2, Funny)

    by RandomUsr ( 985972 )
    Blazing Fast Pron Machine running Windows Vista. Don't forget to pick up a copy of the latest memory-intensive anti-virus, as this machine will handle it just fine.
  • by Ziekheid ( 1427027 ) on Wednesday December 16, 2009 @06:31PM (#30466422)
    "the compact FASTRA II is four times faster than the university's supercomputer cluster, while consuming 300 times less power" And the original supercomputer was how fast? 512 cores doesn't say THAT much. I could compare my computer to supercomputers from the past and they'd say the performance of my system was amazing too.
    • by jandrese ( 485 ) <kensama@vt.edu> on Wednesday December 16, 2009 @06:37PM (#30466500) Homepage Journal
      If you read the article it tells you that the supercomputer has 256 Opteron 250s (2.4 GHz) and was built 3 years ago. If you have a parallelizable problem that can be solved with CUDA, you can get absolutely incredible performance out of off-the-shelf GPUs these days.
      • Re: (Score:2, Interesting)

        by Ziekheid ( 1427027 )
        I'll admit that, and thanks for the info; you'd think this was crucial information for the summary too, though. To put everything in perspective, it will only outperform the cluster on specific calculations, so overall it's not faster, right?
        • Re: (Score:3, Interesting)

          by raftpeople ( 844215 )
          It's all a continuum and depends on the problem. For problems with enough parallelism that the GPUs are a good choice, they are faster. For a completely serial problem, the current fastest single core is faster than both the supercomputer and the GPUs.
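
          (To put a rough number on that continuum — not from the thread, just the textbook Amdahl's law, with p the parallel fraction of the work and N the number of cores/GPUs; the values below are hypothetical:)

          S(N) = \frac{1}{(1 - p) + p/N}

          With p = 0.99 and N = 512 the speedup tops out around 84x; with p = 0.5 it never passes 2x, no matter how many GPUs or Opterons you throw at it.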
      • Re: (Score:2, Informative)

        by jstults ( 1406161 )

        you can get absolutely incredible performance out of off-of-the-shelf GPUs these days.

        I had heard this from folks, but didn't really buy it until I read this paper [nasa.gov] today. They get a speed-up (wall clock) using the GPU even though they have to go to a worse algorithm (Jacobi instead of SSOR). Pretty amazing.
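
        (For the curious, here's a minimal, hypothetical sketch — my own toy code, not from the NASA paper — of why a Jacobi sweep maps so well onto CUDA: every grid point is updated from the *previous* iterate only, so all threads can run at once, whereas Gauss-Seidel/SSOR needs the freshly updated neighbours and serialises the sweep. Grid size and names are made up.)

        #include <cuda_runtime.h>
        #include <cstdio>
        #include <utility>

        // One Jacobi sweep of the 5-point Laplace stencil on an n x n grid.
        __global__ void jacobi_sweep(const float* u_old, float* u_new,
                                     const float* f, int n, float h2)
        {
            int j = blockIdx.x * blockDim.x + threadIdx.x;   // column
            int i = blockIdx.y * blockDim.y + threadIdx.y;   // row
            if (i > 0 && i < n - 1 && j > 0 && j < n - 1) {
                // Reads only u_old, never u_new: no dependence inside the sweep.
                u_new[i * n + j] = 0.25f * (u_old[(i - 1) * n + j] + u_old[(i + 1) * n + j]
                                          + u_old[i * n + j - 1] + u_old[i * n + j + 1]
                                          - h2 * f[i * n + j]);
            }
        }

        int main()
        {
            const int n = 512;                                // toy 512x512 grid
            const size_t bytes = (size_t)n * n * sizeof(float);
            float *u_old, *u_new, *f;
            cudaMalloc(&u_old, bytes);
            cudaMalloc(&u_new, bytes);
            cudaMalloc(&f, bytes);
            cudaMemset(u_old, 0, bytes);
            cudaMemset(u_new, 0, bytes);
            cudaMemset(f, 0, bytes);

            dim3 block(16, 16);
            dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
            for (int iter = 0; iter < 1000; ++iter) {
                jacobi_sweep<<<grid, block>>>(u_old, u_new, f, n, 1.0f / (n * n));
                std::swap(u_old, u_new);                      // ping-pong buffers between sweeps
            }
            cudaDeviceSynchronize();
            printf("last CUDA status: %s\n", cudaGetErrorString(cudaGetLastError()));
            cudaFree(u_old); cudaFree(u_new); cudaFree(f);
            return 0;
        }

        (Gauss-Seidel would read u_new from the current sweep for half the neighbours, which is exactly the dependence a GPU can't parallelise cheaply.)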

        • At least a CPU program, when it crashes, does not bring down the whole OS. Memory protection? Pah, who needs such things... After all you never make coding mistakes. Right?

          It is like MS-DOS programming all over again. Except the computer takes longer to reboot.

          They use an algorithm with worse complexity in the paper because it actually performs better on the GPU than the other one. This happens on CPUs in several cases as well. When was the last time you saw someone using a Fibonacci heap? Memory

          • Well, I'm not sure about most of your criticisms, but they use Jacobi instead of Gauss-Seidel because SSOR is not data parallel, but Jacobi is.

            That would make the performance the same as for the GPU system.

            Really? Care to share any results that support that? I'm quite sure the peak flops you can achieve on the GPU are much higher than the limited SIMD capability of the CPU.

            Note that I am being generous here and actually ignoring the program setup time when they need to copy the data to the GPU.

            Sure there's communications overhead, but that's true of any parallel processing problem, the trick is to find problems that have a big computation to communication ratio (which happens to be m

            • Re: (Score:3, Informative)

              by cheesybagel ( 670288 )

              Really? Care to share any results that support that? I'm quite sure the peak flops you can achieve on the GPU are much higher than the limited SIMD capability of the CPU.

              IIRC they claim 2.5-3x times more performance using a Tesla than using the CPUs in their workstation. Ignoring load time.

              SSE enables a theoretical peak performance enhancement of 4x for SIMD-amenable codes (e.g. you can do 4 parallel adds using vector SSE in the time it takes to do 1 add using scalar SSE). In practice however you u
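
              (A rough, hypothetical illustration of that 4x figure, as plain host-side C — my own toy, not from the paper: one packed SSE add does four float additions where scalar code does one.)

              #include <xmmintrin.h>   // SSE intrinsics
              #include <stdio.h>

              int main(void)
              {
                  float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
                  float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
                  float c[4];

                  __m128 va = _mm_loadu_ps(a);     // load 4 packed floats
                  __m128 vb = _mm_loadu_ps(b);
                  __m128 vc = _mm_add_ps(va, vb);  // one instruction, four additions
                  _mm_storeu_ps(c, vc);

                  printf("%.1f %.1f %.1f %.1f\n", c[0], c[1], c[2], c[3]);
                  return 0;
              }

              (Scalar code needs four separate adds for the same result, hence the theoretical 4x; memory bandwidth and the non-vectorisable parts eat into that in practice.)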

              • IIRC they claim 2.5-3x times more performance using a Tesla than using the CPUs in their workstation. Ignoring load time.

                Their CPU numbers almost certainly take SIMD into account.

                I'm doing cryptography research, and some of my colleagues have been considering building a similar "desktop supercomputer". The speedup there looks more reasonable: a single high-end GPU should be worth maybe 5-10 quad-core CPUs; it costs double and uses double the power, but it's easier to put a dozen of them in a single PC. Th

      • by Sycraft-fu ( 314770 ) on Wednesday December 16, 2009 @08:08PM (#30467434)

        Because it only applies to the kind of problems that CUDA is good at solving. Now while there are plenty of those, there are plenty that it isn't good for. Take a problem that is all 64-bit integer math and has a branch every couple hundred instructions and GPUs will do for crap on it. However a supercomputer with general purpose CPUs will do as well on it as basically anything else.

        That's why I find these comparisons stupid. "Oh this is so much faster than our supercomputer!" No it isn't. It is so much faster for some things. Now if you are doing those things, wonderful, please use GPUs. However, don't then try to pretend you have a "supercomputer in a desktop." You don't. You have a specialized computer with a bunch of single-precision stream processors. That's great so long as your problem is 32-bit fp, highly parallel, doesn't branch much, and fits within the memory on a GPU. However, not all problems are, hence they are NOT a general replacement for a supercomputer.
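
        (A toy, hypothetical kernel — mine, not the parent poster's — showing the branching point: threads in a 32-wide warp that take different sides of the if are serialised, so the warp pays for both paths, and 64-bit integer ops are slow on these parts to begin with.)

        #include <cuda_runtime.h>

        // Data-dependent branch on 64-bit integers: warps whose threads disagree
        // on the condition execute both branches one after the other.
        __global__ void branchy(long long* x, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) {
                if (x[i] & 1)
                    x[i] = x[i] * 3 + 1;   // odd elements take this path
                else
                    x[i] = x[i] / 2;       // even elements take this one
            }
        }

        int main()
        {
            const int n = 1 << 20;
            long long* x;
            cudaMalloc(&x, n * sizeof(long long));
            cudaMemset(x, 0, n * sizeof(long long));
            branchy<<<(n + 255) / 256, 256>>>(x, n);
            cudaDeviceSynchronize();
            cudaFree(x);
            return 0;
        }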

        • Take a problem that is all 64-bit integer math and has a branch every couple hundred instructions and GPUs will do for crap on it.

          So would a Cray; supercomputers and GPUs are made for the same sorts of problems (exploiting data parallelism). Now if by 'supercomputer' you mean 'a cluster of commodity hardware', then ok, you've got a point, that heap of cpus will handle branches plenty fast.

          • Except that a 'supercomputer' and a 'cluster of commodity hardware' are effectively synonymous these days. They all use the same Power/Xeon/Opteron/Itanium chips, with several cores and several GB of memory to a compute node. The only real difference left is the interconnect. Commercially built systems tend to have far beefier and more complex interconnects. Homebrew systems more often than not just use gigabit ethernet, with the larger ones rarely using anything better than a 'fat tree' with channel
            • by Retric ( 704075 )

              There are also a fair number of Cell-based supercomputers and even one hybrid out there. And even some pure custom solutions used by the NSA. (There is a reason they have their own chip fab.) And, if you include Folding@home-type applications, then GPUs represent a reasonable percentage of the world's supercomputing infrastructure.

              • Aside from a few homebrew PS3 clusters, I don't know of any large-scale Cell installations. The Roadrunner is a fairly standard (if very large) Opteron-based cluster, with PowerXCell co-processors. The latest Cray XT5 is a fairly standard (if very large) Opteron-based cluster, with PowerXCell or FPGA co-processors.

                The NSA's ASIC systems don't count; by definition, they are not general purpose. A modern 3GHz quad-core processor will manage an exhaustive DES search in about 600 years. Deep Crack in 1998 c

        • by timeOday ( 582209 ) on Wednesday December 16, 2009 @11:59PM (#30469260)

          Take a problem that is all 64-bit integer math and has a branch every couple hundred instructions and GPUs will do for crap on it. However a supercomputer with general purpose CPUs will do as well on it as basically anything else.

          That was always true of supercomputers. In fact the stuff that runs well on CUDA now is almost precisely the same stuff that ran well on Cray vector machines - the classic stereotype of "Supercomputer"! Thus I do not see your point. The best computer for any particular task will always be one specialized for that task, and thus compromised for other tasks.

          BTW, newer GPUs support double precision [herikstad.net].

        • Re: (Score:1, Insightful)

          by Anonymous Coward

          E X A C T L Y ! ! ! I always read about how fast the Cell Broadband Processor(tm) is and how anyone is a FOOL for not using it. No. They suck hard when it comes to branch prediction. Their memory access is limited to fast, but very small memory. Out of branch execution performance is awful. You have to rewrite code massively to avoid it. For embarrassingly parallel problems, they are a dream. For problems not parallel, they are quite slow. An old supercomputer isn't as fast as a new one. If ordin

        • by mcrbids ( 148650 )

          That's why I find these comparisons stupid. "Oh this is so much faster than our supercomputer!" No it isn't. It is so much faster for some things. Now if you are doing those things, wonderful, please use GPUs. However, don't then try to pretend you have a "supercomputer in a desktop." You don't. You have a specialized computer with a bunch of single-precision stream processors. That's great so long as your problem is 32-bit fp, highly parallel, doesn't branch much, and fits within the memory on a GPU. However, not all problems are, hence they are NOT a general replacement for a supercomputer.

          For that matter, which is faster: a two-ton flatbed truck or a Maserati? Kinda depends on what you are trying to do, doesn't it? Want to move 3,000 pounds of hay? You probably DON'T want the Maserati!

          And all machines are like this. Some machines are better at some tasks than others. And presumably, the comparison to the university supercomputer was because of a task that they *needed* to perform, and the pittance cost of the GPGPU-based supercomputer compared very favorably against the cost of leasing university supercomputer time.

          Even different people are better at some things than others.... Some people are better at maths than others. Some people can take a bit of vinegar and coffee grounds, and make an artistic masterpiece.

          Because I'm a jogger, I can run long distances faster than most people. But I suck at sprints, and I take long showers. I type over 100 WPM.

          See?

        • by Kjella ( 173770 )

          Sure, but if you look at it from their perspective: before, we needed time on a supercomputer, and now we don't. Either you redefine supercomputers to include this, or it's another task where we don't need one; even better, if you ask me. So it doesn't do everything; well, running an embarrassingly parallel problem on a supercomputer would also give "terrible" performance now compared to this.

          That's great so long as your problem is 32-bit fp, highly parallel, doesn't branch much, and fits within the memory on a GPU.

          As far as I know the Teslas will be doing double precision, and we certainly could put GPUs on a better backplane for GPU-GPU

  • Why does the computer from Swordfish [toplessrobot.com]?

    Get Animated [geekonwheels.com]
    *Drools*
  • times less (Score:4, Funny)

    by Tubal-Cain ( 1289912 ) on Wednesday December 16, 2009 @06:37PM (#30466510) Journal

    ...consuming 300 times less power.

    *sigh*

  • I've got a pair of 9800gx2 in my rig. The cards turn room temperature air into ~46C air. Without proper ventilation, these things will turn a chassis into an easy bake oven.

    For those not familiar with the 9800gx2 cards, it essentially is two 8800gts video cards linked together to act as a single card - something called SLI on the NVidia side of marketing. SLI typically required a mainboard/chipset that would allow you to plug in two cards and link them together. This model allowed any mainboard to have two 'internal' cards linked together, with the option of linking another 9800gx2 if your board actually supported SLI.

    The pictures did not show any SLI bridge, so it looks like they are just taking advantage of multiple GPUs per card.

    • The pictures did not show any SLI bridge, so it looks like they are just taking advantage of multiple GPUs per card.

      There's no seven-way SLI anyway. Since the GPUs are being used for processing and not graphics, there's no need for them to work together via SLI or Crossfire or what have you as long as the OS and programs treat 'em like any other multiprocessor setup.
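
      (Right — here's a rough sketch of how CUDA sees such a box, assuming nothing about the FASTRA software itself: every GPU on every card shows up as its own device number, and you farm work out per device with cudaSetDevice; no SLI bridge involved.)

      #include <cuda_runtime.h>
      #include <cstdio>

      int main()
      {
          int count = 0;
          cudaGetDeviceCount(&count);            // one entry per GPU, not per card
          printf("found %d CUDA devices\n", count);

          for (int d = 0; d < count; ++d) {
              cudaDeviceProp prop;
              cudaGetDeviceProperties(&prop, d);
              printf("device %d: %s (%d multiprocessors)\n",
                     d, prop.name, prop.multiProcessorCount);

              cudaSetDevice(d);                  // subsequent allocations/kernels target this GPU
              // ...copy this device's slice of the data and launch kernels here...
          }
          return 0;
      }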

    • Re: (Score:3, Funny)

      I've got a pair of 9800gx2 in my rig. The cards turn room temperature air into ~46C air. Without proper ventilation, these things will turn a chassis into an easy bake oven.

      That's a brilliant idea, now people can make snacks without ever leaving the computer.

  • Yeah but... (Score:1, Redundant)

    by definate ( 876684 )

    Can it play Crysis with a high frame rate on maximum?

  • Duh! Look at the number of GPUs...13...try 12 or 14 and your luck will change.
    • by selven ( 1556643 )

      It's not even that hard. Just number them starting from 0 so the last one is only 12. Then when you add another make it 14. Problem solved.

  • The guy in the video on that page looks exactly like the stereotype of the guy I'd expect to do this sort of thing.
  • This isn't a huge achievement. Nobody else has done it because it's silly.

    There are two major reasons... the first is they use GeForce cards. That's not a good idea, since GeForces are held to much lower quality standards than Teslas and Quadros. They're intended for gaming graphics, where a minor error here or there isn't the end of the world. "Sorry we missed your cancer, since our supercomputer miscalculated that region of the reconstruction." The second problem is, that's one bandwidth starved machine.

    • Re: (Score:3, Informative)

      by modemboy ( 233342 )

      The difference between GeForce and Quadro cards is almost always completely driver-based; it is the exact same hardware, different software.
      This is basically a roll-your-own Tesla, and considering the Teslas connect to the host system via an 8x or 16x PCI-e add-in card, I'm gonna say you are wrong when it comes to the bandwidth issue as well...

      • Re: (Score:3, Informative)

        by jpmorgan ( 517966 )

        The hardware is the same, but the quality control is different. Teslas and Quadros are held to rigorous standards. GeForces have an acceptable error rate. That's fine for gaming, but falls flat in scientific computing.

        • by DeKO ( 671377 )

          Uh... no, you are wrong. Quadros and GeForces have a lot of differences in the internal hardware. Just because they "do the same thing" (they draw triangles really, really fast) doesn't mean they are the same. GeForces, for example, don't have optimizations for drawing points and lines, nor do they assume you are abusing obsolete APIs, like immediate-mode drawing; both are common in CAD applications, and almost useless in games.

          • No, the chips are almost exactly the same (except Quadros have 100% unbroken chips). You're thinking driver differences.

          • by Khyber ( 864651 )

            There is NO difference between Quadro and GeForce besides GeForce basically being a laser-locked defective Quadro with different firmware.

            In fact, you can flash most GeForce cards with the equivalent Quadro firmware and in some applications (not gaming) get better performance.

            Been tooling around with nVidia cards since NV4. They've pretty much used this same strategy for the past decade+.

    • Re: (Score:2, Insightful)

      by CityZen ( 464761 )

      It's not silly: (1) this is a research project, not production medical equipment, meaning that the funds to buy Tesla cards were probably not available, and they aren't particularly worried about occasional bit errors. (2) Their particular application doesn't need much inter-GPU communication, if any, so that bandwidth is not an issue. They just need for each GPU to load datasets, chew on them, and spit out the results.

      How much does your proposed GPU supercomputer cost for 13 GPUs?

    • There are two major reasons... the first is they use GeForce cards. That's not a good idea, since GeForces are held to much lower quality standards than Teslas and Quadros.

      Tell that to the Quadro FX1500M that was in my HP/Compaq "professional workstation" laptop, that had a well-known die bonding problem that caused overheat failures across an entire production line. Neither HP nor nVidia recalled the defective parts and I ended up spending literally days on the phone with HP support before they sent me a new laptop. Higher quality, my ass. Quadro chips are marked differently, period the end.

  • Folding@home enthusiasts and academic contributors did more than that, and a long time ago, too. Just check this thread at foldingforums [foldingforum.org] for one example.

  • Wouldn't it be nice if the FASTRA II, which is 3.75 times faster than the FASTRA I, were actually called the FASTRA 375? Then I wouldn't have to ask.

    • by slew ( 2918 )

      If it's really 3.75 times faster maybe they could call it the FASTRA System 360 Model 96 (or the Fastra 360/96) for short ;^)

  • The Brady Bunch called; they want their set clothes back!
  • it was slightly faster than the university's 512-core supercomputer and cost less than 4000EUR.

    but tests reveal the 13 GPUs deliver 3.75x more performance than the old system.

    It is impossible to make such general statements about the performance of something that is still very much specialized for long pipelines and streams of repetitive data (vector processing).

    They may be much faster for tasks that fit that scheme, but slower for those that don't.

    • by ceoyoyo ( 59147 )

      The performance of a standard cluster, or even a SIMD machine, will vary tremendously depending on your application as well. The only reasonable way to compare is to pick a problem and compare performance on that problem.

      They just forgot a phrase at the end of that statement: "it was slightly faster than the university's 512-core supercomputer... in this application."

  • Apparently, the regular BIOS can't boot with more than 5? graphics cards installed due to the amount of resources (memory & I/O space) that each one requires. So the researchers asked ASUS to make a special BIOS for them which doesn't set up the graphics card resources. However, the BIOS still needs to initialize at least one video card, so they agreed that the boot video card would be the one with only a single GPU. Presumably, they could have also chosen a dual GPU card that happened to be differen

  • Maybe there's a really good reason for it that I'm not fully aware of, but why are PC cases, motherboards, add-on cards, etc. all seemingly designed around such limited amounts of space? Is there such a thing as a PC case the size of a mini-fridge or bigger? A motherboard with freaking 10 or 12 slots with enough space between them? A video card the size of a motherboard? Anything but a cramped little box with limited expansion? Is that such a bizarre thing to want?
    • by CityZen ( 464761 )

      It's known as "market forces". In case you haven't noticed, the computing needs of most people can be crammed into something the size of a paperback book or so. Larger computing devices are available, but the bigger you go, the smaller the market, and thus the larger the price. If you want something big, you might take look at a computer named "Jaguar". It has a big price, too.

      As far as personal computers go, they tend to be designed around CPU strengths & limitations. Intel and AMD have figured ou

    • by CityZen ( 464761 )

      Oh, and by the way, I'm wondering quite the opposite: why do we still see so many over-sized full ATX size cases being offered, when microATX motherboards have everything we (most of us) need? Indeed, even mini-ITX motherboards are often adequate for so many needs, and yet mini-ITX cases still seem to command a premium because they are relatively rare. It's easy (and boring) to design a big rectangular ATX box. It's an engineering challenge to make a good-looking small box that does everything you need a

    • by CityZen ( 464761 )

      Oh, and here's your mini-fridge size case with 10 slots:

      http://www.mountainmods.com/computer-cases-c-21.html [mountainmods.com]

  • My lab will soon be building a computer or cluster for bioinformatics. Would something like this be appropriate / scalable for gene microarray analysis, clustering algorithm tasks, etc? We need the capability to work with datasets in the 400 GB range, and with many permutations, but the specific datapoints are not large. Any suggestions or input would be much appreciated...
  • Would be nice if it actually worked. It's not much good having the fastest desktop computer in the world if it isn't stable. Or are they using the Dilbert definition of a PC upgrade?

    Next time, make the fancy video when it's finished, guys.
