
Why 'Gaming' Chips Are Moving Into the Server Room

Posted by timothy
from the expense-report-manipulation-++ dept.
Esther Schindler writes "After several years of trying, graphics processing units (GPUs) are beginning to win over the major server vendors. Dell and IBM are the first tier-one server vendors to adopt GPUs as server processors for high-performance computing (HPC). Here's a high level view of the hardware change and what it might mean to your data center. (Hint: faster servers.) The article also addresses what it takes to write software for GPUs: 'Adopting GPU computing is not a drop-in task. You can't just add a few boards and let the processors do the rest, as when you add more CPUs. Some programming work has to be done, and it's not something that can be accomplished with a few libraries and lines of code.'"
  • by TwiztidK (1723954) on Thursday July 15, 2010 @03:42PM (#32918288)
I've heard that many programmers have issues coding for 2- and 4-core processors. I'd like to see how they'll adapt to running "hundreds of threads" in parallel.
  • by tedgyz (515156) * on Thursday July 15, 2010 @03:47PM (#32918370) Homepage

    This is a long-standing issue. If programs don't just "magically" run faster, you can count out 90% or more of the programs that would otherwise benefit from this.

  • by morcego (260031) on Thursday July 15, 2010 @03:50PM (#32918424)

    This is just like programming for a computer cluster ... after a fashion.

    Anyone used to do both should have no problem with this.

    I'm anything but a high end programmer (I mostly only code for myself), and I have written plenty of code that runs with 7-10 threads. Believe me, when you change the way you think about how an algorithm works, it doesn't matter if you are using 3 or 10000 processors.
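    The point above can be sketched in a few lines: express the algorithm as independent chunks, and the worker count becomes just a parameter. A minimal illustrative sketch in Python (the function and names are mine, not from the thread):

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def chunk_sum(data, n_workers):
        """Split `data` into independent chunks and sum them in parallel.
        The algorithm is the same whether n_workers is 3 or 10000."""
        size = max(1, len(data) // n_workers)
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            return sum(pool.map(sum, chunks))

    data = list(range(1000))
    # Same answer regardless of worker count.
    assert chunk_sum(data, 3) == chunk_sum(data, 100) == sum(data)
    ```

    Once the work is phrased this way, scaling the worker count up or down doesn't touch the algorithm at all, which is the commenter's point.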

  • by Austerity Empowers (669817) on Thursday July 15, 2010 @03:50PM (#32918436)

    CUDA or OpenCL is how they do it.

  • by Sax Maniac (88550) on Thursday July 15, 2010 @03:51PM (#32918462) Homepage Journal

    This isn't hundreds of threads that can run arbitrary code paths like on a CPU. You have to totally redesign your code, or already have parallel code structured so that a number of threads all do the same thing at the same time, just on different data.

    The threads all run in lockstep; all the threads had better be at the same program counter at the same time. If you run into a branch in the code, you lose your parallelism, as the divergent threads are frozen until the paths come back together.

    I'm not a big thread programmer, but I do work on threading tools. Most of the problems with threads seem to come from threads taking totally different code paths, and the unpredictable scheduling interactions that arise between them. GPU coding is a lot more tightly controlled.
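    That lockstep behavior can be modeled in a toy way (my own sketch, not from any vendor's docs): when threads in a warp diverge at a branch, the hardware effectively runs both sides serially, masking off the lanes that didn't take each side, so a divergent warp pays for both paths.

    ```python
    def warp_cycles(lanes, then_cost, else_cost):
        """Cycles for one warp executing an if/else in lockstep (SIMT model).
        `lanes` holds each lane's branch outcome. If any lane takes a side,
        the whole warp spends cycles executing that side."""
        took_then = any(lanes)
        took_else = not all(lanes)
        return then_cost * took_then + else_cost * took_else

    # All 32 lanes agree: only one path is paid for.
    uniform = [True] * 32
    # A single lane disagrees: the warp pays for both paths.
    divergent = [True] * 31 + [False]

    assert warp_cycles(uniform, 10, 10) == 10
    assert warp_cycles(divergent, 10, 10) == 20
    ```

    One disagreeing lane out of 32 is enough to double the cost of the branch in this model, which is why the comment says divergence kills your parallelism.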

  • Libraries (Score:2, Insightful)

    by Dynetrekk (1607735) on Thursday July 15, 2010 @03:59PM (#32918584)
    I'm really interested in using GPGPU for my physics calculations. But you know - I don't want to learn Nvidia's low-level, proprietary (whateveritis) in order to do an addition or multiplication, which may or may not outperform the CPU version. What would be _really_ great is stuff like porting the standard "low-level numerics" libraries to the GPU: BLAS, LAPACK, FFTs, special functions, and whatnot - the building blocks for most numerical programs. LAPACK+BLAS you already get in multicore versions, and there's no extra work on my part to use all cores on my PC. Please, computer geeks (i.e. more computer geek than myself), let me have the same on the GPU. When that happens, we can all buy Nvidia HotShit gaming cards and get research done. Until then, GPGPU is for the superdupergeeks.
  • by Nadaka (224565) on Thursday July 15, 2010 @04:04PM (#32918682)

    No it isn't. That you think so just shows how much you still have left to learn.

    I am not a high end programmer either. But I have two degrees on the subject and have been working professionally in the field for years, including optimization and parallelization.

    Many algorithms just won't have much improvement with multi-threading.

    Many will even perform more poorly due to data contention and the overhead of context switches and creating threads.

    Many algorithms just can not be converted to a format that will work within the restrictions of GPGPU computing at all.

    The stream architecture of modern GPUs works radically differently from a conventional CPU.

    It is not as simple as scaling conventional multi-threading up to thousands of threads.

    Certain things that you are used to doing on a normal processor have an insane cost in GPU hardware.

    For instance, the if statement. Until recently OpenCL and CUDA didn't allow branching. Now they do, but they incur such a huge penalty in cycles that it just isn't worth it.

  • by Dynetrekk (1607735) on Thursday July 15, 2010 @04:08PM (#32918728)

    Believe me, when you change the way you think about how an algorithm works, it doesn't matter if you are using 3 or 10000 processors.

    Have you ever read up on Amdahl's law? [wikipedia.org]
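    Amdahl's law is exactly why the processor count can stop mattering: if a fraction p of the work is parallelizable, the best speedup on n processors is 1 / ((1 - p) + p/n). A quick check with the standard formula (the numbers are mine, for illustration):

    ```python
    def amdahl_speedup(p, n):
        """Maximum speedup with parallel fraction p on n processors."""
        return 1.0 / ((1.0 - p) + p / n)

    # With 95% of the work parallel, even 10000 processors cap out near 20x,
    # because the 5% serial part dominates.
    print(round(amdahl_speedup(0.95, 3), 2))      # -> 2.73
    print(round(amdahl_speedup(0.95, 10000), 2))  # -> 19.96
    ```

    So going from 3 to 10000 processors matters only as far as the serial fraction allows, which is the objection being raised here.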

  • by Anonymous Coward on Thursday July 15, 2010 @04:45PM (#32919168)

    Well, GPGPU actually addresses memory bandwidth, in a way. Mostly due to design limitations, each GPU comes with its own memory, and thus its own memory bus and bandwidth.
    Of course you can get that for CPUs as well (with new Intels or any non-ancient AMD) by going to multiple sockets, but that is more effort and costlier (6 PCIe slots, unusual but obtainable, can give you 12 GPUs, each with its own bus; try getting a 12-socket motherboard...).

  • by Fulcrum of Evil (560260) on Thursday July 15, 2010 @06:47PM (#32920646)

    most post secondaries are now teaching students how to properly thread for parallel programming.

    No they aren't. Even grad courses are no substitute for doing it. Never mind that parallel processing is a different animal than SIMD-like models that most GPUs use.

    I haven't had to deal with any of it myself, but I imagine it'll boil down to knowing what calculations in your program can be done simultaneously, and then setting up a way to dump it off onto the next available core.

    No, it's not like that. You set up a warp of threads running the same code on different data, and structure it for minimal branching. That's the thumbnail sketch; Nvidia has some good tutorials on the subject, and you can use your current GPU.

  • by bored (40072) on Thursday July 15, 2010 @11:13PM (#32922734)

    I've done a little CUDA programming, and I've yet to find significant speedups doing it. Every single time, some limitation in the architecture keeps it from running well. My last little project ran about 30x faster on the GPU than the CPU; the only problem was that the overhead of getting the data to the GPU, plus the computation, plus the overhead of getting the results back, was roughly equal to the time it took to just dedicate a CPU to it.

    I was really excited about AES on the GPU too, until it turned out to be about 5% faster than my CPU.

    Now if the GPU were designed more as a proper coprocessor (a la the early x87, or early Weitek) and integrated better into the memory hierarchy (put the funky texture RAM and such off to the side), some of my problems might go away.
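    The transfer-overhead trap described above can be put into numbers with a back-of-the-envelope model (all figures hypothetical, just to illustrate the commenter's experience):

    ```python
    def gpu_worth_it(cpu_time_s, kernel_speedup, transfer_s):
        """True if the GPU route beats the CPU once host<->device
        copy time is included in the total."""
        gpu_time = cpu_time_s / kernel_speedup + transfer_s
        return gpu_time < cpu_time_s

    # Kernel runs 30x faster, but the copies eat the win on a small problem:
    # 30 ms / 30 + 29 ms of transfer = 30 ms, the same as the CPU.
    assert not gpu_worth_it(cpu_time_s=0.030, kernel_speedup=30, transfer_s=0.029)

    # On a problem 100x larger, the same 30x kernel pays off handily.
    assert gpu_worth_it(cpu_time_s=3.0, kernel_speedup=30, transfer_s=0.029)
    ```

    This is why a 30x kernel speedup can still yield zero net gain: the PCIe copies are a fixed tax, and only large enough problems amortize it.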
