Why 'Gaming' Chips Are Moving Into the Server Room 137

Posted by timothy on Thursday July 15, 2010 @03:37PM from the expense-report-manipulation-++ dept.

Esther Schindler writes "After several years of trying, graphics processing units (GPUs) are beginning to win over the major server vendors. Dell and IBM are the first tier-one server vendors to adopt GPUs as server processors for high-performance computing (HPC). Here's a high level view of the hardware change and what it might mean to your data center. (Hint: faster servers.) The article also addresses what it takes to write software for GPUs: 'Adopting GPU computing is not a drop-in task. You can't just add a few boards and let the processors do the rest, as when you add more CPUs. Some programming work has to be done, and it's not something that can be accomplished with a few libraries and lines of code.'"

This discussion has been archived. No new comments can be posted.

A whole new level of parallelism (Score:5, Insightful)

by TwiztidK ( 1723954 ) writes: on Thursday July 15, 2010 @03:42PM (#32918288)

I've heard that many programmers have issues coding for 2 and 4 core processors. I'd like to see how they'll adapt to running "hundreds of threads" in parallel.

- Re:A whole new level of parallelism (Score:4, Insightful)
  
  by morcego ( 260031 ) writes: on Thursday July 15, 2010 @03:50PM (#32918424)
  
  This is just like programming for a computer cluster ... after a fashion.
  Anyone used to doing both should have no problem with this.
  I'm anything but a high end programmer (I mostly only code for myself), and I have written plenty of code that runs with 7-10 threads. Believe me, when you change the way you think about how an algorithm works, it doesn't matter if you are using 3 or 10000 processors.
  
  - Re:A whole new level of parallelism (Score:5, Insightful)
    
    by Nadaka ( 224565 ) writes: on Thursday July 15, 2010 @04:04PM (#32918682)
    
    No it isn't. That you think so just shows how much you still have left to learn.
    I am not a high end programmer either. But I have two degrees on the subject and have been working professionally in the field for years, including optimization and parallelization.
    Many algorithms just won't have much improvement with multi-threading.
    Many will even perform more poorly due to data contention and the overhead of context switches and creating threads.
    Many algorithms just can not be converted to a format that will work within the restrictions of GPGPU computing at all.
    The stream architecture of modern GPUs works radically differently than a conventional CPU.
    It is not as simple as scaling conventional multi-threading up to thousands of threads.
    Certain things that you are used to doing on a normal processor have an insane cost in GPU hardware.
    For instance, the if statement. Until recently OpenCL and CUDA didn't allow branching. Now they do, but they incur such a huge penalty in cycles that it just isn't worth it.
    
    - Re: (Score:1, Informative)
      
      by Anonymous Coward writes:
      
      Uh
      OpenCL and CUDA supported branching from day one (with a performance hit). Before they existed, there was some (very little) usage of GPUs for general purpose computing and they used GLSL/HLSL/Cg, which supported branching poorly or not at all.
      The tools that were recently added to CUDA (for the latest GPUs) are recursion and function pointers.
      - Re: (Score:3, Informative)
        
        by sarkeizen ( 106737 ) writes:
        
        Personally (and I love that someone below mentioned Amdahl's law [wikipedia.org]), the problem isn't, as you said, about specific language constructs, but that there isn't any general solution to parallelism. That is, to use Brooks's [amazon.com] illustration, problems we try to solve with computers aren't like harvesting wheat - they aren't efficiently divisible to an arbitrary degree. We do know of a few problems like this, which we call "embarrassingly parallel" [wikipedia.org], but these are few and far between. So GPUs are great MD5 crackers, protein
    - Re: (Score:2)
      
      by Twinbee ( 767046 ) writes:
      
      Are If branches only slow because of what someone said below:
      "If you run into a branch in the code, then you lose your parallelism, as the divergent threads are frozen until they come back together."
      Because if that's the case, that's fine by me. The worst case length that a thread can run can be defined and even low in some cases I know of.
    - Re: (Score:2)
      
      by morcego ( 260031 ) writes:
      
      Nadaka, you are just proving my statement there.
      What you are describing are people using the wrong kind of logic and algorithms to do parallelization.
      The only new statement you make is:
      Many algorithms just can not be converted to a format that will work within the restrictions of GPGPU computing at all.
      I will take your word for it, since I really don't know GPGPUs at all. Most of my experience with parallelism is with clusters (up to 30 nodes). On that scenario, 99% of the time I've heard someone say someth
    - There are also easy problems (Score:2)
      
      by dbIII ( 701233 ) writes:
      
      Many algorithms just won't have much improvement with multi-threading.
      Yes, but there are also many that will. I work with geophysicists, and a lot of what they do really involves applying the same filter to 25 million or so audio traces. Such tasks get split arbitrarily over clusters at any point of those millions of traces. One thread per trace is certainly possible because that's how it works normally anyway as independent operations in series. Once you get to output the results some theoretical 25 mi
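
A minimal Python sketch of the pattern described above: the same filter is applied independently to each trace, so the work can be split at any trace boundary. The moving-average filter and the names `smooth` and `filter_all` are illustrative assumptions, not the poster's actual code. In CPython a thread pool will not speed up CPU-bound work because of the global interpreter lock, so treat this as a picture of the structure rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth(trace, width=3):
    """Moving-average filter over one trace (a hypothetical stand-in filter)."""
    out = []
    for i in range(len(trace) - width + 1):
        window = trace[i:i + width]
        out.append(sum(window) / width)
    return out


def filter_all(traces, workers=4):
    """Apply the same filter to every trace; no trace depends on another."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with the input traces.
        return list(pool.map(smooth, traces))
```

Calling `filter_all([[1, 2, 3, 4], [0, 3, 6, 9]])` returns `[[2.0, 3.0], [3.0, 6.0]]`. Because each trace is handled on its own, the same shape maps onto a cluster or onto one GPU thread per trace.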
    - Re:A whole new level of parallelism (Score:4, Interesting)
      
      by David Greene ( 463 ) writes: on Friday July 16, 2010 @01:09AM (#32923228)
      
      The stream architecture of modern GPUs works radically differently than a conventional CPU.
      True if the comparison is to a commodity scalar CPU.
      It is not as simple as scaling conventional multi-threading up to thousands of threads.
      True. Many algorithms will not map well to the architecture. However, many others will map extremely well. Many scientific codes have been tuned over the decades to exploit high degrees of parallelism. Often the small data sets are the primary bottleneck. Strong scaling is hard, weak scaling is relatively easy.
      Certain things that you are used to doing on a normal processor have an insane cost in GPU hardware.
      In a sense. These are not scalar CPUs and traditional scalar optimization, while important, won't utilize the machine well. I can't think of any particular operation that's greatly slower than on a conventional CPU, provided one uses the programming model correctly (and some codes don't map well to that model).
      For instance, the if statement.
      No. Branching works perfectly fine if you program the GPU as a vector machine. The reason branches within a warp (using NVIDIA terminology) are expensive is simply because a warp is really a vector. The GPU vendors just don't want to tell you that because either they fear being tied to some perceived historical baggage with that term or they want to convince you they're doing something really new. GPUs are interesting, but they're really just threaded vector processors. Don't misunderstand me, though, it's a quite interesting architecture to work with!
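
The "a warp is really a vector" point can be sketched in plain Python. This is a toy model under stated assumptions, not NVIDIA's implementation: the function name `warp_if_else` and the "passes" cost proxy are invented for illustration. The idea is that an if/else over a vector runs each side under a mask, so a vector whose lanes disagree steps through both sides.

```python
def warp_if_else(values, predicate, then_fn, else_fn):
    """Run `then_fn(v) if predicate(v) else else_fn(v)` across a whole vector.

    Returns (results, passes), where passes counts how many branch bodies
    the vector had to step through (a rough cost proxy, by assumption).
    """
    mask = [predicate(v) for v in values]
    results = list(values)
    passes = 0
    if any(mask):
        # "then" side: only lanes whose mask bit is set keep a result.
        passes += 1
        for lane, v in enumerate(values):
            if mask[lane]:
                results[lane] = then_fn(v)
    if not all(mask):
        # "else" side: the remaining lanes run while the others sit idle.
        passes += 1
        for lane, v in enumerate(values):
            if not mask[lane]:
                results[lane] = else_fn(v)
    return results, passes
```

With `values=[1, 2, 3, 4]` and an "is even" predicate the lanes disagree and the model reports 2 passes; with `[2, 4, 6, 8]` every lane takes the same side and it reports 1. That matches the claim that branches are only expensive when lanes within one vector diverge.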
      
  - Re:A whole new level of parallelism (Score:5, Insightful)
    
    by Dynetrekk ( 1607735 ) writes: on Thursday July 15, 2010 @04:08PM (#32918728)
    
    Believe me, when you change the way you think about how an algorithm works, it doesn't matter if you are using 3 or 10000 processors.
    Have you ever read up on Amdahl's law? [wikipedia.org]
    
    - Re: (Score:2, Interesting)
      
      by Lord of Hyphens ( 975895 ) writes:
      
      Have you ever read up on Amdahl's law? [wikipedia.org]
      I'll see your Amdahl's Law, and raise you Gustafson's Law [wikipedia.org].
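
The Amdahl-versus-Gustafson exchange above comes down to two short formulas, sketched here in Python. The function names are my own; the parallel fraction `p` is the share of the work that can be spread over `n` processors. Amdahl's law assumes a fixed problem size, while Gustafson's law assumes the problem grows with the machine.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Fixed problem size: speedup = 1 / ((1 - p) + p / n)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


def gustafson_speedup(parallel_fraction, processors):
    """Problem size scales with the machine: speedup = (1 - p) + p * n."""
    serial = 1.0 - parallel_fraction
    return serial + parallel_fraction * processors


if __name__ == "__main__":
    for n in (3, 10000):
        print(n, amdahl_speedup(0.95, n), gustafson_speedup(0.95, n))
```

With 95% of the work parallel, Amdahl's law caps the speedup below 20x no matter how many processors are added, which is the objection to "3 or 10000 processors". Gustafson's law keeps growing with `n` because the parallel part of the job is assumed to grow too.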
  - Re:A whole new level of parallelism (Score:4, Funny)
    
    by pushing-robot ( 1037830 ) writes: on Thursday July 15, 2010 @04:20PM (#32918858)
    
    Microsoft must be doing a bang-up job then, because when I'm in Windows it doesn't matter if I'm using 3 or 10000 processors.
    
  - Re: (Score:2, Interesting)
    
    by Anonymous Coward writes:
    
    You might find this [youtube.com] Google Tech Talk interesting..
- Re: (Score:3, Insightful)
  
  by Austerity Empowers ( 669817 ) writes:
  
  CUDA or OpenCL is how they do it.
- Re:A whole new level of parallelism (Score:4, Insightful)
  
  by Sax Maniac ( 88550 ) writes: on Thursday July 15, 2010 @03:51PM (#32918462) Homepage Journal
  
  This isn't hundreds of threads that can run arbitrary code paths like on a CPU. You have to totally redesign your code, or already have implemented parallel code, so that you run a number of threads that all do the same thing at the same time, just on different data.
  The threads all run in lockstep, as in, all the threads better be at the same PC (program counter) at the same time. If you run into a branch in the code, then you lose your parallelism, as the divergent threads are frozen until they come back together.
  I'm not a big thread programmer, but I do work on threading tools. Most of the problems with threads seem to come from threads taking totally different code paths, and the unpredictable scheduling interactions that arise between them. GPU coding is a lot more tightly controlled.
  
- Re: (Score:2)
  
  by Monkeedude1212 ( 1560403 ) writes:
  
  I've heard that many programmers have issues coding for 2 and 4 core processors.
  Or even multiple processors, for that matter.
  That in and of itself is almost an entirely new section of programming - if you were an Ace 15 years ago, your C++ skills might still be sharper than most new graduates, but most post secondaries are now teaching students how to properly thread for parallel programming. If you don't know how to code for 2 or 4 core processors, you really should jump on board. Almost every computer and laptop I can think of being sold brand new today has more than 1 core or proces
  - Re:A whole new level of parallelism (Score:4, Interesting)
    
    by jgagnon ( 1663075 ) writes: on Thursday July 15, 2010 @04:23PM (#32918880)
    
    The problem with "programming for multiple cores/CPUs/threads" is that it is done in very different ways between languages, operating systems, and APIs. There is no such thing as a "standard for multi-thread programming". All the variants share some concepts in common but their implementations are mostly very different from each other. No amount of schooling can fully prepare you for this diversity.
    
    - Re: (Score:2)
      
      by Miseph ( 979059 ) writes:
      
      Isn't that basically true of everything else in coding too? You wouldn't code something in C++ for Linux the same way that you would code it in Java for Windows, even though a lot of it might be similar.
      Is parallelization supposed to be different?
      - Re: (Score:2)
        
        by Jeremy Erwin ( 2054 ) writes:
        
        Java for Windows? I think you might be missing the point.
    - Re: (Score:2)
      
      by nxtw ( 866177 ) writes:
      
      The problem with "programming for multiple cores/CPUs/threads" is that it is done in very different ways between languages, operating systems, and APIs.
      Really?
      Most modern operating systems implement POSIX threads, or are close enough that POSIX threads can be implemented on top of a native threading mechanism. The concept of independently scheduled threads with a shared memory space can only be implemented in so many ways, and when someone understands these concepts well, everything looks rather similar.
      It
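
As a small illustration of "independently scheduled threads with a shared memory space", here is a sketch using Python's standard `threading` module, which in CPython runs on top of the platform's native threads. The function name `count_in_parallel` and the counter workload are assumptions made for the example. The same shape (shared data, a mutex, create, join) shows up in most threading APIs.

```python
import threading


def count_in_parallel(num_threads=4, increments=10000):
    """Several threads update one shared counter, serialized by a mutex."""
    counter = {"value": 0}      # shared memory visible to every thread
    lock = threading.Lock()     # plays the role of a pthread mutex

    def worker():
        for _ in range(increments):
            with lock:          # without the lock, updates could be lost
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                # wait for every thread before reading the total
    return counter["value"]
```

`count_in_parallel(4, 1000)` returns 4000 because the lock makes each increment atomic with respect to the other threads.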
    - Re: (Score:2)
      
      by pseudorand ( 603231 ) writes:
      
      Parallel programming is a bit different, but so is event-driven (Windows, JS) vs. procedural, and programmers do both of those fine. The problem, unfortunately, isn't that we're all too stupid to pick up multi-threaded programming, but that the hardware isn't yet useful enough to make it worth the trouble. Take CUDA for example. To take advantage of the GPU you first have to copy data from main memory into GPU memory, do your parallel processing, then copy data back to main memory. Even for algorithms that are pa
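
The copy-in, compute, copy-out sequence described above suggests a back-of-envelope cost model, sketched here in Python. Everything in it is an assumption for illustration: the function names, the idea that the output is the same size as the input, and any numbers you plug in. It is not a measurement of any real card or bus.

```python
def offload_time(n_bytes, gpu_compute_seconds, bus_bytes_per_second):
    """Copy the data to the device, compute, then copy the same amount back."""
    transfer_seconds = 2 * n_bytes / bus_bytes_per_second
    return transfer_seconds + gpu_compute_seconds


def worth_offloading(n_bytes, cpu_seconds, gpu_compute_seconds, bus_bytes_per_second):
    """True only if the GPU wins after paying for both copies."""
    total = offload_time(n_bytes, gpu_compute_seconds, bus_bytes_per_second)
    return total < cpu_seconds
```

For example, with a hypothetical 1e9 bytes of data, a 5e9 bytes/second bus and 0.1 s of GPU compute, the offload takes about 0.5 s in this model. A job that the CPU finishes in 0.3 s is then not worth moving, even though the GPU kernel itself is three times faster.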
  - Re: (Score:3, Insightful)
    
    by Fulcrum of Evil ( 560260 ) writes:
    
    most post secondaries are now teaching students how to properly thread for parallel programming.
    No they aren't. Even grad courses are no substitute for doing it. Never mind that parallel processing is a different animal than SIMD-like models that most GPUs use.
    I haven't had to deal with any of it myself, but I imagine it'll boil down to knowing what calculations in your program can be done simultaneously, and then setting up a way to dump it off onto the next available core.
    No, it's not like that. You set up a warp of threads running the same code on different data and structure it for minimal branching. That's the thumbnail sketch - NVIDIA has some good tutorials on the subject and you can use your current GPU.
- Re:A whole new level of parallelism (Score:5, Informative)
  
  by Chris Burke ( 6130 ) writes: on Thursday July 15, 2010 @04:33PM (#32919006) Homepage
  
  Programmers of Server applications are already used to multithreading, and they've been able to make good use of systems with large numbers of processors on them even before the advent of virtualization.
  But don't pay too much attention to the word "Server". Yes the machines that they're talking about are in the segment of the market referred to as "servers", as distinct from "desktops" or "mobile". But the target of GPU-based computing isn't "Servers" in the sense of the tasks you normally think of -- web servers, database servers, etc.
  The real target is mentioned in the article, and it's HPC, aka scientific computing. Normal server apps are integer code, and depend more on high memory bandwidth and I/O, which GPGPU doesn't really address. HPC wants that stuff too, but they also want floating point performance. As much floating point math performance as you can possibly give them. And GPUs are way beyond what CPUs can provide in that regard. Plus a lot of HPC applications are easier to parallelize than even the traditional server codes, though not all fall in the "embarrassingly parallel" category.
  There will be a few growing pains, but once APIs get straightened out and programmers get used to it (which shouldn't take too long for the ones writing HPC code), this is going to be a huge win for scientific computing.
  
  - Re: (Score:1, Insightful)
    
    by Anonymous Coward writes:
    
    Well, GPGPU actually in a way addresses the memory bandwidth. Mostly due to design limitations, each GPU comes with their own memory, and thus memory bus and bandwidth.
    Of course you can get that for CPUs as well (with new Intels or any non-ancient AMD) by going to multiple sockets, however that is more effort and costlier (6 PCIe slots - unusual but obtainable - and you can have 12 GPUs, each with their own bus, try getting a 12-socket motherboard...).
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
  - Re: (Score:3, Interesting)
    
    by psilambda ( 1857088 ) writes:
    
    The article and everybody else are ignoring one large, valid use of GPUs in the data center--whether you call it business intelligence or OLAP--it needs to be in the data center and it needs some serious number crunching. There is not as much difference between this and scientific number crunching as most people might think. I have been involved in both crunching numbers for financials at a major multinational and had the privilege of being the first to process the first full genome (complete genetic sequ
    - Re: (Score:2)
      
      by inKubus ( 199753 ) writes:
      
      I was thinking the same thing. OLAP is all about manipulating big 2d and 3d sets, blending them with other sets, etc. All things GPUs have ops for on the die. Not that there aren't already relational db accelerator chips in the mainframe arena (such as the zIIP [ibm.com]). Obviously the drivers and front end need to be remade to make the programming make sense, like OpenDL (data language) instead of OpenGL.
  - Re: (Score:2)
    
    by jlar ( 584848 ) writes:
    
    "There will be a few growing pains, but once APIs get straightened out and programmers get used to it (which shouldn't take too long for the ones writing HPC code), this is going to be a huge win for scientific computing."
    I am working on HPC (numerical modelling). At our institute we are seriously considering using GPU's for our next generation development. From my viewpoint the biggest problem is that the abstraction layers on top of the GPU's are not widely implemented and that they are somewhat more comp
  - - Re: (Score:2)
      
      by Chris Burke ( 6130 ) writes:
      
      All you say makes sense, but I for one don't understand the market for this. Today, if you need a compute server that's good for stream (e.g. SIMD) workloads you get a dozen 1U/2U rackmounts and fill them up with as many GPU boards as they'll take.
      Well I was mostly just trying to justify why the transition to the situation "today" is taking place, and why the multi-threading itself isn't that big a deal. "Yesterday" the biggest compute servers were still made from traditional CPUs. Only recently has the p
  - - Re: (Score:2)
      
      by Chris Burke ( 6130 ) writes:
      
      You were saying?
      I was saying that graphics cards don't address the memory throughput issue, and they don't, because what matters is not how fast you can access the on-board memory, but rather how fast you can stream data to that on-board memory (because even 2GB is only a fraction of overall system memory in these systems, and server apps in particular tend not to be streaming but more random access), and that's fast but still slightly slower than CPU DRAM itself.
      Specs don't tell you everything. You have to
- Re: (Score:3, Informative)
  
  by Hodapp ( 1175021 ) writes:
  
  I am one such programmer. Yet I also coded for an Nvidia Tesla C1060 board and found it much more straightforward to handle several thousand threads at once.
  Not all types of threads are created equal. I usually explain CUDA to people as the "Zerg Rush" model of computing - instead of a couple, well-behaved, intelligent threads that try to be polite to each other and clean up their own messes, you throw a horde of a thousand little vicious, stupid threads at the problem all at once, and rely on some overlord
- Re: (Score:2, Interesting)
  
  by Anonymous Coward writes:
  
  I've heard that many programmers have issues coding for 2 and 4 core processors. I'd like to see how they'll adapt to running "hundreds of threads" in parallel.
  
  If that's the paradigm they're operating in, it will probably fail spectacularly. Let me explain why.
  In the end, GPU's are essentially vector processors [wikipedia.org] (yes, I know that's not exactly how they work internally, but bear with me). You feed them one or more input vectors of data and one or two storage vectors for output and they do the same c
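
The vector-processor picture above (input vectors in, an output vector out) can be sketched as an emulated kernel launch in Python. This runs sequentially and only imitates the indexing scheme CUDA uses; `saxpy_kernel` and `launch` are hypothetical names, and SAXPY (`out = a * x + y`) is simply a convenient example operation. Launching with `n=5` and `block_dim=2` creates three blocks, and the last block is only partly used, which is why the kernel checks its index.

```python
def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    """Body that every emulated thread runs: out[i] = a * x[i] + y[i]."""
    i = block_idx * block_dim + thread_idx   # global index, computed as in CUDA
    if i < len(x):                           # guard the partly filled last block
        out[i] = a * x[i] + y[i]


def launch(kernel, n, block_dim, *args):
    """Emulate a grid of blocks one thread at a time; a GPU would run them in parallel."""
    grid_dim = (n + block_dim - 1) // block_dim   # ceil(n / block_dim)
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [10, 10, 10, 10, 10]
    out = [0] * 5
    launch(saxpy_kernel, len(x), 2, 2.0, x, y, out)
    print(out)
```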
Good luck with that (Score:4, Insightful)

by tedgyz ( 515156 ) * writes: on Thursday July 15, 2010 @03:47PM (#32918370) Homepage

This is a long-standing issue. If your programs don't just "magically" run faster, then count out 90% or more of the programs that will benefit from this.

- Re: (Score:2)
  
  by crafty.munchkin ( 1220528 ) writes:
  
  Can anyone provide any info on how this is going to work with regard to virtual environments? After all, there has been a rather large push toward virtualizing everything in the datacenter, and about the only physical server we have left in our server is the Fax/SMS server, the ISDN card and GSM module for which could not be virtualised...
  - Re: (Score:2)
    
    by tedgyz ( 515156 ) * writes:
    
    Anyone worried enough about performance to adopt GPGPU computing is probably not going to virtualize.
    We have virtualized a good portion of our servers, but the critical ones, like our db servers are still good old fashioned iron.
    Personally, I hate all this virtualization. The people that run these things think it is the second coming of Christ. If you try to point out flaws in their "amazing" virtual cluster, they always claim nothing is wrong.
Yes, of course (Score:2, Funny)

by Anonymous Coward writes:

The sysadmins need new machines with powerful GPUs, you know, for business purposes.
Oh and, they sell ERP software on Steam now, too, so we'll have to install that as well.
- Re:Yes, of course (Score:5, Funny)
  
  by Yvan256 ( 722131 ) writes: on Thursday July 15, 2010 @03:54PM (#32918494) Homepage Journal
  
  Portal 2? It's something for our Web server. It adds more portals to access the internet.
  
CUDA (Score:4, Informative)

by Lord Ender ( 156273 ) writes: on Thursday July 15, 2010 @03:48PM (#32918402) Homepage

I was interested in CUDA until I learned that even the simplest of "hello world" apps is still quite complex and quite low-level.
NVidia needs to make the APIs and tools for CUDA programming simpler and more accessible, with solid support for higher-level languages. Once that happens, we could see adoption skyrocket.

- Re:CUDA (Score:5, Interesting)
  
  by Rockoon ( 1252108 ) writes: on Thursday July 15, 2010 @04:01PM (#32918630)
  
  Indeed. With CUDA, DirectCompute, and OpenCL, nearly 100% of your code is boilerplate interfacing to the API.
  
  There needs to be a language where this stuff is a first-class citizen and not just something provided by an API.
  
  - Re: (Score:1)
    
    by jpate ( 1356395 ) writes:
    
    Actors are a really good framework (with a few [akkasource.org] different [scala-lang.org] implementations [javaworld.com]) for easy parallelization. Scala has an implementation of Actors as part of the standard library, so they really are first-class citizens.
  - Re: (Score:2)
    
    by 0100010001010011 ( 652467 ) writes:
    
    You mean like C/Objective-C and Grand Central Dispatch [wikipedia.org]?
    It's open source and has been ported to work with FreeBSD and Apache.
    Doesn't care if it's a CPU, GPU, 10xGPUs etc.
    - Re: (Score:2)
      
      by Rockoon ( 1252108 ) writes:
      
      No, that's not the same thing. Even if GCD worked with GPUs (which I see no evidence of), it still wouldn't be the same thing. While GPUs often have many "threads", each thread itself is a very wide SIMD architecture. For GCD in its current form to be useful, the work() function would still have to have the SIMD stuff baked in.
      - Re: (Score:2, Informative)
        
        by BitZtream ( 692029 ) writes:
        
        GCD combined with OpenCL makes it usable on a GPU, but that would be stupid. GPUs aren't really 'threaded' in any context that someone who hasn't worked with them would think of.
        All the threads run simultaneously, and side by side. They all start at the same time and they all end at the same time in a batch (not entirely true, but it is if you want to actually get any boost out of it).
        GCD is multithreading on a General Processing Unit, like your Intel CoreWhateverThisWeek processor. Code paths are ran an
        
        - Re: (Score:2)
          
          by Rockoon ( 1252108 ) writes:
          
          I really don't think you understand that GPUs actually can, and do, execute more than one unique thread at a time. They could not get the polygon counts they do if they didn't.
          
          They aren't just big SIMDs like you think. They really do execute independent threads, and each is wide SIMD (128 bytes wide on most modern GPUs). If GCD backed by OpenCL can't do this, then it's selling you short.
  - Re: (Score:2, Informative)
    
    by psilambda ( 1857088 ) writes:
    
    Indeed. With CUDA, DirectCompute, and OpenCL, nearly 100% of your code is boilerplate interfacing to the API. There needs to be a language where this stuff is a first-class citizen and not just something provided by an API.
    If you use CUDA, OpenCL or DirectComputeX it is--try the Kappa library--it has its own scheduling language that makes this much easier. The next version that is about to come out goes much further yet.
- Re: (Score:2)
  
  by Austerity Empowers ( 669817 ) writes:
  
  Probably can't happen, the parallel computing model is very different than the model you use in applications today. It's still evolving, but I doubt you will ever be in a position where you can write code as you do now and have it use and benefit from GPU hardware out of the gates.
  - Re: (Score:2)
    
    by jgtg32a ( 1173373 ) writes:
    
    ever?
    - Re: (Score:1)
      
      by Dekker3D ( 989692 ) writes:
      
      Yes. Just like we still doubt that anybody ever should need more than 640K.
      - Re: (Score:2)
        
        by Bigjeff5 ( 1143585 ) writes:
        
        Fun fact:
        That quote is an urban legend, and there has never been any evidence that it was actually uttered by Gates.
        You'd think confirmation would be easy, since it was supposedly said at a 1981 computer trade show.
        It's like the famous quote "Let them eat cake" which is attributed to Marie Antoinette, but which scholars have never been able to find any evidence to suggest she actually uttered it.
        The idea that 640k would be enough forever is idiotic, especially since the industry was so constricted by the 64
- Re: (Score:3, Interesting)
  
  by cgenman ( 325138 ) writes:
  
  While I don't disagree that NVIDIA needs to make this simpler, is that really a sizeable market for them? Presuming every college will want a cluster of 100 GPU's, they've still got about 10,000 students per college buying these things to game with.
  I wonder what the size of the server room market for something that can't handle IF statements really would be.
  - Re: (Score:2)
    
    by Lord Ender ( 156273 ) writes:
    
    Well, since you can crack a password a hundred (or more) times faster with CUDA than with a CPU, they could at least sell a million units to the NSA and the FBI... and the analogous departments of every other country...
  - Re: (Score:1)
    
    by Dekker3D ( 989692 ) writes:
    
    Plenty of data processing could be parallelized to GPU style code, I'll bet. As long as you've got enough data that needs enough processing, you can probably get a speedup from that. Just how much, is another question..
- Re: (Score:2)
  
  by bberens ( 965711 ) writes:
  
  import java.util.concurrent.*; //???
- Re: (Score:2)
  
  by tedgyz ( 515156 ) * writes:
  
  I was interested in CUDA until I learned that even the simplest of "hello world" apps is still quite complex and quite low-level.
  NVidia needs to make the APIs and tools for CUDA programming simpler and more accessible, with solid support for higher-level languages. Once that happens, we could see adoption skyrocket.
  The simple fact is, parallel programming is very hard. More to the point, most programs don't need this type of parallelism.
- Re: (Score:2)
  
  by russotto ( 537200 ) writes:
  
  I found just the opposite; not enough low-level access. For instance, no access to the carry bit from integer operations!
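
To show what missing carry-bit access costs, here is a Python sketch of the usual workaround when a language exposes only wrapping integer adds. Whether this is the workaround the poster had in mind is an assumption, and the names `add32`, `carry_out` and `add64_from_words` are invented for the example. The trick is that an unsigned sum has wrapped exactly when it is smaller than one of its operands.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b):
    """Wrapping 32-bit add: the result only, with no carry flag exposed."""
    return (a + b) & MASK32


def carry_out(a, b):
    """Recover the lost carry: the sum wrapped iff it is smaller than an operand."""
    return 1 if add32(a, b) < (a & MASK32) else 0


def add64_from_words(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values stored as (low, high) 32-bit words."""
    lo = add32(a_lo, b_lo)
    hi = add32(add32(a_hi, b_hi), carry_out(a_lo, b_lo))
    return lo, hi
```

`add64_from_words(0xFFFFFFFF, 0, 1, 0)` returns `(0, 1)`: the low word wraps to zero and the recovered carry lands in the high word. The extra compare per word is the overhead that direct access to the carry bit would avoid.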
- Re: (Score:2)
  
  by MostAwesomeDude ( 980382 ) writes:
  
  I don't mean to be rude, but graphics processors don't work that way. They are not general-purpose and I would not expect general-purpose toolkits to show up for them anytime soon.
  As a thought experiment, consider Linux. It requires 8MB of RAM and support for the C language on its targets. Larrabee ran BSD and the engineers were trying to get Linux on there when the project was scuttled. Larrabee could have been a chipset where you could use "higher-level languages" to do this stuff, but it would have been
- Re: (Score:2)
  
  by the_one(2) ( 1117139 ) writes:
  
  You should take a look at brook (though it seems to be dying in favor of OpenCL). It's really straightforward and almost simpler than programming for the CPU. Of course I did have quite a lot of trouble compiling the programs... but that's probably because I suck.
- - Re: (Score:2)
    
    by Lord Ender ( 156273 ) writes:
    
      The PyCUDA "hello world" involves inline C code!
Notice in TFA (Score:1)

by blai ( 1380673 ) writes:

"OpenCL is managed by a standards group, which is a great way to get nothing done"

I don't see the correlation.
- Re: (Score:2, Interesting)
  
  by binarylarry ( 1338699 ) writes:
  
  Not only that, but they posit that Microsoft's solution solves the issue of both Nvidia's proprietary-ness and the OpenCL board's "lack of action."
  Fuck this article, I wish I could unclick on it.
OpenCL (Score:3, Informative)

by gbrandt ( 113294 ) writes: on Thursday July 15, 2010 @03:50PM (#32918440)

Sounds like a perfect job for OpenCL. When a program is rewritten for OpenCL, you can just drop in CPU's or GPU's and they get used.

- Re: (Score:3, Informative)
  
  by Anonymous Coward writes:
  
  Unfortunately, no. OpenCL does not map equally to different compute devices, and does not enforce uniformity of parallelism approaches. Code written in OpenCL for CPUs is not going to be fast on GPUs. Hell, OpenCL code written for ATI GPUs is not going to work well on nVidia GPUs.
- Re: (Score:2)
  
  by quanticle ( 843097 ) writes:
  
  Well, true, but that overlooks the fact that porting a program to OpenCL is not exactly a trivial task.
Of course not! (Score:3, Informative)

by Yvan256 ( 722131 ) writes: on Thursday July 15, 2010 @03:52PM (#32918478) Homepage Journal

It's not something that can be accomplished with a few libraries and lines of code.
It doesn't take a few libraries and lines of code... It takes a SHITLOAD of libraries and lines of code! - Lone Starr

Not really news... (Score:2)

by Third Position ( 1725934 ) writes:

I remember reading that IBM was planning to put Cell in mainframes [hpcwire.com] and other high-end servers several years ago, supposedly to accrue the same benefits. I don't really know whether or not that was ever followed through with, I haven't kept track of the story.
- Re: (Score:2, Interesting)
  
  by Dynetrekk ( 1607735 ) writes:
  
  I'm no expert, but from what I understand, it wouldn't be at all surprising. IBM has been regularly using their Power processors for supercomputers, and the architecture is (largely) the same. The Cell has some extra graphics-friendly floating-point units, but it's not entirely different from the CPUs IBM has been pushing for computation in the past. I'm not even sure if the extra stuff in the Cell is interesting in the supercomputing arena.
  - Re: (Score:1, Interesting)
    
    by Anonymous Coward writes:
    
    The Cell is a PowerPC processor, which is intimately related with the Power architecture. Basically, PowerPC was an architecture designed by IBM, Apple, and Motorola, for use in high performance computing. It was based in part on an older (now) version of IBM's POWER architecture. In short, POWER was the "core" architecture, and additional instruction sets could be added at fabrication time -- kind of like Intel with their SSE extensions.
    This same pattern continued for a long time. IBM's POWER architect
  - Re: (Score:2)
    
    by inKubus ( 199753 ) writes:
    
    They have the zIIP [ibm.com] and zAAP processors on the z series mainframes, which are specialty procs. zIIP is for database and encryption work; zAAP is basically a Java VM in hardware. IBM is big, and they have specialty fabs to make silicon for specialty mainframes. Yeah, they are expensive, but worth it for some applications.
  - Re: (Score:2)
    
    by ihuntrocks ( 870257 ) writes:
    
    http://www.fixstars.com/en/products/gigaaccel180/features.html [fixstars.com] I wouldn't mind having a few of those. Also, the QS22 blades that I worked with were also very nice in my opinion. Cell is a fun architecture.
Libraries (Score:2, Insightful)

by Dynetrekk ( 1607735 ) writes:

I'm really interested in using GPGPU for my physics calculations. But you know - I don't want to learn Nvidia's low-level, proprietary (whateveritis) in order to do an addition or multiplication, which may or may not outperform the CPU version. What would be _really_ great is stuff like porting the standard "low-level numerics" libraries to the GPU: BLAS, LAPACK, FFTs, special functions, and whatnot - the building blocks for most numerical programs. LAPACK+BLAS you already get in multicore versions, and the
- Re:Libraries (Score:4, Informative)
  
  by brian_tanner ( 1022773 ) writes: on Thursday July 15, 2010 @04:13PM (#32918796)
  
  It's not free, unfortunately. I briefly looked into using it but got distracted by something shiny (maybe trying to finish my thesis...)
  
  CULA is a GPU-accelerated linear algebra library that utilizes the NVIDIA CUDA parallel computing architecture to dramatically improve the computation speed of sophisticated mathematics.
  http://www.culatools.com/ [culatools.com]
  
  - Re: (Score:2, Informative)
    
    by Anonymous Coward writes:
    
    It's not as complete as CULA, but for free there is also MAGMA [utk.edu]. Also, nVidia implements a CUDA-accelerated BLAS (CUBLAS) which is free.
    As far as OpenCL goes, I don't think there has been much in terms of a good BLAS made yet. The compilers are still sketchy (especially for ATI GPUs), and the performance is lacking on nVidia GPUs compared to CUDA.
- Re: (Score:2)
  
  by ihuntrocks ( 870257 ) writes:
  
  I know I posted this a little bit above, but this sounds like something you might be looking for: any card with the PowerXCell setup. http://www.fixstars.com/en/products/gigaaccel180/features.html [fixstars.com] If you check under the specs section, you'll see that BLAS, LAPACK, FFT, and several other numeric libraries are supported. Also, GCC can target Cell. All around, not a bad setup for physics modeling.
- Re: (Score:3, Informative)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
IIS 3D (Score:2, Interesting)

by curado ( 1677466 ) writes:

So.. webpages will soon be available in 3D with anti-aliasing and realistic shading?
- Re: (Score:2)
  
  by Enderandrew ( 866215 ) writes:
  
  Yes, actually. IE9 uses Direct2D and DirectWrite with your graphics card to render pages and fonts smoother and faster. Firefox has a similar project in the works.
Wouldn't a DSP do better? (Score:2, Interesting)

by 91degrees ( 207121 ) writes:

So why a GPU rather than a dedicated DSP? Seems they do pretty much the same thing, except a GPU is optimised for graphics. DSPs offer 32- or even 64-bit integers, have had 64-bit floats for a while now, allow more flexible memory write positions, and can use the previous results of adjacent values in calculations.
- Re: (Score:3, Informative)
  
  by pwnies ( 1034518 ) writes:
  
  Price. GPUs are being mass produced. Why create a separate market that only has the DSP in it (even if the technology is already present and utilized by GPUs) for the relatively small number of servers that will be using them?
- Modern GPUs, for all their hype, are just DSPs (Score:4, Interesting)
  
  by pslam ( 97660 ) writes: on Thursday July 15, 2010 @04:52PM (#32919304) Homepage Journal
  
  I could almost EOM that. They're massively parallel, deeply pipelined DSPs. This is why people have trouble with their programming model.
  The only difference here is the arrays we're dealing with are 2D and the number of threads is huge (100s-1000s). But each pipe is just a DSP.
  OpenCL and the like are basically revealing these chips for what they really are, and the more general purpose they try to make them, the more they resemble a conventional, if massively parallel, array of DSPs.
  There's a lot of comments on this subject along the lines of "Why couldn't they make it easier to program?" Well, it always boils down to fundamental complexities in design, and those boil down to the laws of physics. The only way you can get things running this parallel and this fast is to mess with the programming model. People need to learn to deal with it, because all programming is going to end up heading this way.
  
  - Re: (Score:3, Interesting)
    
    by pclminion ( 145572 ) writes:
    
    There's a lot of comments on this subject along the lines of "Why couldn't they make it easier to program?"
    Why should they? Just because not every programmer on the planet can do it doesn't mean there's nobody who can do it. There are plenty of people who can. Find one of these people and hire them. Problem solved.
    Most programmers can't even write single-threaded assembly code any more. If you need some assembly code written, you hire somebody who knows how to do it. I don't see how this is any different.
    - Re: (Score:2)
      
      by CodeBuster ( 516420 ) writes:
      
      Most programmers can't even write single-threaded assembly code any more.
      The reason that we don't is that modern optimizing compilers have made doing so almost a complete waste of time except in very highly specialized or niche applications. I would liken it to chess playing AIs: the greatest human grand masters can still defeat them with effort but the rest of us will get our butts handed to us by the AI every time. To quote one fictional AI, "the only winning move is not to play".
      As far as whether all programming will head this direction eventually, I don't think so.
      I don't think so either. If anything programming is becoming abstract and virtual to the point whe
      - Re: (Score:2)
        
        by marcosdumay ( 620877 ) writes:
        
        "The reason that we don't is that modern optimizing compilers have made doing so almost a complete waste of time except in very highly specialized or niche applications."
        
        Most programmers don't know assembly because high level compilers made things simpler, and that increased the number of programmers out there by orders of magnitude. Most of those extra programmers were drawn from a pool of people that (lacking education or just intrinsic capacity/motivation) couldn't learn to use machine code. Ok, it not
        
        - Re: (Score:2)
        
        by CodeBuster ( 516420 ) writes:
        
        Your argument is true for some applications, and completely false on others.
        Yes, but that does not make it a 50/50 proposition here in the real world. The vast majority of us who are paid to do software development work use languages and write programs where the hardware, especially with all of the virtualization these days, really doesn't matter. Most of us are engaged in writing business applications, not avionics, device drivers, or embedded controllers. The largest segment of the market where hardware performance optimization is still important is probably the games market, whi
Crysis 2... (Score:2, Funny)

by drc003 ( 738548 ) writes:

...coming soon to a server farm near you!
- Re: (Score:2, Interesting)
  
  by JorgeM ( 1518137 ) writes:
  
  I'd love this, actually. My geek fantasy is to be able to run my gaming rig in a VM on a server with a high end GPU which is located in the basement. On my desk in the living room would be a silent, tiny thin client. Additionally, I would have a laptop thin client that I could take out onto the patio.
  On a larger scale, think Steam but with the game running on a server in a datacenter somewhere which would eliminate the need for hardware on the user end.
  - Re: (Score:1)
    
    by drc003 ( 738548 ) writes:
    
    I like the way you think. In fact now I'm all excited at th.......ahhhhhhhhhooohhhhhhhhhhhh. Oops.
    - Re: (Score:2)
      
      by SleazyRidr ( 1563649 ) writes:
      
      +1 overinformative.
  - Re: (Score:2)
    
    by Dalambertian ( 963810 ) writes:
    
    Sacrificing all my mod points to say this, but a friend of mine did this with his PS3 so he could play remotely using a PSP. Also, check out OnLive for a pretty slick implementation of gaming in the cloud.
  - You must be salivating about OnLive, then (Score:2)
    
    by rsborg ( 111459 ) writes:
    
    From Wikipedia [wikimedia.org]:
    OnLive is a gaming-on-demand platform, announced in 2009 and launched in the United States in June 2010. The service is a gaming equivalent of cloud computing: the game is synchronized, rendered, and stored on a remote server and delivered via the Internet.
    Sounds very interesting to me, as I'm pretty sick of upgrade treadmills. OnLive would probably also wipe out hacked-client based cheating (though bots and such might still be doable). It would also allow bleeding-edge games to be enjoy
RemoteFX (Score:2, Interesting)

by JorgeM ( 1518137 ) writes:

No mention of Microsoft's RemoteFX coming in Windows Server 2008 R2 SP1? RemoteFX uses the server GPU for compression and to provide 3D capabilities to the desktop VMs.
Any company large enough for a datacenter is looking at VDI, and RemoteFX is going to be supported by all of the VDI providers except VMware. VDI, not the relatively niche case of massive calculations, will put GPUs in the datacenter.
How much number-crunching is your server doing? (Score:2)

by Animats ( 122034 ) writes:

If your data center is running stochastic tests, trying scenarios on derivative securities, it's a big win. If it's serving pages with PHP, zero win.
There are many useful ways to use a GPU. Machine learning. Computer vision. Finite element analysis. Audio processing. But those aren't things most people are doing. If your problem can be expressed well in MATLAB, a GPU can probably accelerate it. MATLAB connections to GPUs are becoming popular. They're badly needed; MATLAB is widely used in enginee
- Re: (Score:2)
  
  by ceoyoyo ( 59147 ) writes:
  
  Many people will be doing those things going ahead: all forms of machine learning. The obvious example is natural language processing for your web page.
  - Re: (Score:2)
    
    by smallfries ( 601545 ) writes:
    
    But it is not the same kind of maths. Most GPUs support very fast use of single-precision floats. The asymmetric crypto that you use to establish your SSL connection uses very large integers, and the AES that encrypts the stream operates in a finite field. Neither can be executed efficiently on a GPU.
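    To make "operates in a finite field" concrete, here is a minimal sketch of the byte multiplication AES uses in GF(2^8). The function name gf256_mul is made up for illustration, and real implementations typically use lookup tables or dedicated instructions rather than a loop like this. The point is only that the work is shifts and XORs on integers, not floating-point arithmetic.

```python
def gf256_mul(a, b):
    """Multiply two bytes in GF(2^8) using the AES reduction
    polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a          # "addition" in this field is XOR
        carry = a & 0x80         # will the next shift overflow 8 bits?
        a = (a << 1) & 0xFF      # multiply a by x
        if carry:
            a ^= 0x1B            # reduce modulo the AES polynomial
        b >>= 1
    return result


# Worked example from the AES specification (FIPS 197): {57} * {83} = {c1}
print(hex(gf256_mul(0x57, 0x83)))
```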
    - Re: (Score:2)
      
      by Jeremy Erwin ( 2054 ) writes:
      
      CUDA compatible GPU as an efficient hardware accelerator for AES Cryptography [manavski.com]. It's from 2007, so perhaps the bugs have been ironed out.
      - Re: (Score:3, Informative)
        
        by smallfries ( 601545 ) writes:
        
        No, it's the difference between "efficiency" and what is claimed as "efficient" to get a paper published. That's a really bad citation for AES on GPUs as there is a line of prior work going back to Cook and Cryptographics. In fact that paper is a classic example of getting something into the literature that has already been done. The authors have submitted it to an unrelated conference and failed to cite the relevant work.
        If we look at their best figures then throw away the 15x claimed speedup as it doesn't
Parallel Pr0n (Score:2)

by tedgyz ( 515156 ) * writes:

There's always an application for that.
Why call the GPU a gaming chip? (Score:1)

by wrightrocket ( 1664871 ) writes:

It is a Graphics Processing Unit, not a Gaming Processing Unit. Sure, they are great for gaming, but also very useful for other types of 3D and 2D rendering of graphics.
- Re: (Score:2)
  
  by Urkki ( 668283 ) writes:
  
  It is a Graphics Processing Unit, not a Gaming Processing Unit. Sure, they are great for gaming, but also very useful for other types of 3D and 2D rendering of graphics.
  But the top bang-for-the-buck chips are designed for games. They have an architecture (number of pipelines etc.) designed to maximize performance in typical game use, at a framerate needed for games. In other words, they're gaming chips, just like e.g. the PS3 is a game console, no matter if it can be used to build a cluster for number crunching.
Huh... (Score:1)

by geemon ( 513231 ) writes:

Saw the title of this article and wondered "how will Las Vegas casinos make the move to have all of my gaming chips put onto a server."
GPU apps are pretty specific... (Score:3, Insightful)

by bored ( 40072 ) writes: on Thursday July 15, 2010 @11:13PM (#32922734)

I've done a little CUDA programming, and I've yet to find significant speedups doing it. Every single time, some limitation in the arch keeps it from running well. My last little project ran about 30x faster on the GPU than the CPU; the only problem was that the overhead of getting it to the GPU + computation + overhead of getting it back was roughly equal to the time it took to just dedicate a CPU.
I was really excited about AES on the GPU too, until it turned out to be about 5% faster than my CPU.
Now if the GPU was designed more as a proper coprocessor (à la early x87, or early Weitek) and integrated into the memory hierarchy better (put the funky texture RAM and such off to the side) some of my problems might go away.
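The trade-off described above can be put into a back-of-the-envelope model. This is only a sketch: the function name offload_speedup and the timing numbers are hypothetical, chosen to mirror the "30x faster kernel, no net win" situation, and are not measurements from the project mentioned.

```python
def offload_speedup(cpu_seconds, kernel_speedup, transfer_seconds):
    """Effective speedup of offloading to a GPU once the time spent
    copying data to and from the device is counted."""
    gpu_seconds = cpu_seconds / kernel_speedup + transfer_seconds
    return cpu_seconds / gpu_seconds


# Hypothetical job: 1 s on the CPU, a 30x faster kernel on the GPU,
# but 0.95 s spent moving data over the bus in both directions.
print(offload_speedup(cpu_seconds=1.0, kernel_speedup=30.0, transfer_seconds=0.95))

# The same kernel with the data already resident on the device.
print(offload_speedup(cpu_seconds=1.0, kernel_speedup=30.0, transfer_seconds=0.0))
```

With transfers that large, the first call comes out only slightly above 1x, which is why keeping data on the device across many kernels matters more than the raw kernel speedup.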
