

Intel, NVIDIA Take Shots At CPU vs. GPU Performance
MojoKid writes "In the past, NVIDIA has made many claims of how porting various types of applications to run on GPUs instead of CPUs can tremendously improve performance — by anywhere from 10x to 500x. Intel has remained relatively quiet on the issue until recently. The two companies fired shots this week in a pre-Independence Day fireworks show. The recent announcement that Intel's Larrabee core has been re-purposed as an HPC/scientific computing solution may be partially responsible for Intel ramping up an offensive against NVIDIA's claims regarding GPU computing."
Re:first post! (Score:5, Informative)
Awesome. And now maybe you've learned a lesson. While the external processor was faster, sending your data over the bus to the external processor has an inherent delay in it. That's why your first post came in fourth.
Re:It depends? (Score:5, Informative)
Basically, GPUs are stream processors. They are fast at tasks that meet the following criteria:
1) Your problem has to be more or less infinitely parallel. A modern GPU will have anywhere in the range of 128-512 parallel execution units, and of course you can have multiple GPUs. So it needs to be something that can be broken down into a lot of pieces.
2) Your problem needs to be floating point. GPUs push 32-bit floating point numbers really fast. The most recent ones can also do 64-bit FP numbers at half the speed. Anything older is pretty much 32-bit only. For the most part, count on single precision FP for good performance.
3) Your problem must fit within the RAM of the GPU. This varies: 512MB-1GB is common for consumer GPUs, and 4GB is fairly easy to get on cards like Teslas that are built for GPGPU. GPUs have extremely fast RAM connected to them, much faster than even system RAM. 100GB/sec+ is not uncommon. While a 16x PCIe bus is fast, it isn't that fast. So to get good performance, the problem needs to fit on the GPU. You can move data to and from main memory (or disk) occasionally, but most of the crunching must happen on the card.
4) Your problem can't branch much, and when it does branch, neighboring work items need to branch the same way. GPUs handle branching, but not all that well; the performance penalty is pretty high. Generally speaking, a whole group of shaders has to branch the same way, so you need the sort of thing where when the "else" is hit, it is hit for the entire group.
So, the more similar your problem is to that, the better GPUs work on it. 3D graphics would be an excellent example of something that meets that precisely, which is no surprise as that's what they are made for. The more you deviate from that, the less suited GPUs are. You can easily find tasks they are exceedingly slow at compared to CPUs.
Basically, modern CPUs tend to be quite good at everything. They have strong performance across the board, so no matter what the task, they can do it well. The downside is that they are unspecialized; they excel at nothing. The other end of the spectrum is an ASIC, a circuit designed for one and only one thing. That kind of chip can be extremely efficient: a gigabit switch ASIC is a great example, a tiny chip that draws a couple of watts and yet can switch 50+ Gbit/sec of traffic. However, that ASIC can only do its one task, with no programmability. GPUs are something of a hybrid. They are fully programmable, but they are specialized for a given field. At the tasks they are good at, they are extremely fast; at the tasks they are not, they are extremely slow.
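To make the branching point (criterion 4) concrete, here's a toy Python model of lockstep execution. The warp size of 32 and the "one serialized pass per distinct branch path" rule are simplifying assumptions, not real GPU code:

```python
# Toy model of SIMT branch divergence: a "warp" of lanes executes in
# lockstep, so it must serially run every branch path that *any* of
# its lanes takes. (Warp size and cost model are assumptions here.)

def warp_passes(lane_conditions):
    """Number of serialized passes a lockstep warp needs:
    one if all lanes agree, two if they diverge into if/else."""
    return len(set(lane_conditions))

uniform = [True] * 32                         # every lane branches the same
divergent = [i % 2 == 0 for i in range(32)]   # lanes split 50/50

print(warp_passes(uniform))    # 1 pass: full throughput
print(warp_passes(divergent))  # 2 passes: roughly half throughput
```

This is why "when the 'else' is hit, it is hit for the entire group" matters: a uniform branch costs one pass, a divergent one costs a pass per path taken.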
Re:Optimizations Matter (Score:4, Informative)
It's certainly true that most programmers reach for the latter style, but mainly because they aren't planning on using any SIMD.
Re:AMD (Score:5, Informative)
AMD just isn't doing well in the high-end consumer-grade space, but then again the chips that Intel is ruling with in that segment are priced well above consumer budgets.
Re:It depends? (Score:5, Informative)
"So to get good performance, the problem needs to fit on the GPU. You can move data to and from the main memory (or disk) occasionally, but most of the crunching must happen on card."
From what I have seen when people use GPUs for HPC, this, more often than anything else, is the limiting factor. The actual calculations are plenty fast, but the need to format your data for the GPU, send it, then do the same in reverse for the result really limits the practical gain you get.
I'm not saying it's useless or anything - far from it - but this issue is as important as the actual processing you want to do for determining what kind of gain you'll see from such an approach.
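A quick back-of-envelope sketch of that tradeoff. The bus and GPU throughput figures below are assumptions for illustration (roughly PCIe 2.0 x16 and a ~1 TFLOP/s single-precision card), not measurements:

```python
# When does moving the data cost more than crunching it?
# Assumed figures, for illustration only:
PCIE_BYTES_PER_SEC = 8e9    # ~8 GB/s effective over a 16x PCIe bus
GPU_FLOPS_PER_SEC = 1e12    # ~1 TFLOP/s single precision

def transfer_bound(bytes_moved, flops_done):
    """True if the host<->card transfer takes longer than the math."""
    t_transfer = bytes_moved / PCIE_BYTES_PER_SEC
    t_compute = flops_done / GPU_FLOPS_PER_SEC
    return t_transfer > t_compute

# Elementwise add of two big vectors: ~12 bytes moved per FLOP.
print(transfer_bound(bytes_moved=12e9, flops_done=1e9))              # True
# Dense 4096x4096 matmul: O(n^3) FLOPs on O(n^2) data.
print(transfer_bound(bytes_moved=3 * 4 * 4096**2,
                     flops_done=2 * 4096**3))                        # False
```

The shape of the answer is the point: work that does O(n) math on O(n) data tends to be transfer-bound no matter how fast the card is, while work like dense matrix multiply amortizes the bus cost and wins big.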
Re:Who cares anymore? (Score:4, Informative)
An $80 video card enables high/ultra settings at 60+ FPS on nearly all games for the "low resolution" group, but not the "high resolution" group.
Re:It depends? (Score:4, Informative)
GPUs have extremely fast RAM connected to them, much faster than even system RAM
I'd like to see a citation for that little bit of trivia
Ok, so my Geforce GTX480 has GDDR5 ( http://www.nvidia.com/object/product_geforce_gtx_480_us.html [nvidia.com] ) which is based on DDR3 ( http://en.wikipedia.org/wiki/GDDR5 [wikipedia.org] )
My memory bandwidth on the GTX480 is 177 GB/sec. The fastest DDR3 module is PC3-17000 ( http://en.wikipedia.org/wiki/DDR3_SDRAM [wikipedia.org] ), which gives approx 17000 MB/s, i.e. about 17 GB/sec. So my graphics RAM is basically 10x faster than system RAM, as it should be.
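Sanity-checking the ratio with the figures from the post:

```python
# Ratio of GPU memory bandwidth to a single DDR3 module's bandwidth,
# using the numbers quoted above (GTX 480 spec sheet vs. PC3-17000).
gpu_bw_gb_s = 177   # GTX 480 published memory bandwidth
cpu_bw_gb_s = 17    # PC3-17000 DDR3, ~17 GB/s

print(round(gpu_bw_gb_s / cpu_bw_gb_s, 1))  # 10.4
```

So "basically 10x" holds up, though note a dual-channel desktop board would double the DDR3 side and cut the gap to roughly 5x.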
That's the big draw of the Teslas (Score:3, Informative)
I mean, when you get down to it, they seem really overpriced. No video output, their processor isn't any faster, so what's the big deal? The big deal is that 4x the RAM can really speed shit up.
Unfortunately there are very hard limits to how much RAM they can put on a card. This is both because of the memory controllers, and because of electrical considerations. So you aren't going to see a 128GB GPU or the like any time soon.
Most of our researchers who do that kind of thing use only Teslas because of the need for more RAM. As you said, the transfer is the limiting factor; more RAM means you have to shuffle data back and forth less often.