Intel Hardware

Intel, NVIDIA Take Shots At CPU vs. GPU Performance 129

MojoKid writes "In the past, NVIDIA has made many claims of how porting various types of applications to run on GPUs instead of CPUs can tremendously improve performance — by anywhere from 10x to 500x. Intel has remained relatively quiet on the issue until recently. The two companies fired shots this week in a pre-Independence Day fireworks show. The recent announcement that Intel's Larrabee core has been re-purposed as an HPC/scientific computing solution may be partially responsible for Intel ramping up an offensive against NVIDIA's claims regarding GPU computing."
  • first post! (Score:4, Funny)

    by Dynetrekk ( 1607735 ) on Sunday June 27, 2010 @08:14AM (#32708254)
    I am now posting using my GPU. It's at least 50x faster!
  • It depends? (Score:5, Insightful)

    by aliquis ( 678370 ) on Sunday June 27, 2010 @08:23AM (#32708284)

    Isn't it like saying "Ferrari makes the fastest tractors!" (yeah, I know!), which may be true, as long as they can actually carry out the things you want to do.

    I don't know about the limits of OpenCL/GPU-code (or architecture compared to regular CPUs/AMD64 functions, registers, cache, pipelines, what not), but I'm sure there's plenty and that someone will tell us.

    • Re:It depends? (Score:5, Informative)

      by jawtheshark ( 198669 ) * <slashdot@nosPAm.jawtheshark.com> on Sunday June 27, 2010 @08:44AM (#32708358) Homepage Journal
      Try Lamborghini next time... You do know that Mr Lamborghini originally made his money making tractors. The legend [wikipedia.org] says he wasn't satisfied with what Ferrari offered as sports cars and thus made one himself. Originally, Lamborghini was a tractor brand... Not kidding. I think they still make them [lamborghini-tractors.com]...
      • by aliquis ( 678370 )

        Yeah, I wondered which one it was, but I was somewhat too lazy, I guess. Maybe the story was that the Lamborghini guy decided he could do it himself...

        I only googled "ferrari tractor" to see if they had any (or whether it was Lamborghini), got a few tractor images, so I went with that.

        So Lamborghini went into supercars and Ferrari into tractors ("if they can beat us at cars, we'll sure show them with tractors!"? :D)

        Sorry for messing up :)
        http://www.ferrari-tractors.com/ [ferrari-tractors.com]

      • There are also ferrari tractors [ferraritractors.co.nz], unrelated to the sports car manufacturer though.

      • Shhhhh.... You're giving rednecks around the world hope that someday John Deere will make a sports car...

    • Re:It depends? (Score:5, Informative)

      by Sycraft-fu ( 314770 ) on Sunday June 27, 2010 @08:59AM (#32708422)

      Basically, GPUs are stream processors. They are fast at tasks that meet the following criteria:

      1) Your problem has to be more or less infinitely parallel. A modern GPU will have anywhere in the range of 128-512 parallel execution units, and of course you can have multiple GPUs. So it needs to be something that can be broken down into a lot of pieces.

      2) Your problem needs to be floating point. GPUs push 32-bit floating point numbers really fast. The most recent ones can also do 64-bit FP numbers at half the speed. Anything older is pretty much 32-bit only. For the most part, count on single precision FP for good performance.

      3) Your problem must fit within the RAM of the GPU. This varies: 512MB-1GB is common for consumer GPUs, and 4GB is fairly easy to get for things like Teslas that are built for GPGPU. GPUs have extremely fast RAM connected to them, much faster than even system RAM. 100GB/sec+ is not uncommon. While a 16x PCIe bus is fast, it isn't that fast. So to get good performance, the problem needs to fit on the GPU. You can move data to and from main memory (or disk) occasionally, but most of the crunching must happen on card.

      4) Your problem must not involve a whole lot of branching, and when it does branch, the parallel paths need to branch the same way. GPUs handle branching, but not all that well; the performance penalty is pretty high. Also, generally speaking, a whole group of shaders has to branch the same way. So you need the sort of thing where, when the "else" is hit, it is hit for the entire group.

      So, the more similar your problem is to that, the better GPUs work on it. 3D graphics would be an excellent example of something that meets that precisely, which is no surprise as that's what they are made for. The more you deviate from that, the less suited GPUs are. You can easily find tasks they are exceedingly slow at compared to CPUs.

      Basically, modern CPUs tend to be quite good at everything. They have strong performance across the board, so no matter what the task, they can do it well. The downside is they are unspecialized; they excel at nothing. The other end of the spectrum is an ASIC, a circuit designed for one and only one thing. That kind of thing can be extremely efficient. Something like a gigabit switch ASIC is a great example: you can have a tiny chip that draws a couple of watts and yet can switch 50+gbit/sec of traffic. However, that ASIC can only do its one task, with no programmability. GPUs are something of a hybrid. They are fully programmable, but they are specialized to a given field. As such, at the tasks they are good at, they are extremely fast. At the tasks they are not, they are extremely slow.
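
      To make the above concrete, here is a minimal CUDA sketch (names and sizes are illustrative, not from the post) of the kind of kernel that hits all four points: one independent single-precision operation per element, data resident in GPU memory, and only a uniform bounds-check branch.

      #include <cuda_runtime.h>
      #include <cstdio>

      // Classic SAXPY: y[i] = a * x[i] + y[i]. Every element is independent,
      // all arithmetic is 32-bit float, and there is no data-dependent branching.
      __global__ void saxpy(int n, float a, const float *x, float *y)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n)                  // only a bounds check, taken uniformly
              y[i] = a * x[i] + y[i];
      }

      int main()
      {
          const int n = 1 << 20;                 // 1M floats: fits easily in GPU RAM
          size_t bytes = n * sizeof(float);

          float *x, *y;
          cudaMalloc(&x, bytes);                 // data lives on the card...
          cudaMalloc(&y, bytes);
          cudaMemset(x, 0, bytes);
          cudaMemset(y, 0, bytes);

          saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);   // ...and is crunched there
          cudaDeviceSynchronize();

          cudaFree(x);
          cudaFree(y);
          printf("done\n");
          return 0;
      }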

      • I wonder if competition from GPUs will influence Intel to beef up the vector processing capabilities of its chips. Currently Intel's SSE is pretty weak, especially when you compare it to competitors like AltiVec. Unfortunately, outside of Cell there aren't a whole lot of CPUs nowadays that feature AltiVec....
        • Re:It depends? (Score:5, Insightful)

          by rahvin112 ( 446269 ) on Sunday June 27, 2010 @09:43AM (#32708598)

          It is not a secret (it's stated on both Intel's and AMD's roadmaps) that they intend to integrate GPU-like programmable FP into the FP units of the general-purpose processor. The likely result will be the same general-purpose CPU you love, but with dozens of additional FP units that excel at the kind of math the parent described, except more flexible. When the Fusion-esque products ramp and GPGPU functionality is integrated into the CPU, Nvidia is out of business. Oh, I don't expect these fusion products to have great GPUs, but once you destroy the low-end and mid-range graphics marketplace there is very little money left to fund R&D (3dfx was the first one into the high-end 3D market and they barely broke even on their first sales; the only reason they survived was that they were heavy in arcade sector sales). If Nvidia hasn't been allowed to purchase Via's x86 license by that point, they are quite frankly out of business. Not immediately, of course; they will spend a few years evaporating all assets while they try to compete in only the high-end marketplace, but in the end they won't survive. Things go in cycles, and the independent graphics chip cycle is going to end very shortly. Maybe in a decade it will come back, but I'm skeptical. CPUs have exceeded the speed needed for 80% of the tasks out there.

          When I first started my career, computer runs of my design work took about 5-30 minutes at bare minimum quality. These days I can exceed that bare minimum by 20 times and the run takes seconds. It's to the point where I can model with far more precision than the end product needs with almost no time penalty. In fact, additional CPU speed at this point is almost meaningless, and my business isn't alone in this. Most of the software in my business is single threaded (and the apps run that fast with single threads). Once the software is multi-threaded there is really no additional CPU power needed, and it may come to the point where my business just stops upgrading hardware beyond what's needed to replace failures. I just don't see a future for independent graphics chip/card producers.

          • They are called, specifically, FPUs, not FPs.

            As for the cpu guys putting the gpu guys out of business... we know how successful Intel has been trying to do just that with their GPU offerings... you expect that to change in the next, say, 10 years? Not likely given their past track record of failure.

          • Comment removed based on user account deletion
            • If your theory was true, why hasn't it already happened? Both AMD and Nvidia have been putting pretty nice GPUs on motherboards for quite awhile, yet we still have discrete cards, why?

              You have misunderstood the theory. The theory is that level of functionality will be merged into the CPU. Not into the motherboard.

              There is a good reason why, for the most basic office tasks and even two- or three-year-old gaming, the onboard chips work fine. I myself played BioShock and SWAT 4 on my onboard with no trouble.

              We still have discrete graphics cards because onboard GPUs are sufficient for most tasks? Do you have any idea what you're saying here?

              But for anything where you care even a little bit about REAL performance, the onboards, and I don't care if we are talking onboard or on-die, simply won't have a chance. You just can't put hundreds of MB or even GBs of RAM onto the die.

              They're not on the die in a motherboard-integrated solution, either. They use system memory. Further, CPUs can already access system memory. You clearly have no idea what you are talking about.

              So while you think discrete GPUs are gonna die, I'd say the opposite is true. I think the onboards will be used in machines where price trumps everything, such as bottom-of-the-line netbooks and Walmart/Best Buy "specials", whereas for everything else, since HD and games like Warcrack will continue to be popular and thus selling points, discretes will bring in the "wow" factor and help OEMs differentiate their products.

              You don't get it. ALL GPUs are going to go away, beca

              • Comment removed based on user account deletion
                • I understand perfectly; it is you who probably needs a WHOOSH here.

                  Well, give it a shot.

                  I know all about having everything onboard, as I'm old enough to remember when there was NOTHING but onboard.

                  Well, I had an Altos CP/M machine that had everything onboard, once. But my next one, a Kaypro 4, carried its modem on a daughterboard. So really, we're talking about times so old as to not be worth mentioning.

                  The problem with your theory is that unless you make the CPU as easy to toss as the old Slot 1s, it simply isn't gonna work for anything but the most basic tasks.

                  Unless the CPU is easily removable and discardable, it simply isn't going to be able to compute on the proper level? We're talking about power, not packages. Speak English, and make it relevant.

                  And system memory will ALWAYS suck...full stop.

                  That's funny, my crystal ball remains cloudy on the subject. Even my magic 8-Ball is no

                  • Comment removed based on user account deletion
                    • Finally since you want me to quote, I will. You said, and I quote "You don't get it. ALL GPUs are going to go away, because CPUs are getting better at GPU tasks faster than GPUs are getting better at CPU tasks. " Where is your proof? Pinetrail? A FIVE YEAR OLD GPU jammed into the CPU?

                      You're so stupid, I can barely stand it. You want a GPU put into a CPU package to be proof that GPUs are going away? I said they're going away, not going into the package with the CPU. This is the general theme of your "conversation": attacking straw men. Welcome to my foes list, idiot. I can't waste more time on someone who doesn't understand when they are spewing logical fallacies (or who does it on purpose; I have not totally ruled out the possibility that you are a troll. But I suspect you simply have v

                    • Comment removed based on user account deletion
              • CPUs have a long way to go to reach the level of memory bandwidth and latency available to a top-grade graphics card - even triple-channel DDR3 kits offer only some 50GB/s, while an AMD 4870 has some 115 GB/s. Latency on the GPU is similar to top-end desktop memory (server memory tends to have lower latency and reduced speed, though, even if servers might use more memory channels than the three used in top-end desktops).

          • But I think the timescale will be a very long one.

            I mean, ideally we want only the CPU in a computer. The whole idea of a computer is that it does everything, rather than having dedicated devices. Ideally that means it does everything purely in software, that the CPU is all it needs. For everything else we seem to have reached that point, but graphics are still too intense; you have to have a dedicated DSP for them.

            However, we'll keep wanting that until the CPU can do photorealistic graphics in realtime. T

          • If Nvidia hasn't been allowed to purchase Via's x86 license by that point they are quite frankly out of business.

            Apparently, licensing terms for access to the x86 designs forbid it passing to non-US hands. I think VIA is working a loophole here, because Centaur is doing the CPU designs, VIA does the branding, chipsets, and sometimes the motherboards.

            However, it might work if nVidia acquired VIA and Centaur, and merged VIA's chipset departments into nVidia's (and segmented them or something), because otherwise they're only looking for Centaur and it ain't going to be pretty. I'm just disappointed in nVidia myself for let

        • by Bengie ( 1121981 )

          The new AVX SIMD is coming out soon. The first set of 256-bit registers is supposed to be 2x as fast as SSE, and the later 512-bit and 1024-bit AVX are supposed to be another ~2-4x faster than the 256-bit version. I guess one of the benefits of AVX is that the new register sizes are supposed to give transparent speed increases, so a program made for 256-bit AVX will automatically see faster calculations when the new 512-bit AVX registers come out. Sounds good to me. They're supposed to be 3-operand instructions.

          • I guess one of the benefits of AVX is that the new register sizes are supposed to give transparent speed increases, so a program made for 256-bit AVX will automatically see faster calculations when the new 512-bit AVX registers come out.

            Afraid not (well, there are ways if you are willing to litter your code with C++ templates). Yes, the instructions will process 8 floats; however, you're only going to see a nice linear speed-up if you are already using SoA data structures. For a lot of the 'traditional' SSE c
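
            To make the 256-bit registers and the three-operand form mentioned above concrete, here is a minimal host-side sketch (illustrative values; assumes an AVX-capable CPU and an AVX compile flag such as -mavx, and, as noted above, the full 8-wide benefit only shows up when the data is already laid out contiguously):

            #include <immintrin.h>
            #include <cstdio>

            int main()
            {
                // One AVX instruction processes eight packed floats (SSE does four).
                alignas(32) float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
                alignas(32) float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
                alignas(32) float c[8];

                __m256 va = _mm256_load_ps(a);     // load 8 floats into a 256-bit register
                __m256 vb = _mm256_load_ps(b);
                __m256 vc = _mm256_add_ps(va, vb); // three-operand: va and vb are not overwritten
                _mm256_store_ps(c, vc);

                for (int i = 0; i < 8; ++i)
                    printf("%.0f ", c[i]);         // prints 9 eight times
                printf("\n");
                return 0;
            }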
      • by g4b ( 956118 )
        maybe GPUs can solve life then. Real Life Problems meet most of the criteria: infinite amounts all at the same time, many numbers floating, small description size, tends to branch into an endless tree of solutions never to be achieved...
      • Re:It depends? (Score:5, Informative)

        by JanneM ( 7445 ) on Sunday June 27, 2010 @09:39AM (#32708570) Homepage

        "So to get good performance, the problem needs to fit on the GPU. You can move data to and from the main memory (or disk) occasionally, but most of the crunching must happen on card."

        From what I have seen when people use GPUs for HPC, this, more often than anything else, is the limiting factor. The actual calculations are plenty fast, but the need to format your data for the GPU, send it, then do the same in reverse for the result really limits the practical gain you get.

        I'm not saying it's useless or anything - far from it - but this issue is as important as the actual processing you want to do for determining what kind of gain you'll see from such an approach.
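
        A rough way to see this cost is simply to time the PCIe round trip around a trivial kernel; a sketch (illustrative sizes, assumes a CUDA-capable card):

        #include <cuda_runtime.h>
        #include <cstdio>
        #include <cstdlib>

        // Trivial kernel, so almost all of the elapsed time is the PCIe round trip.
        __global__ void scale(float *d, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) d[i] *= 2.0f;
        }

        int main()
        {
            const int n = 1 << 24;                       // 64 MB of floats
            size_t bytes = n * sizeof(float);
            float *h = (float*)calloc(n, sizeof(float));
            float *d;
            cudaMalloc(&d, bytes);

            cudaEvent_t t0, t1;
            cudaEventCreate(&t0);
            cudaEventCreate(&t1);

            cudaEventRecord(t0);
            cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);  // host -> card over PCIe
            scale<<<(n + 255) / 256, 256>>>(d, n);            // the actual "work"
            cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);  // card -> host again
            cudaEventRecord(t1);
            cudaEventSynchronize(t1);

            float ms = 0.0f;
            cudaEventElapsedTime(&ms, t0, t1);
            printf("round trip + kernel: %.1f ms\n", ms);     // dominated by the copies

            cudaFree(d);
            free(h);
            return 0;
        }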

        • I mean, when you get down to it, they seem really overpriced. No video output, their processor isn't anything faster, so what's the big deal? The big deal is that 4x the RAM can really speed shit up.

          Unfortunately there are very hard limits to how much RAM they can put on a card. This is both because of the memory controllers, and because of electrical considerations. So you aren't going to see a 128GB GPU or the like any time soon.

          Most of our researchers that do that kind of thing use only Teslas because of the need

          • by JanneM ( 7445 )

            The problem is when you have a larger system, with hundreds of cores, and an iterative simulation. You run the system for a cycle, propagate data, then run for another cycle and so on. In that case you can't isolate a long-running process on the card, and you end up having to squeeze data through that bus for each cycle anyway. It is likely still well worth using GPUs, but you do need to take a good look at whether adding GPUs is more or less effective than using your funds to simply add more cores instead.

      • Re: (Score:1, Interesting)

        by Anonymous Coward

        That is an excellent post, with the exception of this little bit

        GPUs have extremely fast RAM connected to them, much faster than even system RAM

        I'd like to see a citation for that little bit of trivia... the specific type & speed of RAM on a board with a GPU varies by model and manufacturer. Cheaper boards use slower RAM, the more expensive ones use higher-end stuff. I haven't seen ANY GPUs that came with on-board RAM that is any different from what you can mount as normal system RAM, however.

        Not trolling, I just wanted to point out a serious flaw in what is an otherwise great po

        • Re:It depends? (Score:4, Informative)

          by pnewhook ( 788591 ) on Sunday June 27, 2010 @10:09AM (#32708700)

          GPUs have extremely fast RAM connected to them, much faster than even system RAM

          I'd like to see a citation for that little bit of trivia

          Ok, so my Geforce GTX480 has GDDR5 ( http://www.nvidia.com/object/product_geforce_gtx_480_us.html [nvidia.com] ) which is based on DDR3 ( http://en.wikipedia.org/wiki/GDDR5 [wikipedia.org] )

          My memory bandwidth on the GTX480 is 177 GB/sec. The fastest DDR3 module is PC3-17000 ( http://en.wikipedia.org/wiki/DDR3_SDRAM [wikipedia.org] ), which gives approx 17000 MB/s, or approx 17 GB/sec. So my graphics RAM is basically 10x faster than system RAM, as it should be.
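
          As a back-of-the-envelope check of those figures (usual assumption: peak bandwidth = bus width in bytes times effective transfer rate; the GTX 480 has a 384-bit bus at roughly 3.7 GT/s effective, and one DDR3-2133 channel is 64 bits at 2.133 GT/s):

          #include <cstdio>

          int main()
          {
              // GTX 480: 384-bit bus, GDDR5 at ~3.7 GT/s effective (quad-pumped ~924 MHz).
              double gtx480 = (384.0 / 8.0) * 3.696e9;   // ~177 GB/s
              // One PC3-17000 (DDR3-2133) channel: 64-bit bus at 2.133 GT/s.
              double ddr3   = (64.0 / 8.0) * 2.133e9;    // ~17 GB/s per channel
              printf("GTX 480: ~%.0f GB/s, DDR3-2133 (one channel): ~%.0f GB/s\n",
                     gtx480 / 1e9, ddr3 / 1e9);
              return 0;
          }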

          • by Kjella ( 173770 )

            My memory bandwidth on the GTX480 is 177 GB/sec. The fastest DDR3 module is PC3-17000 ( http://en.wikipedia.org/wiki/DDR3_SDRAM [wikipedia.org] ) which gives approx 17000 MB/s which is approx 17GB/sec.

            And the high-end CPUs have, as far as I know, triple-channel memory now, so a total of 51 GB/s. Not sure how valid that comparison is, but graphics cards tend to get their fill rate from having a much wider memory bus - the GTX480 has a 384-bit-wide bus - rather than that much faster memory, so it's probably not too far off. If CPUs move towards doing GPU-like work, which can be loaded in wider chunks, they'll probably move towards a wider bus too.

            • Width is part of it, but it's also clock rate. The fastest overclocked DDR3 will go to 2.5GHz. The stock GeForce 480 is 3.7GHz. At those rates the bus length gets to be an issue. The memory on a graphics card can be kept very close to the chip. On a PC the memory, for practical reasons, has to be set farther away, resulting in necessarily slower clocks and data rates.

              The 51 GB/sec you mention is definitely overclocked. I've not seen stock memory that fast. Even so, it's still less than a third the rate of

        • Re: (Score:3, Interesting)

          by Spatial ( 1235392 )

          I haven't seen ANY GPUs that came with on-board RAM that is any different from what you can mount as normal system RAM, however.

          You haven't been looking very hard. Most GPUs have GDDR3 or GDDR5 running at very high frequencies.

          My system for example:
          Main memory: DDR2 400MHz, 64-bit bus. 6,400 MB/sec max.
          GPU memory: GDDR3 1050MHz, 448-bit bus. 117,600 MB/sec max.

          Maybe double the DDR2 figure since it's in dual-channel mode. I'm not sure, but it hardly makes much of a difference in contrast. :)

          That isn't even exceptional by the way. I have a fairly mainstream GPU, the GTX 260 c216. High-end cards like the HD5870 and GT

      • Re: (Score:3, Interesting)

        That's a very good breakdown of what you need to benefit from GPU based computing but, really, only #1 has any relevance vs. an x86 chip.

        #2) Yes, an x86 chip will have a high clock speed but, unless you can use SSE instructions, x86 is crazy slow. Also, most (if not all) architectures will give you half the flops for using the double precision vector instructions vs. the single precision ones.

        #3) This is a problem with CPUs as well except, as you point out, the memory is much slower. Performance is often

        • by Tacvek ( 948259 )

          The GPUs are definitely worse than CPUs at branching.

          If your code splits into 8 different code paths at one point due to branching, your performance can be as bad as 1/8 the maximum, since rather than doing anything remotely like actual branching, some GPUs just interleave the code of the different branches, with each instruction tagged as to which branch it belongs to. So if the unit is processing an instruction for a branch it is not on, it just sits there doing nothing for one instruction cycl
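
          An illustrative sketch of the kind of kernel that triggers this (hypothetical kernel, not from any SDK): threads in the same warp take opposite branches based on parity, so the hardware runs both sides one after the other with half the lanes masked off each time.

          #include <cuda_runtime.h>

          __global__ void divergent(float *data, int n)
          {
              int i = blockIdx.x * blockDim.x + threadIdx.x;
              if (i >= n) return;

              if (i % 2 == 0)
                  data[i] = data[i] * data[i];   // even lanes run first...
              else
                  data[i] = data[i] + 1.0f;      // ...then odd lanes, in a second pass
          }

          int main()
          {
              const int n = 1024;
              float *d;
              cudaMalloc(&d, n * sizeof(float));
              cudaMemset(d, 0, n * sizeof(float));
              divergent<<<(n + 255) / 256, 256>>>(d, n);
              cudaDeviceSynchronize();
              cudaFree(d);
              return 0;
          }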

      • by Elbows ( 208758 )

        The other big factor (the biggest in most of the GPU code I've written) is your pattern of memory access. Most GPUs have no cache, so access to memory has very high latency even though the bandwidth is excellent. The card will hide this latency to some extent through clever scheduling, and if all your threads are accessing adjacent memory, it will coalesce that into one big read/write. But GPUs do best on problems where the ratio of arithmetic to memory access is high, and your data can hang around in regist
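
        A sketch of what coalescing means in practice (illustrative kernels; the strided copy wastes most of the available bandwidth even though it moves the same amount of data):

        #include <cuda_runtime.h>

        // Coalesced: adjacent threads read adjacent floats, so a warp's 32 loads
        // collapse into a few wide memory transactions.
        __global__ void copy_coalesced(const float *in, float *out, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) out[i] = in[i];
        }

        // Strided: adjacent threads read floats 'stride' apart, so the same warp
        // touches many separate memory segments and wastes most of the bandwidth.
        __global__ void copy_strided(const float *in, float *out, int n, int stride)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) out[i] = in[(i * stride) % n];
        }

        int main()
        {
            const int n = 1 << 22;
            float *in, *out;
            cudaMalloc(&in, n * sizeof(float));
            cudaMalloc(&out, n * sizeof(float));
            copy_coalesced<<<(n + 255) / 256, 256>>>(in, out, n);
            copy_strided<<<(n + 255) / 256, 256>>>(in, out, n, 32);
            cudaDeviceSynchronize();
            cudaFree(in);
            cudaFree(out);
            return 0;
        }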

      • 2) Your problem needs to be floating point. GPUs push 32-bit floating point numbers really fast. The most recent ones can also do 64-bit FP numbers at half the speed. Anything older is pretty much 32-bit only. For the most part, count on single precision FP for good performance.

        That requirement is not necessarily true. Or at least not in the traditional sense of 'floating point.' GPUs make awesome pattern-matchers for data that isn't necessarily floating point.

        Elcomsoft (of Adobe DRM international arrest fame) has a GPU-accelerated password cracker [elcomsoft.com] that is essentially a massively parallel dictionary attack,

        A number of anti-virus vendors have GPU accelerated scanners - like Kaspersky. [theinquirer.net]

        And some people have been working with GPUs for network monitoring via packet analysis [ieee.org] too.

      • Some of the examples used in the CUDA SDK are phoney. The Sobel one can be made to run faster on the CPU - provided you use the Intel compilers and performance primitives and can parallelise.

        It doesn't surprise me. There is an example of Sobel for FPGAs that touts much faster execution times, but when you examine the code, the FPGA version has algorithmic optimisations that were 'left out' of the CPU version. Again, it can be made to run faster on the CPU.

        I'm not saying that GPUs are crap. For

      • Blah, why can't I get a good GPU accelerated Mandelbrot set viewer, then? z = z^2 + c meets all your criteria great, dun it? :P
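
        For what it's worth, the escape-time loop does map almost directly onto a kernel; a rough sketch (one thread per pixel; iteration counts differ per pixel, so it isn't perfectly branch-free, but neighbouring pixels usually diverge little):

        #include <cuda_runtime.h>

        // One thread per pixel: iterate z = z^2 + c and record how fast it escapes.
        __global__ void mandelbrot(int *iters, int w, int h, int max_iter)
        {
            int px = blockIdx.x * blockDim.x + threadIdx.x;
            int py = blockIdx.y * blockDim.y + threadIdx.y;
            if (px >= w || py >= h) return;

            float cr = -2.5f + 3.5f * px / w;    // map pixel to the complex plane
            float ci = -1.25f + 2.5f * py / h;
            float zr = 0.0f, zi = 0.0f;

            int i = 0;
            while (i < max_iter && zr * zr + zi * zi < 4.0f) {
                float t = zr * zr - zi * zi + cr;   // z = z^2 + c
                zi = 2.0f * zr * zi + ci;
                zr = t;
                ++i;
            }
            iters[py * w + px] = i;                 // escape count -> colour it later
        }

        int main()
        {
            const int w = 1024, h = 768;
            int *d;
            cudaMalloc(&d, w * h * sizeof(int));
            dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
            mandelbrot<<<grid, block>>>(d, w, h, 256);
            cudaDeviceSynchronize();
            cudaFree(d);
            return 0;
        }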
  • You lazy fuckers (Score:5, Interesting)

    by drinkypoo ( 153816 ) <drink@hyperlogos.org> on Sunday June 27, 2010 @08:46AM (#32708372) Homepage Journal

    I don't expect slashdot "editors" to actually edit, but could you at least link to the most applicable past story on the subject [slashdot.org]? It's almost like you people don't care if slashdot appears at all competent. Snicker.

    • Comment removed based on user account deletion
      • what does this mean? totally lost, i am.

        • Re: (Score:1, Offtopic)

          by Entropius ( 188861 )

          So, once upon a time, there was this text editor called vi.

          To make it do shit you type in cryptic commands. The one for search-and-replace is s, followed by a slash, followed by the thing you want to search for, followed by another slash, followed by the thing you want to replace it with. Because of more arcana, this will only happen once per line unless you put a g after it.

          So s/cat/dog/g means "replace all occurrences of cat with dog".

          Incidentally, you also have to tell vi in what range it should do this

  • AMD (Score:5, Funny)

    by MadGeek007 ( 1332293 ) on Sunday June 27, 2010 @08:49AM (#32708386)
    AMD must feel very conflicted...
    • "daddy, what's AMD?" ... "well son, its that company that tried to keep doing everything at once and died."
      • Magny-Cours is currently showing significant performance advantage over Intel's offerings while at the same time AMD's Evergreen *mostly* shows performance advantages over nVidia's Fermi despite making it to market ahead of Fermi.

        AMD is currently providing the best tech on the market. This will likely change, but at the moment, things look good for them.

        • I just got back from a lattice QCD conference, and there were lots of talks on GPGPU. Everybody's falling over each other trying to implement their code on GPUs because of the performance gains.

          *Every* talk mentioned Nvidia cards -- Geforce GTX nnn's, Tesla boards, Fermi boards. Nobody talked about AMD at all.

          Maybe AMD does have an advantage, but nobody's using it.

          • by hvdh ( 1447205 )

            Interestingly, most scientific papers talking about large speed gains (factor 2..10) by going from CPU to GPU computation compare a hand-optimized GPU implementation to a plain single-threaded non-SSE CPU implementation.

            From my experience, using SSE intrinsics gives a speed-up of 2..8 versus good generic code, and multi-threading gives more improvement until one hits the RAM bandwidth wall.

            • For those problems that map well to the GPU model of processing, the gains can be enormous (I have ported code to NVIDIA). However, some of my code works better on the CPU and some of it really needs a middle ground of many traditional cores with good branching support, etc. and not as many streaming cores all doing the same thing.
          • by Ken_g6 ( 775014 )

            *Every* talk mentioned Nvidia cards -- Geforce GTX nnn's, Tesla boards, Fermi boards. Nobody talked about AMD at all.

            Maybe AMD does have an advantage, but nobody's using it.

            That's because nVIDIA has excellent support, both on Windows and Linux, and documentation for their CUDA GPGPU system. They even have an emulator so people without an nVIDIA GPU can develop for one. (Although it's now deprecated.)

            On the other hand, AMD has CAL, Stream, and OpenCL; and I can't even figure out which one I'm supposed to use to support all GPGPU-capable AMD cards. OpenCL has some documentation; I can't find anything good on CAL, and I can't find any way to develop for the platform on Linux w

    • Re:AMD (Score:5, Insightful)

      by Junta ( 36770 ) on Sunday June 27, 2010 @10:04AM (#32708678)

      AMD is the most advantaged on this front...

      Intel and nVidia are stuck in the mode of realistically needing one another and simultaneously downplaying the other's contribution.

      AMD can use what's best for the task at hand/accurately portray the relative importance of their CPUs/GPUs without undermining their marketing message.

  • by leptogenesis ( 1305483 ) on Sunday June 27, 2010 @09:04AM (#32708440)
    At least as far as parallel computing goes. CPUs have been designed for decades to handle sequential problems, where each new computation is likely to have dependencies on the results of recent computations. GPUs, on the other hand, are designed for situations where most of the operations happen on huge vectors of data; the reason they work well isn't really that they have many cores, but that the operations for splitting up the data and distributing it to the cores are (supposedly) done in hardware. In a CPU, the programmer has to deal with splitting up the data, and allowing the programmer to control that process makes many hardware optimizations impossible.

    The surprising thing in TFA is that Intel is claiming to have done almost as well on a problem that NVIDIA used to tout their GPUs. It really makes me wonder what problem it was. The claim that "performance on both CPUs and GPUs is limited by memory bandwidth" seems particularly suspect, since on a good GPU the memory access should be parallelized.

    It's clear that Intel wants a piece of the growing CUDA userbase, but I think it will be a while before any x86 processor can compete with a GPU on the problems that a GPU's architecture was specifically designed to address.
  • The author doesn't understand what a straw man argument is. He thinks it means bringing up anything that isn't specifically mentioned in the original argument. Nvidia stating that optimizing for multi-core CPUs is difficult, and that hundreds of applications see a huge performance gain on Nvidia's architecture right now, is a valid point even if the Intel side never mentioned the difficulty of implementation.

  • by Posting=!Working ( 197779 ) on Sunday June 27, 2010 @09:13AM (#32708482)

    What the hell kind of sales pitch is "We're only a little more than twice as slow!"

    [W]e perform a rigorous performance analysis and find that after applying optimizations appropriate for both CPUs and GPUs the performance gap between an Nvidia GTX280 processor and the Intel Core i7 960 processor narrows to only 2.5x on average.

    It's gonna work, too.

    Humanity sucks at math.

    • What the hell kind of sales pitch is "We're only a little more than twice as slow!"

      A two-times speed gain is roughly the point where it becomes pointless to exploit specialized hardware. Frequently, the software development program manager has two choices:
      a) Ship a product now, or
      b) Spend 1 to 2 more years developing the product, then ship it.
      The issue is that hardware doubles in speed every 1 to 2 years. If the cost of exploiting current specialized hardware is an additional 1 to 2 years software development, t

      • by sbates ( 1832606 )
        Just a helpful tip: the next time you're tempted to add a comma, don't. It'll vastly improve the readability of your otherwise competent writing.
    • I did an experiment on a Core 2 Duo a couple years ago and found it to be only 5% as fast at doing a huge matrix multiply compared to a (then) top-of-the-line Nvidia. So, they're catching up pretty well.

      That's worth noting for people who've been following this closely for a while.

    • What the hell kind of sales pitch is "We're only a little more than twice as slow!"

      It's a very good sales pitch, actually. Unlike AMD, NVidia isn't an alternative to Intel CPUs. Instead it's a complementary technology, which adds additional cost.

      So, I could buy a $500 CPU and a $500 GPU, or I could buy TWO $500 CPUs and get most of the performance, without having to completely redesign all my software to run on a GPU.

      And Intel has at least one good point, in that NVidia's claims are based on pretty naive m

      • Don't get me wrong, I like what Intel is doing, but c'mon, you are understating this:

        and the SIMD instructions that have been added to Intel/AMD CPUs in recent years really are the same thing you get with GPU programming, just on a bit smaller scale.

        It's an order of magnitude different (and I know from experience coding CPU and GPU)
        i7 960 - 4 cores 4 way SIMD
        GT285 (not 280) - 30 cores 32 way SIMD

        SP GFLOPS (a rough peak-rate check is sketched after this comment):
        i7 960 - 102
        GT285 - 1080

        No matter what, AMD really wins in this one.

        AMD has the potent
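
        A rough check of the peak single-precision numbers quoted above (marketing-peak arithmetic, not sustained throughput; assumes stock clocks of 3.2 GHz for the i7 960 and ~1.5 GHz shader clock for the GTX 285, with the usual MAD+MUL counting):

        #include <cstdio>

        int main()
        {
            // i7 960: 4 cores, 4-wide SSE with separate add and mul units = 8 flops/clock/core.
            double i7  = 4 * 8 * 3.2;       // ~102 GFLOPS
            // GTX 285: 240 stream processors, counted as 3 flops/clock (MAD + MUL).
            double gtx = 240 * 3 * 1.5;     // ~1080 GFLOPS
            printf("i7 960: ~%.0f GFLOPS, GTX 285: ~%.0f GFLOPS\n", i7, gtx);
            return 0;
        }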
    • by Twinbee ( 767046 )

      I think that CPUs are faster with conditional branching and other general purpose computing tasks, so I would sacrifice 2x for that.

  • From the article, you can narrow the gap:

    "with careful multithreading, reorganization of memory access patterns, and SIMD optimizations"

    Sometimes, though, I don't want to spend all week making optimizations. I just want my code to run, and run fast. Sure, if you optimize the heck out of a section of code, you can always eke out a bit more performance, but if the unoptimized code can run just as fast (on a GPU), why would I bother?

    • by Rockoon ( 1252108 ) on Sunday June 27, 2010 @09:23AM (#32708516)
      Just to be clear, those same memory reorganizations are required for the GPU. That being specifically the Structure-of-Arrays strategy instead of the Array-of-Structures strategy.

      It's certainly true that most programmers reach for the latter style, but mainly because they aren't planning on using any SIMD.
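
      An illustrative sketch of the two layouts (hypothetical types; the update is identical, only the memory layout, and thus how well it vectorizes or coalesces, differs):

      // Array-of-Structures: each element's fields are interleaved, so a vector
      // unit (or a GPU warp) gathering all the x's has to stride through memory.
      struct ParticleAoS { float x, y, z, mass; };
      static ParticleAoS aos[1024];

      // Structure-of-Arrays: each field is contiguous, so 4/8/32-wide loads of x[]
      // are one straight read, which is the layout both SSE/AVX and GPU kernels want.
      struct ParticlesSoA {
          float x[1024], y[1024], z[1024], mass[1024];
      };
      static ParticlesSoA soa;

      int main()
      {
          for (int i = 0; i < 1024; ++i) aos[i].x += 1.0f;   // strided accesses
          for (int i = 0; i < 1024; ++i) soa.x[i] += 1.0f;   // contiguous accesses
          return 0;
      }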
    • by Junta ( 36770 )

      The difference is that the 'naive' code you write to do things in the simplest manner *can* run on a CPU. For the GPU languages, you *must* make those optimizations. This is not to undercut the value of GPUs (as Intel concedes, the gap is large), but it does serve to counteract the dramatic numbers touted by nVidia.

      nVidia took expert-tuned and optimized performance metrics for their product and compared them against stock, generic benchmarks on Intel products.

  • by Junta ( 36770 ) on Sunday June 27, 2010 @09:25AM (#32708520)

    On top of being highly capable at massively parallel floating-point math (the bread and butter of the Top500 and most real-world HPC applications), GPU chips benefit from economies of scale by having a much larger market to sell into. If Intel has an HPC-only processor, I don't see it really surviving. There have been numerous HPC-only accelerators that provided huge boosts over CPUs and still flopped. GPUs growing into that capability is the first large-scale phenomenon in HPC with legs.

  • Does anyone under the age of 25 really care anymore about processor speed and video card "features"?

    I only ask because 15 years ago I cared greatly about this stuff. However, I'm not sure if that is a product of my immaturity at that time, or the blossoming industry in general.

    Nowadays it's all pretty much the same to me. Convenience (as in, there it is sitting on the shelf for a decent price) is more important these days.

    • by Overzeetop ( 214511 ) on Sunday June 27, 2010 @09:39AM (#32708574) Journal

      Two things: you've been conditioned to accept gaming graphics of yesteryear, and your need for more complex game play now trumps pure visuals. You can drop in a $100 video card, set the quality to give you excellent frame rates, and it looks fucking awesome because you remember playing Doom. Also, once you get to a certain point, the eye candy takes a backseat to game play and story - the basic cards hit that point pretty easily now.

      Back when we used to game, you needed just about every cycle you could get to make basic gameplay what would now be considered "primitive". Middling level detail is great, in my opinion. Going up levels to the maximum detail really adds very little. I won't argue that it's cool to see that last bit of realism, but it's not worth doubling the cost of a computer to get it.

      • by Rockoon ( 1252108 ) on Sunday June 27, 2010 @09:58AM (#32708644)
        Well, as far as GPUs and gaming go, there are two segments of the population: those with "low resolution" rigs such as 1280x1024 (the most common group according to Steam), and those with "high resolution" rigs such as 1920x1200.

        An $80 video card enables high/ultra settings at 60+ FPS on nearly all games for the "low resolution" group, but not the "high resolution" group.
    • by bat21 ( 1467681 )
      I do, because I enjoy playing my games at 1920x1080 on high graphics settings with a decent frame rate. I think playing at low res on a crappy monitor degrades enjoyment of the game. I have a few friends that don't mind playing at 1024x768, and a couple more that still try to play the latest games with onboard video ("I can run Crysis if I set everything to low right?"), but that's more because they're cheap than because they don't care about high performance. That isn't to say I would go out and buy a $500
      • So you ARE under the age of 25! Joking aside, I find the $50-$75 3d cards to work just fine for new 3d games. This has been an adequate price-to-performance point for me since about 2003.

  • The day I build a computer with an Nvidia graphics processor as a CPU is when it's time to call 911, cause I will have completely lost my mind.
  • by werewolf1031 ( 869837 ) on Sunday June 27, 2010 @10:28AM (#32708792)
    Just kiss and make up already. Intel and nVidia have but one choice: to join forces and try collectively to compete against AMD/ATI. Anything less, and they're cutting their nose off to spite their respective faces.
  • by jedidiah ( 1196 ) on Sunday June 27, 2010 @10:44AM (#32708868) Homepage

    Yeah, speciality silicon for a small subset of problems will stomp all over a general purpose CPU. No big news there.

    Why is Intel even bothering to whine about this stuff? They sound like a bunch of babies trying to argue that the sky isn't blue.

    This makes Intel look truly sad. It's completely unnecessary.

    • Re: (Score:3, Insightful)

      by chriso11 ( 254041 )

      The reason Intel is whining is the context of large number-crunching systems and high-end workstations. Rather than Intel selling thousands of chips for the former, Nvidia (and to a lesser extent AMD) gets to sell hundreds of GPU chips. And for the workstations, Intel sells only one chip instead of 2 to 4.

  • I remember reading here on ./ that it got abandoned by Intel.
  • Intel decided to bail on marketing an in-house high performance GPU. But, they'd still like a return on their Larrabee investment. I don't doubt they would have been pushing the HPC mode anyway, but now, that's all they've got. Unfortunately for Intel, they've got to sell Larrabee performance based on in-house testing, while there are now a number of CUDA-based applications, and HPC-related conferences and papers are now replete with performance data.

    To Intel's and AMD/ATI's advantage, NVIDIA has signed on

  • Using Badaboom, a CUDA app, you can rip DVD copies down to your iPod in minutes, not hours.

    Unfortunately Badaboom are idiots and are taking their sweet time porting to the 465/470/480 cards.

    I'd love to see a processor fast enough to beat a GPU at tasks such as these. And CD-to-MP3 conversion on CUDA is like moving from a hard drive to a fast SSD.

  • From Wikipedia, "OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors." In other words, write your massively parallel programs using OpenCL and then run them on the device (or combination of devices) that executes your program the fastest.

    Hopefully, OpenCL will have the same catalyzing effect on HPC that OpenGL had on computer graphics, but time will tell.

    Word of warning to Intel: Almost nobody w
