Next-Gen Intel Chip Brings Big Gains For Floating-Point Apps

Posted by timothy on Monday March 18, 2013 @04:50PM from the code-slower dept.

An anonymous reader writes "Tom's Hardware has published a lengthy article and a set of benchmarks on the new "Haswell" CPUs from Intel. It's just a performance preview, but it isn't just more of the same. While it's got the expected 10-15% faster for the same clock speed for integer applications, floating point applications are almost twice as fast, which might be important for digital imaging applications and scientific computing." The serious performance increase has a few caveats: you have to use either AVX2 or FMA3, and then only in code that takes advantage of vectorization. Floating point operations using AVX or plain old SSE3 see more modest increases in performance (in line with integer performance increases).

This discussion has been archived. No new comments can be posted.

Would that improve hashing speeds in, say, Bitcoin (Score:1)

by d33tah ( 2722297 ) writes:

Would that improve hashing speeds in, say, Bitcoin?
- Re:Would that improve hashing speeds in, say, Bitc (Score:5, Informative)
  
  by slashmydots ( 2189826 ) writes: on Monday March 18, 2013 @05:05PM (#43207459)
  
  Slightly, but haven't you been keeping up with the latest hardware? My pair of Sapphire 5830 graphics cards would top out at about 435 MH/s at a total system wattage of around 520W. The new Jalapeno chips from Butterfly Labs will do 4500 MH/s using 2 watts total system power. For comparison, my i5-2400 performed 14 MH/s at 95W or so. So the Jalapeno is about 321x faster and draws about 47x less power, so combined, I believe that's roughly 15,268x more hashes per watt.
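The arithmetic in the comment above can be checked with a short sketch. The hash rates and wattages are the commenter's own figures (not measurements), and the dictionary keys and helper name are made up for illustration.

```python
# Figures quoted in the comment: (hash rate in MH/s, power draw in watts).
# These are the commenter's numbers, not independent measurements.
rigs = {
    "i5-2400 CPU": (14.0, 95.0),
    "2x Sapphire 5830": (435.0, 520.0),
    "BFL Jalapeno": (4500.0, 2.0),
}

def mhs_per_watt(rate_mhs, watts):
    """Hashing efficiency in MH/s per watt."""
    return rate_mhs / watts

cpu_rate, cpu_watts = rigs["i5-2400 CPU"]
asic_rate, asic_watts = rigs["BFL Jalapeno"]

speedup = asic_rate / cpu_rate        # raw hash-rate ratio
power_ratio = cpu_watts / asic_watts  # how much less power the ASIC draws
efficiency_gain = (mhs_per_watt(asic_rate, asic_watts)
                   / mhs_per_watt(cpu_rate, cpu_watts))

for name, (rate, watts) in rigs.items():
    print(f"{name}: {mhs_per_watt(rate, watts):.3f} MH/s per watt")
print(f"speedup {speedup:.1f}x, power ratio {power_ratio:.1f}x, "
      f"combined {efficiency_gain:.0f}x")
```

The combined figure is simply the product of the two ratios, which is why the hashes-per-watt gain is so much larger than either number alone.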
  
  - Re: (Score:2)
    
    by 0100010001010011 ( 652467 ) writes:
    
    Can the Jalapeno chips do anything else when the Bitcoin market crashes? At least with the video cards I can still drive displays with them.
    - Re: (Score:2)
      
      by slashmydots ( 2189826 ) writes:
      
      They had officially classified it as a coffee warmer
    - - Re: (Score:1)
        
        by viperidaenz ( 2515578 ) writes:
        
        Bitcoins still hold no value to me. No one I deal with accepts them as currency, hence they hold no value.
        I can't pay my taxes with bitcoins, I can't buy food, I can't repay my mortgage, I can't buy petrol. What can I do with a bitcoin?
        
        Re: (Score:3)
        
        by slashmydots ( 2189826 ) writes:
        
        You can sell them on the exchange quickly and easily for USD (or 5 other major currencies)
        
        Re: (Score:1)
        
        by viperidaenz ( 2515578 ) writes:
        
        So I can sell them for less than the cost of power to mine them? There's also the loss associated with amortising and depreciating the hardware required to mine them as well.
        
        Re: (Score:2)
        
        by slashmydots ( 2189826 ) writes:
        
        When bitcoins hit $3.60 ea and the difficulty was about 1/3 what it is now, I was spending $42 on electricity to get around $45 in BTC. Now the price is $47/BTC and it takes 1/250th the power to generate them 10x as fast but at 3x harder difficulty. Still a hell of a net gain.
        
        Re: (Score:2)
        
        by viperidaenz ( 2515578 ) writes:
        
        Half the bitcoins that will ever be mined have already been mined. If this was to ever be widespread, how would more than 21 million people be able to take part? That's only 0.3% of the world. Less than 10% of USA. Less than one coin per Australian (there's 22 million of those buggers)
        As soon as you start using fractions of coins, you're introducing traditional banks in to the picture. Single points of failure to what used to be a distributed system.
        Scams and fraud shouldn't be too hard either. If you hijac
        
        Re: (Score:2)
        
        by Jeremi ( 14640 ) writes:
        
        I can't pay my taxes with bitcoins, I can't buy food, I can't repay my mortgage, I can't buy petrol. What can I do with a bitcoin?
        You can send them to me...
        
        Re: (Score:2)
        
        by ultranova ( 717540 ) writes:
        
        Bitcoins still hold no value to me.
        It is interesting how every mention of Bitcoin attracts people saying how they're worthless, useless, or a scam that's about to collapse any second now. It's interesting, because people don't usually spend this much time hating something that wouldn't affect them in any way even if they were right. It's almost starting to seem like a FUD campaign, which leads to a question: who is behind it, the banks, the government, Visa or PayPal?
        
        Re: (Score:2)
        
        by viperidaenz ( 2515578 ) writes:
        
        The purple unicorns are behind it.
- Re: (Score:3, Insightful)
  
  by Anonymous Coward writes:
  
  Would that improve hashing speeds in, say, Bitcoin?
  Bitcoin is based on SHA256 hashing, which has zero floating point operations. So no, this will not impact Bitcoin mining at all.
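To illustrate the point above, here is a minimal sketch of the building blocks of SHA-256 as defined in FIPS 180-4: every step is a 32-bit integer rotate, shift, AND, XOR, or modular add, with no floating point anywhere. The helper names are chosen for this example; `double_sha256` uses Python's standard `hashlib` and reflects the fact that Bitcoin hashes block headers with SHA-256 applied twice.

```python
import hashlib

MASK32 = 0xFFFFFFFF  # SHA-256 operates on 32-bit unsigned words

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def ch(x, y, z):
    """'Choose': each bit of x selects the matching bit of y or z."""
    return ((x & y) ^ (~x & z)) & MASK32

def maj(x, y, z):
    """'Majority': each output bit is the majority vote of the inputs."""
    return (x & y) ^ (x & z) ^ (y & z)

def big_sigma0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)

def big_sigma1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)

def add32(*words):
    """Addition modulo 2**32, the only arithmetic SHA-256 needs."""
    return sum(words) & MASK32

def double_sha256(data):
    """SHA-256 applied twice, as used for Bitcoin block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

print(hex(rotr(1, 1)), hex(add32(MASK32, 1)), double_sha256(b"").hex())
```

Wider vector integer instructions could in principle help a CPU miner, but the floating-point gains discussed in the article do not apply to this kind of code.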
Let's see... (Score:5, Funny)

by bluegutang ( 2814641 ) writes: on Monday March 18, 2013 @04:55PM (#43207361)

" Next-Gen Intel Chip Brings Big Gains For Floating-Point Apps "
How much of a gain? More or less than 0.00013572067699?

- Re: (Score:1)
  
  by kimvette ( 919543 ) writes:
  
  FTFS:
  While it's got the expected 10-15% faster for the same clock speed for integer applications, floating point applications are almost twice as fast
  
  HTH
  - Re:Let's see... (Score:5, Informative)
    
    by 0100010001010011 ( 652467 ) writes: on Monday March 18, 2013 @05:06PM (#43207461)
    
    It's a joke. The Intel P5 Pentium FPU had a bug where
    4195835/3145727=1.333739068902037589 The correct answer is 1.333820449136241002.
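As a quick check of the numbers quoted above, this sketch recomputes the quotient in ordinary double precision and measures how far the flawed Pentium result was off. The flawed value is copied from the comment; the variable names are illustrative.

```python
numerator, denominator = 4195835, 3145727

# Result attributed above to the buggy Pentium FPU.
flawed_pentium = 1.333739068902037589

correct = numerator / denominator   # IEEE 754 double-precision division
abs_error = correct - flawed_pentium
rel_error = abs_error / correct

print(f"correct quotient : {correct:.15f}")
print(f"flawed quotient  : {flawed_pentium:.15f}")
print(f"absolute error   : {abs_error:.3e}")
print(f"relative error   : {rel_error:.3e}")
```

The relative error is on the order of 6 parts in 100,000, so the two results already disagree in the fifth significant digit.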
    
    - Re: (Score:3)
      
      by kimvette ( 919543 ) writes:
      
      Oh right, that bug an Intel rep laughably claimed one would only encounter once every 2,500 years or so. I'd forgotten about that.
  - Less rounding of floating point numbers (Score:5, Informative)
    
    by raymorris ( 2726007 ) writes: on Monday March 18, 2013 @05:16PM (#43207539) Journal
    
    While it's got the expected 10-15% faster for the same clock speed for integer applications, floating point applications are almost twice as fast HTH
    Integer and floating point are separately implemented in the hardware, so an improvement to one often doesn't apply to the other. You can add integers by counting on your fingers. To do that with floating point, you have to cut your fingers into fractions of fingers - a very different process.
    See: http://en.wikipedia.org/wiki/FMA3 [wikipedia.org]
    It's common to have an accumulator like this:
    X = X + (Y * Z)
    To compute that in floating point, the processor normally does:
    A = ROUND(Y*Z)
    X = ROUND(X+A)
    Each ROUND() is necessary because the processor only has 64 bits in which to store the endless digits after the decimal point. FMA can fuse the multiply and the add, getting rid of one rounding step, and the intermediate variable:
    X= ROUND( X + (Y*Z) )
    That makes it faster. Since integers don't get rounded to the available precision, the optimization doesn't apply to integers. The above processor would do Y*Z, then +X, then round, then X=. A CPU designer can make that faster by including either an "add and multiply" circuit or an "add and round" circuit or a "round and assign" circuit. Any set of operations can be done in two clock cycles, if the maker decides to include a hardware circuit for it.
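The one-rounding versus two-rounding difference described above can be sketched in Python. This is an emulation under stated assumptions, not a hardware FMA: the fused version computes the exact value with `fractions.Fraction` and rounds once when converting back to a float. It shows only the numerical effect, not the speed benefit, and the function names and input values are chosen for illustration.

```python
from fractions import Fraction

def multiply_add_unfused(x, y, z):
    """Two roundings: A = ROUND(Y*Z), then X = ROUND(X + A)."""
    a = y * z          # rounded to the nearest double
    return x + a       # rounded again

def multiply_add_fused(x, y, z):
    """One rounding: exact X + Y*Z in rationals, rounded once by float()."""
    exact = Fraction(x) + Fraction(y) * Fraction(z)
    return float(exact)

# Inputs picked so the intermediate rounding throws away the whole answer:
# y*z is exactly 1 - 2**-60, which rounds to 1.0 in double precision.
y = 1.0 + 2.0**-30
z = 1.0 - 2.0**-30
x = -1.0

print("unfused:", multiply_add_unfused(x, y, z))
print("fused  :", multiply_add_fused(x, y, z))
```

With these inputs the unfused version returns zero while the fused version keeps the small nonzero residual, which is the accuracy side of fusing the multiply and the add.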
    
    - - Prove that and you'll be more famous than Turing (Score:2)
        
        by raymorris ( 2726007 ) writes:
        
        If you can do that, you'll revolutionize computing. No, doubling the clock to send two ticks to the gates doesn't count - the real clock is defined by the gate speed.
        
        Re: (Score:1)
        
        by Anonymous Coward writes:
        
        Who are you to define what counts and what doesn't?
- Re: (Score:2)
  
  by unixisc ( 2429386 ) writes:
  
  Okay, so how will it compare w/ the Itanium?
Hope it's going in the new Mac Pro (Score:4, Interesting)

by GlobalEcho ( 26240 ) writes: on Monday March 18, 2013 @05:02PM (#43207431)

I hope there's really a new Mac Pro coming [ibtimes.com] and that it has these chips in it! I do a heck of a lot of PDE solving, statistics and simulations, and would love to have a screamin' machine again.

- Re:Hope it's going in the new Mac Pro (Score:5, Insightful)
  
  by Anonymous Coward writes: on Monday March 18, 2013 @05:14PM (#43207527)
  
  Do you really need a Mac for that? If not, it seems you're limiting your potential by having to wait for the holy artifacts to be released.
  
  - - Re: (Score:1)
      
      by Anonymous Coward writes:
      
      Thank you for that imagery :D
    - Re: (Score:1)
      
      by Anonymous Coward writes:
      
      He buys the special edition that comes with a dildo
      Oh, I get it! Because Mac owners are homosexuals...that's funny! Stupid homosexuals. Mod parent up!
- Re:Hope it's going in the new Mac Pro (Score:5, Interesting)
  
  by semi-extrinsic ( 1997002 ) writes: <asmunderNO@SPAMstud.ntnu.no> on Monday March 18, 2013 @05:18PM (#43207555)
  
  If you're doing numerics, what the fuck (if you'll pardon my French) are you doing buying Apple? I'm working on two-phase Navier-Stokes solvers myself, and I just bought a new rig consisting of 3 boxes each with an Intel Core i7 @ 3.7 GHz, 12 GB RAM, an SSD drive and a big-ass cooling system. In total that cost less than the Mac Pro with a single Core i7 @ 3.3 GHz listed in that article. You're paying 3x more than you should, and you get what extra? A shiny case? Puh-lease.
  
  - Re: (Score:2)
    
    by Charliemopps ( 1157495 ) writes:
    
    He gets to tell his friends he bought an apple... apparently he keeps friends that care.
  - Re: (Score:1)
    
    by Anonymous Coward writes:
    
    Most physics researchers (source: physics PhD) use Mac desktops/laptops and Linux servers. Macs are perfect environments for a mix of coding and general computing, with good support for *nix tools. Anything serious gets done on a cluster. I've seen this in several universities, all of them top tier (e.g. Oxford, Imperial, UCL, Warwick), so it's not isolated.
    But hey, this is Slashdot.
    - Re: (Score:2)
      
      by IWannaBeAnAC ( 653701 ) writes:
      
      Most of the people in the physics department here use Windows desktops, but pretty much all of the numerics people use Linux desktops. Naturally, all of the computing clusters are Linux. It seems that virtually all laptops are Macs though, which is curious. Possibly people would like to use Macs on the desktop but there is some barrier (e.g., purchasing or IT administration policies)? I'll have to find out!
      - Re: (Score:2)
        
        by IWannaBeAnAC ( 653701 ) writes:
        
        >virtually all laptops are macs though, which is curious.
        Not really. The laptops really are great hardware regardless of which OS you run on it. Unless of course you are one of those people with an irrational hatred of all things Apple.
        I was referring to the dichotomy of using windows on the desktop but a mac laptop.
    - Re: (Score:3)
      
      by LordLimecat ( 1103839 ) writes:
      
      You're paying at least double for the same hardware on a Mac. The Mac cited in the article has 2x 6-core Xeons @ 2.4GHz. Those (assuming E5645s) can be had for ~$575 each, with a motherboard at ~$275. Everything else is pocket change; a whole rig with SSDs etc. could be had for under $1700.
      But I'm sure someone somewhere will explain why the aluminum makes the extra $2000 for the Mac worth it.
      - Re: (Score:2)
        
        by Jeremi ( 14640 ) writes:
        
        But I'm sure someone somewhere will explain why the aluminum makes the extra $2000 for the Mac worth it.
        The case is very nice, but it's not worth $2000 extra.
        The ability to run MacOS/X (without "hackintosh" style shenanigans) is really nice, and is worth $2000 extra if you have that kind of money lying around (or, more realistically, if your employer does).
        If you think $2000 extra is too much to spend, you're probably right. On the other hand, plenty of people will spend an extra $20,000 on a nicer brand of car; sometimes people want what they want, and are willing to pay extra for it.
        
        Re: (Score:2)
        
        by drinkypoo ( 153816 ) writes:
        
        On the other hand, plenty of people will spend an extra $20,000 on a nicer brand of car; sometimes people want what they want, and are willing to pay extra for it.
        The problem with this notion is that often the people are not buying a nicer brand of car, they're buying a prettier brand of car. A Lexus is just a Toyota with more asphalt and the same shit construction and the same shit handling. But a BMW costs the same as a Lexus and is, well, they're built shit since the eighties, but they're actually worth driving. For their extra $2000 they could have got something substantively better, but all they've done is buy a shinier Toyota with some options they could have a
        
        Re: (Score:2)
        
        by LordLimecat ( 1103839 ) writes:
        
        The ability to run MacOS/X (without "hackintosh" style shenanigans) is really nice, and is worth $2000 extra if you have that kind of money lying around
        Which doesn't explain why a lower end Mac costs only $1000. And whether it's worth $2000 extra is about as subjective as it gets; particularly when I doubt you can name a capability that OSX has that Windows does not, or a benchmark showing a substantial performance difference.
        Why not just a debian or RH flavor and be done with it if you really want a *nix?
        
        Re: (Score:2)
        
        by Jeremi ( 14640 ) writes:
        
        particularly when I doubt you can name a capability that OSX has that Windows does not
        Built-in bash shell and Unix environment by default is what does it for me. (I know you can sort of fake it using Cygwin and whatnot on a Windows box, but I'd rather pay the extra money and not have to fake it). I was a die-hard BeOS user back in the day, and MacOS/X is the closest thing to the BeOS user experience that is readily available now.
        Why not just a debian or RH flavor and be done with it if you really want a *nix?
        Because I also want to be able to buy and use commercial software. Linux/Unix are fine, but it's also nice to be able to get software X you rather than having to
    - - Re: (Score:2)
        
        by newcastlejon ( 1483695 ) writes:
        
        If your experimental labs are anything like our workshops you'll probably find them running a few ancient Win95/DOS tools that don't take kindly to being cooped up in a VM without direct access to hardware. As I think back, though, I do recall a lonely old G3 being used as a data logger.
  - Re: (Score:2)
    
    by fyngyrz ( 762201 ) writes:
    
    Not to put too fine a point on it, he gets OSX, the OSX ecosystem, the vast majority of the *nix ecosystem, the ability to VM several varieties of the Windows ecosystem *or* any one of a number of pure *nix ecosystems, all in parallel if he likes, the ability to drive a bunch of monitors (I've got six on mine), all manner of connectivity, and yes, perhaps last and even perhaps least, probably one of the best cases out there -- it's not just shiny. it's bloody awesome.
    I don't even *like* Apple the company --
    - - Re: (Score:2)
        
        by fyngyrz ( 762201 ) writes:
        
        Other than OSX and the higher price tag; what was the point of the rest of your comment?
        
        The point was, and is, that he's happy with his Mac. I'm sorry you don't get it.
        Don't other PCs provide you access to nix and windows VMs?
        
        They don't, however, provide you with access to OSX. It's the combination of all of them, all working at once, that really brings the whupass. And you won't be doing that in any legit, supported fashion on anything but a Mac. That's well worth the candle. See, this is part of the "an
        
        Re: (Score:2)
        
        by unixisc ( 2429386 ) writes:
        
        Aside from the salient point about him being happy w/ the Mac, the GP's other point - somehow missed - was that OS-X brings with it FBSD userland, which therefore makes available most if not all of Unix features. If he had a Wintel PC, he'd have had to run a Linux or a VirtualBSD VM, and if he had Linux, there would be a paucity of applications for it. Here, since it's OS-X/FBSD, it's very unlikely that he'll need Linux, except maybe to run any specialized program developed only for Linux. But if he does
  - Re: (Score:2)
    
    by GlobalEcho ( 26240 ) writes:
    
    If you're doing numerics, what the fuck (if you'll pardon my French) are you doing buying Apple?
    Fair question. It turns out, PDE solving etc. isn't all I do, so while I like my machine to be reasonably fast at the numerics, I require it to work well as a general-purpose computer, too. To me, Windows, Linux and FreeBSD fail to meet that criterion.
    I do small-to-medium problems locally without having to think about remote execution issues, and then farm truly heavy numerics out to parallel processing farms like anybody else (aside from the PDE solvers, much of what I do is embarrassingly parallel). It
    - Re: (Score:2)
      
      by cpotoso ( 606303 ) writes:
      
      You would still do a lot better getting an imac for your regular software and a linux machine for the computation. X11 makes all transparent too. And still spend less... See my post above.
      - Re: (Score:2)
        
        by GlobalEcho ( 26240 ) writes:
        
        I agree with you from a price point of view, but workflow efficiency is very important to me, moreso than workstation power.
        At one of my jobs, a powerful Linux workstation is my primary machine and we use a Linux compute farm, so I am keenly aware of the shortcomings of both the Linux user environment and of the hassle involved in dealing with remote jobs. If one doesn't have a very wide variety of calculations, or the calculations rarely change, then remote is no big deal. Otherwise it is a real time sin
  - - Re:Hope it's going in the new Mac Pro (Score:4, Insightful)
      
      by Aardpig ( 622459 ) writes: on Monday March 18, 2013 @07:03PM (#43208621)
      
      Erm -- ECC memory is slower than non-ECC memory, I think.
      
      - Re:Hope it's going in the new Mac Pro (Score:5, Informative)
        
        by KonoWatakushi ( 910213 ) writes: on Monday March 18, 2013 @07:59PM (#43209131)
        
        ECC memory is only marginally slower. Considering error rates and modern memory sizes, it is far past time that it became a standard feature. The extra cost would be totally insignificant if it were standard, and not used as an excuse to gouge people on Xeons.
        
        
        Re: (Score:2)
        
        by fa2k ( 881632 ) writes:
        
        Actual error rates in good memory are very low. I didn't see a single error for a year. The main benefit of ECC on workstations is to detect memory that is slightly bad, which passes hours of memtest86, but still gives you errors maybe every month on your workload. This requires you to monitor ECC errors, and get alerts, so you can replace the DIMMs that give errors repeatedly. The problem is that ECC monitoring for the new Intel chips is not available in Linux (as far as I can tell, cheeky plug for my stac
      - Re: (Score:2)
        
        by TheRaven64 ( 641858 ) writes:
        
        You can make RAM even faster - returning the result in a single cycle every time - if you don't care whether the result it returns is the correct one...
    - Re:Hope it's going in the new Mac Pro (Score:5, Informative)
      
      by washu_k ( 1628007 ) writes: on Monday March 18, 2013 @07:29PM (#43208879)
      
      The Core i7's are consumer-grade processors and are slower than the Xeon's the Mac Pros use
      This is completely incorrect. The current Mac Pros use Nehalem based Xeons which are two generations back from the current Ivy Bridge i7s. Xeons may have differences in core count, cache and/or ECC support but their execution units are the same as their desktop equivalents. The base Mac Pro CPU is equivalent to an i7-960 with ECC support. The current Ivy Bridge i7s are a fair bit faster.
      
    - Re: (Score:2)
      
      by viperidaenz ( 2515578 ) writes:
      
      The current top of the line Mac Pro has a pair of 3 year old CPUs (2x 6 core E5645/50/75, released Q1, 2010). You can't compare to any current HP or Dell etc as they use newer generation Xeons.
      A 12-core MacPro in NZ costs $6100.
      A top of the line 12-core Dell costs $6200.
      Dell has E5-2630 CPU's, Mac has E5645. Dell wins there, more cache, newer CPU.
      Dell has 16GB ECC RAM, Mac has 12GB. Dell wins there: 1600MHz, 128GB max; Mac is 1333MHz, 64GB max.
      I'm sure a 2 year old Dell is cheaper than a brand new Mac Pro.
      Of
    - Re: (Score:3)
      
      by epyT-R ( 613989 ) writes:
      
      It depends. Depending on the generation of xeon, you pay for the privilege of some combination of ECC RAM/cache, more cache, and multisocket capability. In many cases (like the pentium 4 era), you got a p4 with more cache that wasn't much faster than the desktop variant, even with 'enterprise' loads like databases! In the pentium 3 xeon days, you got marginal benefits with the extra cache, yet paid A LOT more for the hardware. With Xeon, the performance boost rarely justified the cost. Intel knew this,
    - Re: (Score:2)
      
      by semi-extrinsic ( 1997002 ) writes:
      
      Ehm, I have to say that ECC is over-hyped by server hardware vendors, especially for CFD applications. The failure rate for modern RAM is 1 bit error per 1 GB per 1 month of simulation. To be honest, a typical CFD code will have to handle much worse errors than that due to random programming bugs (if you think your 35,000 lines-of-code program is bug free, well... it's not) etc., such that if it crashes or becomes unphysical from 12 bit errors in a month, you're screwed anyway.
      
      On the other hand, if you'r
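The "12 bit errors in a month" figure in the comment above follows directly from the quoted rate and a 12 GB machine. A rough sketch, taking the commenter's rate of 1 bit error per GB per month at face value and additionally assuming (this is not in the comment) that errors arrive independently so a Poisson model applies:

```python
import math

def expected_bit_errors(errors_per_gb_month, ram_gb, months):
    """Expected number of bit errors for a run of the given size and length."""
    return errors_per_gb_month * ram_gb * months

# Commenter's rate, applied to a 12 GB box running for one month.
expected = expected_bit_errors(1.0, 12, 1)

# Assumption for illustration: Poisson arrivals, so the chance of a
# completely error-free month is exp(-expected).
p_clean_run = math.exp(-expected)

print(f"expected bit errors: {expected:.0f}")
print(f"P(no errors at all): {p_clean_run:.2e}")
```

Under that assumed rate an error-free month would be very unlikely, which is consistent with the commenter's argument that the code has to tolerate such errors either way.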
    - - You Obviously Never Used Sun Servers W/O ECC (Score:3)
        
        by raftpeople ( 844215 ) writes:
        
        In the early 2000's we had some, every week one of them would crash. All the other servers w/ECC, no crash. Hardly a marketing gimmick.
  - - Re: (Score:2)
      
      by semi-extrinsic ( 1997002 ) writes:
      
      Okay, I'll answer this. From my use of a Core i7, you can tell that I'll be coding a serial app with some OpenMP in the slow parts to utilize all 4 cores. Now that's much easier than writing the same code for a GPU. If I were serious in developing the "blazing fast" stuff (which I'm not, my focus is on implementing new multi-physics models) I could spend the same amount of effort and target MPI instead of GPUs, and then go run it on the 22,000 core cluster in the basement next door.
    - Re: (Score:2)
      
      by TheTurtlesMoves ( 1442727 ) writes:
      
      Not everything maps to GPUs all that well. Some fluid stuff would be rather hard work to get to work fast on GPUs, say for example 2 phase flows. Also mapping stuff to a GPU means it's often quite difficult to keep it flexible, which is often needed for R&D fluid codes.
      
      It's not just about FLOPs, it's also whether you can use 'em, and without spending 2 years optimizing the code to do so.
  - - Re: (Score:2)
      
      by semi-extrinsic ( 1997002 ) writes:
      
      What are you going to do with 64 GB of RAM on a single node? If you're actually using more than 2-4 GB per core, your program is fully limited by RAM speed, and you should REALLY be using MPI and more nodes with less RAM each.
- Re:Hope it's going in the new Mac Pro (Score:4, Interesting)
  
  by spire3661 ( 1038968 ) writes: on Monday March 18, 2013 @05:20PM (#43207567) Journal
  
  Why not just do that on real workstation hardware and tap into it remotely?
  
  - Re: (Score:2)
    
    by unixisc ( 2429386 ) writes:
    
    Why not just do that on real workstation hardware and tap into it remotely?
    What 'real workstation' is left? The only workstations available these days are x64 workstations. SPARC, POWER, MIPS and even Itanium workstations are dead. Where exactly could one buy a RISC workstation anymore, if one wanted to get it, get the latest and greatest version of Debian or *BSD and run w/ it? Everything is now Intel/AMD, and all the CPUs that had superior floating point are either dead, or exclusive to servers that would cost millions.
  - - Re: (Score:2)
      
      by alen ( 225700 ) writes:
      
      Since it sells with 2 year old cpu's
      Or was it 2 generation old cpu's
      - Re: (Score:2)
        
        by petermgreen ( 876956 ) writes:
        
        The Mac Pros currently ship with Westmere-based CPUs. The most recent comparable CPUs are Sandy Bridge-based. So even if you count both new core designs and die shrinks as "generations" it's still only one generation behind comparable CPUs.
- Re: (Score:2)
  
  by mozumder ( 178398 ) writes:
  
  The Mac Pros use Xeon chips, which are usually updated about 1 year after the mainstream Core processors are out.
- Re: (Score:2)
  
  by cpotoso ( 606303 ) writes:
  
  ???? Why do you need a mac for that? I run mac laptops and even imacs. Even have a mac pro from 2006 (at that time a good deal, 8 xeon 3GHz, not much more expensive than the equivalent Dell). Last month, a Dell Precision workstation with 2 hex core xeons (+ hyperthreading, making them effectively 24 cores--don't scream at me, I have benchmarked MY programs and for all practical purposes it acts as 24 CPUs) for just over $2k (including 32 GB ram, 3 TB disk). Runs linux nicely and the parallelism beats a
Might be important, but probably not... (Score:5, Interesting)

by MasseKid ( 1294554 ) writes: on Monday March 18, 2013 @05:04PM (#43207449)

For problems that need floating point AND are not multithread-friendly AND need large computing power AND are specially coded, this will be of great use. However, most massive computing problems like this are multithread-friendly, and this will still be roughly an order of magnitude short of the speeds you can get by using a GPU.

- Re:Might be important, but probably not... (Score:4, Insightful)
  
  by semi-extrinsic ( 1997002 ) writes: <asmunderNO@SPAMstud.ntnu.no> on Monday March 18, 2013 @05:20PM (#43207577)
  
  The good thing about manufacturers speeding up SSE/AVX/etc. is that the linear algebra libraries (specifically the ATLAS implementation of BLAS and LAPACK) usually release code that makes use of the new hawtness in about six months after release. Do you know how much software relies on BLAS and LAPACK for speed?
  
  - Also (Score:2)
    
    by Sycraft-fu ( 314770 ) writes:
    
    Intel's C/C++ and FORTRAN compilers are exceedingly efficient at vectorization, and are of course updated to use their new instructions. Does take a bit for software to be compiled using it, but you can see some real gains in a lot of things without special work.
    I also think people who do GPGPU get a little over focused on it and think it is the solution to all problems. You find that some things like, say, graphics rendering, are extremely fast on the stream processors that make up a modern GPU. However yo
    - Re: (Score:1)
      
      by Anonymous Coward writes:
      
      The downside of using Intel's compiler is that it will revert to using 80286 instruction set if you happen to run the code on an AMD chip.
- Re: (Score:2)
  
  by godrik ( 1287354 ) writes:
  
  Intel Xeon Phi relies on avx (version 1 I believe) and using avx gets you good improvement compared to not using avx for both sequential and parallel codes. Of course, sequential code on Xeon Phi is typically slower than on a regular Sandy Bridge processor.
  Many applications can use 16 float operations simultaneously. Certainly many video codecs and physics engines.
  GPUs can be good for many computations but there are many cases where they are not so good. Most pointer chasing type of application tend not t
  - Re: (Score:2)
    
    by godrik ( 1287354 ) writes:
    
    replying to self. Xeon Phi uses larger lanes than AVX. It is 512 bits in Xeon Phi and 256 in AVX, I got the names mixed up.
  - Re: (Score:3)
    
    by Bengie ( 1121981 ) writes:
    
    http://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-codename-knights-corner [intel.com]
    An important component of the Intel Xeon Phi coprocessor’s core is its vector processing unit (VPU), shown in Figure 5. The VPU features a novel 512-bit SIMD instruction set, officially known as Intel® Initial Many Core Instructions (Intel® IMCI). Thus, the VPU can execute 16 single-precision (SP) or 8 double-precision (DP) operations per cycle. The VPU also supports Fused Multiply-Add (FMA) instructions and hence can execute 32 SP or 16 DP floating point operations per cycle. It also provides support for integers.
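The per-cycle figures in the quoted passage follow from the register width. A small sketch of that arithmetic, with function names invented for this example; it counts a fused multiply-add as two floating-point operations, as the quote does, and describes a single vector unit only.

```python
def simd_lanes(register_bits, element_bits):
    """Number of elements that fit in one vector register."""
    return register_bits // element_bits

def peak_flops_per_cycle(register_bits, element_bits, fma):
    """Peak floating-point operations per cycle for one vector unit.

    A fused multiply-add counts as two operations per lane.
    """
    lanes = simd_lanes(register_bits, element_bits)
    return lanes * (2 if fma else 1)

SP, DP = 32, 64  # bits per single- and double-precision element

for name, width in (("256-bit AVX", 256), ("512-bit Xeon Phi VPU", 512)):
    print(f"{name}: {simd_lanes(width, SP)} SP / {simd_lanes(width, DP)} DP lanes, "
          f"{peak_flops_per_cycle(width, SP, True)} SP / "
          f"{peak_flops_per_cycle(width, DP, True)} DP flops per cycle with FMA")
```

The same calculation shows why doubling the vector width, or adding FMA, doubles the peak for code that is actually vectorized.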
    - Re: (Score:2)
      
      by godrik ( 1287354 ) writes:
      
      My bad, I realized later that AVX was the new instruction set for Sandy Bridge and not for Xeon Phi. AVX (version whatever) and IMCI instructions are quite similar (gather/scatter, Fused Multiply Add, swizzling/permute). Their main difference is the SIMD width.
      My overall point remains valid. Doing floating point arithmetic by packs of 256 bits is overall useful.
  - Re: (Score:2)
    
    by TeXMaster ( 593524 ) writes:
    
    OpenCL is suboptimal on NVIDIA only because NVIDIA refuses to keep their support up to date, as it would chip away at their vendor lock-in attempt with CUDA.
    I honestly think everybody doing serious manycore computing should use OpenCL. NVIDIA underperforms with that? Their problem. Ditch them.
- Re: (Score:2)
  
  by Bengie ( 1121981 ) writes:
  
  Not all multi-threaded code is large matrix friendly and GPUs need large matrix math to become useful.
- Re: (Score:2)
  
  by pclminion ( 145572 ) writes:
  
  Yeah, pretty much. Basically, they just doubled the width of the vector execution units. Obviously, that will double the FLOPS for vectorized code. In other news, 8 cores can do twice the work of 4 cores, if your code is multithreaded properly.
- - Re: (Score:2)
    
    by GlobalEcho ( 26240 ) writes:
    
    That's one of the nice things about OpenCL. I wish they would come up with more (and better) math libraries.
    - Re: (Score:2)
      
      by Aardpig ( 622459 ) writes:
      
      I wish NVIDIA would update their drivers to support OpenCL 1.1. Oh wait, that's not going to happen because they are trying to push CUDA instead...
      - Re: (Score:2)
        
        by drinkypoo ( 153816 ) writes:
        
        It will happen if AMD ever manages to make drivers reliable enough that significant numbers of people buy significant numbers of their cards, and nVidia has actual competition.
Nearing complete integration (Score:1)

by bstrobl ( 1805978 ) writes:

The thing that interests me most about this generation is the progress towards a single chip solution. Ultrabooks and tablets can get a multi chip package with the PCH (last remnant of the old chipset) soldered alongside the CPU/GPU die. Shouldn't take long till everything is fabbed onto one piece of silicon, reducing power requirements and gadget size.
wtf? fma3? (Score:2, Offtopic)

by convolvatron ( 176505 ) writes:

could someone tell me how many separate instruction sets, pipelines and register files I
get in a mainline CPU these days? i turned away for a second and completely lost track.
what happens with the 10 that you aren't using? just sitting there reducing the yield?
128 bit floats: when? (Score:2)

by rmstar ( 114746 ) writes:

While speed for single and double floats is all well and good, I wonder - when will there finally be hardware support for 128 bit (quadruple precision) floats? [wikipedia.org]
- Re: (Score:2)
  
  by godrik ( 1287354 ) writes:
  
  What is the use for them? For "personal" use, floats are all you will ever need. Many physics computations stay in single precision to avoid doubling the memory usage. I guess fluid mechanics computations use doubles, but is there really a use for quad? Who needs that kind of precision?
  - Re: (Score:2)
    
    by ChrisMaple ( 607946 ) writes:
    
    Three years ago I was doing a SPICE simulation (SPICE uses doubles) for a radio receiver. The simulation ran into digital noise before the receiver would have, and it essentially ruined the critical part of the simulation. Software 128 bit floats are unacceptably slow.
  - Re: (Score:3)
    
    by Jeremy Erwin ( 2054 ) writes:
    
    here's an old paper describing octuple precision on the PowerPC G4 [apple.com]
    Many problems in number theory and the computational and physical sciences, especially in recent times, require more floating point precision than is commonly available in fundamental computer hardware. For example, the new science of “experimental mathematics,” whereby algebraic truths are foreshadowed, even discovered numerically, requires much more than single (32-bit) or double (64-bit) precision.
    That paper references Bailey's 2000 paper on Quad double algorithms [lbl.gov], which alludes to "pure mathematics, study of mathematical constants, cryptography, and computational geometry
  - Re: (Score:2)
    
    by CSMoran ( 1577071 ) writes:
    
    What is the use for them? For "personal" use, floats are all you will ever need. Many physics computations stay in single precision to avoid doubling the memory usage. I guess fluid mechanics computations use doubles, but is there really a use for quad? Who needs that kind of precision?
    Not all uses are personal and the fact that some physics calculations trade precision for memory doesn't mean that all of them do.
    
    One example could be matrix inversions with somewhat ill-conditioned matrices. When you know you're going to lose 14 digits of precision inverting the matrix, you'd better have a lot of headroom. Cue quad floats.
    
    The car analogy that comes to mind is people often do sound mixing with 32-bit audio even though 16-bit audio is perfectly fine for listening to the product.
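
    To make the ill-conditioning point concrete, here is a small sketch in plain Python. It is only an illustration under my own assumptions: the Hilbert matrix is a textbook ill-conditioned example (not something from the comment above), and the helper names `hilbert` and `solve` are made up. The same Gaussian elimination is run once with exact rationals and once with doubles, on a system whose true solution is all ones:

    ```python
    from fractions import Fraction

    def hilbert(n):
        """n x n Hilbert matrix with exact rational entries 1/(i+j+1)."""
        return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

    def solve(a, b):
        """Gaussian elimination with partial pivoting (floats or Fractions)."""
        n = len(b)
        m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
        for col in range(n):
            pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
            m[col], m[pivot] = m[pivot], m[col]
            for r in range(col + 1, n):
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
        x = [0] * n
        for r in range(n - 1, -1, -1):
            s = sum(m[r][c] * x[c] for c in range(r + 1, n))
            x[r] = (m[r][n] - s) / m[r][r]
        return x

    n = 10
    exact_a = hilbert(n)
    exact_b = [sum(row) for row in exact_a]   # chosen so the solution is all ones
    float_a = [[float(v) for v in row] for row in exact_a]
    float_b = [float(v) for v in exact_b]

    exact_x = solve(exact_a, exact_b)
    float_x = solve(float_a, float_b)
    worst = max(abs(v - 1.0) for v in float_x)

    print(exact_x == [1] * n)   # exact arithmetic recovers the solution
    print(worst)                # largest error of the double-precision solve
    ```

    The double-precision answer comes out many orders of magnitude less accurate than the roughly 16 digits a double can hold, which is the "headroom" problem: the wider the format, the more digits are left after the matrix has eaten its share.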
- Re: (Score:2)
  
  by Twinbee ( 767046 ) writes:
  
  I would have hoped more bits were given to the exponent in quad precision. It's given 15 bits compared to double precision's 11.
  
  So many bits, and it almost all goes to the fraction - a real shame.
  - - Re: (Score:3)
      
      by Twinbee ( 767046 ) writes:
      
      It would prevent the need to do some extra math for extremely large numbers (not just those that end on a high number, but those where an intermediate calculation may be high, e.g. factorial math to find out the probability of something, if I recall). Plus, 96 bits is more than enough for the fraction if you ask me - very greedy in fact to take that to 112 at the cost of 16 bits the exponent could well do with.
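
      A quick sketch of the intermediate-overflow problem, in Python. The coin-flip probability is my own example, not one from the thread, and `max_decimal_exponent` is a made-up helper; it only estimates the largest power of ten reachable with a given number of IEEE 754 exponent bits:

      ```python
      import math

      def max_decimal_exponent(exponent_bits):
          """Approximate largest power of ten for an IEEE 754 exponent field."""
          max_binary_exponent = 2 ** (exponent_bits - 1)   # 1024 for double
          return int(max_binary_exponent * math.log10(2))

      print(max_decimal_exponent(11))   # double: 308
      print(max_decimal_exponent(15))   # quad:   4932

      # 170! still fits in a double, 171! does not.
      fits = float(math.factorial(170))
      try:
          float(math.factorial(171))
          overflowed = False
      except OverflowError:
          overflowed = True
      print(overflowed)                 # True

      # Probability of exactly 100 heads in 200 fair flips. The answer is an
      # ordinary number, but the naive formula needs 200!, which is far
      # outside double range, so the work is done in exact integers here.
      p = math.comb(200, 100) / 2 ** 200
      print(p)
      ```

      With a 15-bit exponent the naive factorial formula would go through without the integer workaround, since 200! is only around 10^375.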
- Re: (Score:3)
  
  by gnasher719 ( 869701 ) writes:
  
  While speed for single and double floats is all well and good, I wonder - when will there finally be hardware support for 128 bit (quadruple precision) floats?
  It was there on PowerPC for many years, and with Haswell it will be there for x86 as well. FMA is all you need for efficient 128 bit arithmetic.
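
  One common reading of that claim is double-double arithmetic, where a value is kept as an unevaluated sum of two doubles and FMA recovers the exact rounding error of a product in a single instruction. A minimal sketch in Python, with assumptions stated loudly: the FMA is emulated with exact rationals (so this shows the idea, not the speed), the names `exact_fma` and `two_prod` are illustrative, and double-double gives roughly 106 significand bits with double's exponent range rather than a true IEEE binary128:

  ```python
  from fractions import Fraction

  def exact_fma(a, b, c):
      """Emulated FMA: a*b + c evaluated exactly, then rounded once."""
      return float(Fraction(a) * Fraction(b) + Fraction(c))

  def two_prod(a, b):
      """Return (p, e) with p = rounded a*b and p + e == a*b exactly.

      Holds barring overflow/underflow; on hardware the second line
      is one FMA instruction.
      """
      p = a * b
      e = exact_fma(a, b, -p)
      return p, e

  a = 1.0 + 2.0 ** -30
  p, e = two_prod(a, a)
  print(p == 1.0 + 2.0 ** -29)   # True: the rounded product
  print(e == 2.0 ** -60)         # True: the bits a plain multiply drops
  ```

  Without FMA the error term has to be recovered by splitting each operand into high and low halves, which costs many more operations per multiply.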
Confused? (Score:2)

by Narishma ( 822073 ) writes:

The serious [floating point] performance increase has a few caveats: you have to use either AVX2 or FMA3,
Isn't AVX2 just the integer version of AVX? Like SSE2 added integer versions of the SSE floating point instructions? If so, that sentence doesn't make sense.
- Re: (Score:2)
  
  by godrik ( 1287354 ) writes:
  
  No, there is more to it:
  * Expansion of most integer AVX instructions to 256 bits
  * 3-operand general-purpose bit manipulation and multiply
  * Gather support, enabling vector elements to be loaded from non-contiguous memory locations
  * DWORD- and QWORD-granularity any-to-any permutes
  * Vector shifts
  * 3-operand fused multiply-accumulate support
ERROR (Score:1)

by xlokix ( 2869115 ) writes:

"As you see in the red bar, the task is finished much faster on Haswell. It’s close, but not quite 2x." Sorry to ruin it for everyone but the RED bar is integer not floating point.
FMA4 (Score:4, Informative)

by ssam ( 2723487 ) writes: on Monday March 18, 2013 @06:31PM (#43208271)

Pah. AMD has had FMA4 since 2011.
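
For anyone unsure what the "fused" part buys: an FMA computes a*b + c with a single rounding at the end instead of rounding the product first. FMA3 and FMA4 compute the same thing and differ in encoding (three operands, with the result overwriting one source, versus four operands with a separate destination). A minimal sketch in Python, emulating the fused operation with exact rationals; the helper name `fused_multiply_add` is an assumption for illustration, not a real instruction or library call:

```python
from fractions import Fraction

def fused_multiply_add(a, b, c):
    """Emulate FMA: evaluate a*b + c exactly, then round once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a, b, c = 0.1, 10.0, -1.0
unfused = a * b + c                   # product is rounded to 1.0 before the add
fused = fused_multiply_add(a, b, c)   # keeps the low bits of the product

print(unfused)              # 0.0
print(fused == 2.0 ** -54)  # True
```

The unfused version loses the tiny residue because 0.1 * 10.0 rounds to exactly 1.0; the fused version keeps it.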

The new min spec (Score:1)

by Billly Gates ( 198444 ) writes:

To get Crysis 3 at 30 fps is here!
lies and bullshit (Score:2)

by decora ( 1710862 ) writes:

"hey kids, our CPU is twice as fast as the next guys!"*
*(you must rewrite your code to do twice as much stuff at once)
**(which has been true for like, 15 years ever since SSE + friends made it into the PC market)
***(which means developers have to spend time writing non-portable optimization code)
GT3 (Score:3, Interesting)

by edxwelch ( 600979 ) writes: on Monday March 18, 2013 @07:00PM (#43208593)

AMD lost the CPU race a long time ago, but still beats Intel with integrated graphics. Now it looks like Haswell could win that battle too.
The article shows GT2 to be 15% - 50% faster than the old HD4000. That's still a bit slower than Trinity, but GT3 has double the execution units of GT2, potentially blowing away anything that AMD could offer.

bs hype is what this is (Score:1)

by Anonymous Coward writes:

when avx came out, it was supposed to be a major speedup..
guess what, lots of things are still faster in SSE2/3
many of the new registers appear to speed things up, but what isn't readily apparent is there haven't always been improvements in memory ports.
the major speedups are going to come from cleaning up the way instructions are handled and the memory lanes in the chip, not just throwing more registers at us
This guy (Agner Fog) is the best reference on the net for what's going on in these chips:
http://www
Compilers (Score:2)

by fa2k ( 881632 ) writes:

Will gcc use AVX or FMA3 if I write normal code in C++? How about Java and Python / numpy, could it be that python actually gets faster than C++ if gcc doesn't take advantage of these technologies?
- Re: (Score:2)
  
  by Aardpig ( 622459 ) writes:
  
  No overclocking? Ermahgerd, that's a showstopper for those wanting to do HPC!!!!!!!!!!!!!!!!!!
- Re: (Score:2)
  
  by ChrisMaple ( 607946 ) writes:
  
  Reading your diatribe would lead the naive reader to believe Intel's processors' benchmarks are substantially inferior to AMD's. Now that's comedy.
- Re:Poor AMD (Score:5, Insightful)
  
  by dshk ( 838175 ) writes: on Monday March 18, 2013 @08:06PM (#43209197)
  
  AMD already has FMA3. They also published great results. Of course nobody read it, at least I have seen mentioned it in the usual generic benchmark articles people like to refer (which does not use FMA3).
  
  - Re: (Score:2)
    
    by dshk ( 838175 ) writes:
    
    I mean "...I have never seen mentioned..."
    - Re: (Score:1)
      
      by Anonymous Coward writes:
      
      I thought AMD used FMA4, based on one article I read. But then again, I barely understand what FMA stands for, let alone 4 vs 3 and which one is better.
- Re: (Score:2)
  
  by swilde23 ( 874551 ) writes:
  
  Regular users will see the regular increase (roughly the same as the integer increase).
  But, anytime a chip releases a new feature that relies on specific code, of course only "certain kinds of apps" will get a boost.
  Or maybe I'm misreading the summary (because, I don't read articles)
- Re: (Score:2)
  
  by 7-Vodka ( 195504 ) writes:
  
  Unless you can just recompile your OS and all your software with a new version of GCC...
