Intel Shows 14nm Broadwell Consuming 30% Less Power Than 22nm Haswell
MojoKid writes "Kirk Skaugen, Senior Vice President and General Manager of the PC Client Group at Intel, while on stage at IDF this week, snuck in some additional information about Broadwell, the 14nm follow-up to Haswell that was mentioned during Brian Krzanich's opening day keynote. In a quick demo, Kirk showed a couple of systems running the Cinebench multi-threaded benchmark side-by-side. One of the systems featured a Haswell-Y processor, the other a Broadwell-Y. The benchmark results weren't revealed, but power was being monitored on both systems during the Cinebench run, and the Broadwell-Y rig consumed roughly 30% less power than the Haswell-Y while running fully loaded at under 5 watts. Without knowing clocks and performance levels, we can't draw many conclusions from the power numbers shown, but they do hint at Broadwell-Y's relative health, even at this early stage of the game."
Re:ARM vs x86 (Score:5, Insightful)
Ya, I think ARM fanboys need to step back and have a glass of perspective and soda. There seems to be this article of faith in the ARM fan community that ARM chips are faster per watt, per dollar, per whatever than Intel chips by a big margin. Also that ARM could, if they wished, just scale their chips up and make laptop/desktop chips that would annihilate Intel on price/performance. However, for some strange reason, ARM just doesn't do that.
The real reason is, of course, that it isn't true. ARM makes excellent very-low-power chips. They are great when you need something for a phone, or an integrated controller (Samsung SSDs use an ARM chip to control themselves), and so on. However, that doesn't mean they have some magic juju that Intel doesn't, nor does it mean their designs will scale up without adding power consumption.
In particular, you can't just throw cores at things. Not all tasks are easy to split up and run in parallel. You already see this with 4/6-core chips on desktops. Some things scale great and use 100% of your CPU (video encoding, for example). Others can use all the cores, but only to a degree. You see some games like this: they'll use one core to capacity, another near to it, and the 3rd and 4th only partially. Still other things make little to no use of the extra cores.
So ARM can't go and just whack together a 100 core chip and call it a desktop processor and expect it to be useful.
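The scaling limit the parent describes is just Amdahl's law. A minimal sketch (the workload fractions below are made-up illustrations, not measurements of any real program):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only part of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# A video-encoder-like task that is 95% parallel keeps scaling...
print(round(amdahl_speedup(0.95, 4), 2))    # ~3.48x on 4 cores
print(round(amdahl_speedup(0.95, 100), 2))  # ~16.81x on 100 cores

# ...but a game-like task that is only 50% parallel barely benefits,
# which is why a hypothetical 100-core chip is no desktop cure-all.
print(round(amdahl_speedup(0.50, 100), 2))  # ~1.98x on 100 cores
```

Even with 95% of the work parallel, 100 cores buy you less than a 17x speedup; the serial 5% dominates.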
Really, Intel is quite good at what they do and their chips actually are pretty efficient in the sector they are in. A 5-10 watt laptop/ultrabook chip does use a lot more than an ARM chip in a smartphone, but it also does more.
Also, Intel DOES have some magic juju ARM doesn't, namely that they are a node ahead. You might notice that other companies are talking about 22/20nm stuff. They are getting it ready to go, demonstrating prototypes, etc. Intel, however, has been shipping 22nm parts in large volume since April of last year. They are now getting ready for 14nm. Not "getting ready" as in talking about something far off: they are putting the finishing touches on the 14nm fab in Chandler, they have prototype chips actually out and testing, and they are getting ready to finalize things and ramp up volume production.
Intel spends billions and billions a year on R&D, including fab R&D, and thus has been a node ahead of everyone else for quite some time. That alone gives them an advantage. Even if all other things are equal, they have smaller gates, which gives them lower power consumption.
None of this is to say ARM is bad; they are very good at what they do, as their sales in the phone market show. But ARM fans need to stop pretending they are some sleeping behemoth that could crush Intel if only they felt like it. No, actually, Intel's stuff is pretty damn impressive.
Re: 30%? (Score:5, Insightful)
Parent is correct.
Power usage goes up with the *square* of voltage, but only *linearly* with clock speed.
Frequency doesn't matter much on its own; voltage does.
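That's the classic CMOS dynamic-power relation, P ≈ C·V²·f. A quick sketch with made-up capacitance and voltage numbers, purely for illustration:

```python
def dynamic_power(c_farads, volts, hz):
    """CMOS dynamic power: P = C * V^2 * f."""
    return c_farads * volts**2 * hz

base = dynamic_power(1e-9, 1.0, 2e9)  # hypothetical 2 GHz part at 1.0 V

# Doubling frequency alone only doubles power...
assert dynamic_power(1e-9, 1.0, 4e9) == 2 * base

# ...but higher clocks usually need higher voltage, and the voltage
# term is squared: 2x clock at 20% more volts is ~2.88x the power.
print(round(dynamic_power(1e-9, 1.2, 4e9) / base, 2))  # → 2.88
```

This is why dropping voltage (as a smaller process node allows) wins so much more power than dropping clocks.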
Re:Yawn (Score:2, Insightful)
IPC has hit a brick wall. The proportion of time spent on cache misses and branch mispredictions simply sets a limit.
After all, IBM's Power8 will have 8 threads/core (as announced at Hot Chips, though as far as I know there has been no article about it on Slashdot). I'm not sure 8 is very useful, but apparently on many workloads the 4 threads/core of Power7/Power7+ give more throughput than 2 threads. Several threads per core increase aggregate IPC, but not per-thread IPC.
The reason I'm doubtful that 8 threads/core matters on Power8 is that there are only 2 LSUs (load/store units), which means each thread can only access memory every 4 cycles on average. For a RISC processor with 32 registers and 2 LSUs, 2 threads are an obvious way to keep the execution units busy, and 4 threads can certainly increase throughput, but at 8 I start to have my doubts, since the threads have to share the caches despite the 4(!) levels of cache: L1 and L2 are per core, L3 is mostly per core if I understand correctly, and L4 is completely shared among all cores.

For x86, Intel has not (yet) gone above 2 threads, and they seem instead to be going toward a large and wide register file with AVX-512, introducing a new instruction encoding that is even more baroque (tough job, but it's Intel we are speaking of) than everything they have piled on top of the original 8086, including a new 4-byte prefix for a start. I'm also a bit doubtful about AVX-512: the instruction to save/restore FPU context has been extended and now stores/loads over 2.5kB of data. That is more context than any other processor I know of (including Itanic) and will certainly impact context-switch and signal-delivery latencies. It is also incredibly intertwined with the memory protection extensions (MPX) and adds complexity to supporting MPX.
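The "over 2.5kB" figure checks out from the register sizes alone. A back-of-the-envelope tally (the 512-byte legacy area is the fixed x87/SSE region of the XSAVE layout; I'm ignoring the XSAVE header and any MPX state, so the real saved image is slightly larger still):

```python
# AVX-512 architectural state, in bytes:
zmm = 32 * 64     # 32 ZMM registers x 512 bits (64 bytes) each
opmask = 8 * 8    # 8 k-mask registers x 64 bits each
legacy = 512      # legacy x87/SSE save area in the XSAVE layout
total = zmm + opmask + legacy
print(total)      # → 2624 bytes, i.e. just over 2.5 KB
```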
On x86, last time I looked, there was still only one load unit and one store unit (somewhat less flexible than 2 general-purpose LSUs, since there are generally more loads than stores), but the big problem was with 32-bit code, which was spilling like mad because of too few available registers. amd64 (whatever Intel claims, they had to follow AMD on this one, and the NIH syndrome shows) often gives 9 more available registers (in theory 8, but position-independent code has to sacrifice a register on 32-bit) and has made the memory traffic due to register spills negligible in practice. Intel spent far too long minimizing the importance of 64-bit support (NIH syndrome?). However, they have finally admitted that 64-bit is mainstream and 32-bit is fading away.
Trying to improve IPC on x86 is a nightmare, because the instruction decoder is insanely complex (and the complexity is growing): they have gone from 3 instructions/clock on the PPro to 4 (only a 33% improvement in 17 years; I'm aware that this is a meaningless figure). I don't remember how instruction decoding is done on Intel processors, but I remember a description of an AMD processor in which they simply distribute the instruction stream to 16 decoders, each shifted by one byte, and then cancel the results of the decoders that were not fed an instruction starting on an instruction boundary. That's gross; it wastes a lot of power and transistors on no useful work. Actually, some Intel processors have dual instruction caches: one holding raw x86 and one recoded into something easier to digest. Of course, this does not come for free (in silicon area, power consumption, coherency logic, etc.).
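The decode-at-every-offset scheme can be sketched with a toy variable-length ISA (entirely made up here: the low 2 bits of the first byte encode a 1-4 byte length; real x86 length decoding is far messier):

```python
def insn_len(first_byte):
    """Toy ISA: low 2 bits of the first byte give length 1..4."""
    return (first_byte & 0b11) + 1

def brute_force_decode(code):
    # A speculative "decoder" at every byte offset -- most of this
    # work is thrown away, which is the waste the parent describes.
    speculative = {off: insn_len(code[off]) for off in range(len(code))}
    # Keep only the decoders that landed on real instruction boundaries.
    kept, off = [], 0
    while off < len(code):
        kept.append((off, speculative[off]))
        off += speculative[off]
    return kept

code = bytes([0x03, 0x00, 0x00, 0x00, 0x01, 0x00, 0x02])
print(brute_force_decode(code))  # → [(0, 4), (4, 2), (6, 3)]
```

Of 7 speculative decodes, only 3 start on a boundary; the other 4 are cancelled, exactly the kind of wasted decode work (and power) being criticized.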