Slashdot
Inside Intel's Core i7 Processor, Nehalem
Posted by
Soulskill
on Fri Aug 22, 2008 06:57 PM
from the upgrades dept.
MojoKid writes "Intel's next-generation CPU microarchitecture, which was recently given the official processor family name of 'Core i7,' was one of the big topics of discussion at IDF. Intel claims that Nehalem represents its biggest platform architecture change to date. This might be true, but it is not a from-the-ground-up, completely new architecture either. Intel representatives disclosed that Nehalem 'shares a significant portion of the P6 gene pool,' does not include many new instructions, and has approximately the same length pipeline as Penryn. Nehalem is built upon Penryn, but with significant architectural changes (full webcast) to improve performance and power efficiency. Nehalem also brings Hyper-Threading back to Intel processors, and while Hyper-Threading has been criticized in the past as being energy inefficient, Intel claims their current iteration of Hyper-Threading on Nehalem is much better in that regard."
Update: 8/23 00:35 by SS: Reader Spatial points out Anandtech's analysis of Nehalem.
Related Stories
Overclocked Memory Breaks Core i7 CPUs 267 comments
arcticstoat writes "Overclockers looking to bolster their new Nehalem CPUs with overclocked memory may be disappointed. Intel is telling motherboard manufacturers not to encourage people to push the voltage of their DIMMs beyond 1.65V, as anything higher could damage the CPU. This will come as a blow to owners of enthusiast memory, such as Corsair's 2,133MHz DDR3 Dominator RAM, which needs 2V to run at its full speed with 9-9-9-24 timings."
45nm Opteron Performance, Power Efficiency Tested 129 comments
An anonymous reader writes "Now that Intel has unleashed its next-generation Core i7 processors, all eyes are turned to AMD and its incoming wave of 45nm CPUs. To get a feel for AMD's future competitiveness, The Tech Report has taken a pair of 2.7GHz 45nm Opterons (with 75W power envelopes) and put them through the paces against Intel Xeons and older, 65nm Opterons in an extensive suite of performance and power efficiency tests — from Cinema 4D and SPECjbb to computational fluid dynamics and a custom XML handling benchmark. The verdict: AMD's new 45nm quad-core design is a notable improvement over the 65nm iteration, and it proves to be a remarkably power-efficient competitor to Intel's Xeons. However, 45nm AMD chips likely don't have what it takes to best Intel's Core i7 and future Nehalem-based Xeons."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
yeah, yeah, yeah.. they said this the last time.. (Score:4, Insightful)
The problem with hyperthreading is that it fails to deal with the fundamental problem of memory bandwidth and latency in the x86 architecture. It's true, some apps will see a 20% or better improvement in performance, but most won't see anything more than a marginal increase.
Still, if one can safely enable hyperthreading without slowing down your system, unlike the last time we went through this, we should consider it a success. Hopefully, Quickpath will provide the needed memory improvements.
Re:yeah, yeah, yeah.. they said this the last time (Score:4, Insightful)
For one, they use DDR3 memory. Another thing is that they have much more intelligent pre-fetching mixed with the loop detection thingy. The cache size/design itself allows for many applications to run.
The problem that you describe is a problem with the OS's scheduler. It should understand the architecture that it is running on. It should know about the types of caches, the way each processor shares them, etc. Thus, it only makes sense to use hyper-threading if 1. you are simply out of cores (the choice of using HT cores is iffy) or 2. a single application has spawned multiple threads. Even then you have to take into account the availability of other cores that share the L2 or L3 cache.
I personally think that the intelligent pre-fetching and loop detection thingy is something that needs more tests/statistics thrown at it.
Like you say, there are some applications that take advantage of HT. Let them take advantage of it, while writing smarter OSs that understand the problems with doing so.
Maybe they need a feedback mechanism from the processor for the OS to understand what is the best way to schedule tasks.
I don't know much about CPUs
Re:yeah, yeah, yeah.. they said this the last time (Score:5, Informative)
The problem with hyperthreading is that it fails to deal with the fundamental problem of memory bandwidth and latency
The entire point of SMT (of which HT is an implementation) is that it helps hide memory latency. If one thread stalls waiting for memory then the other gets to use the CPU. Without SMT, a cache miss stalls the entire core. With SMT, it stalls one context but the other can keep executing until it gets a cache miss, which hopefully doesn't happen until the other one has resumed.
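A toy model can make that argument concrete. The sketch below is not how a real core works: it assumes each hardware context alternates between a fixed run of compute cycles and a fixed-length cache-miss stall, that the core issues from only one ready context per cycle, and that misses from different contexts overlap freely. The function name `simulate` and all the cycle counts are made up for illustration.

```python
def simulate(num_threads, compute_cycles, miss_cycles, total_cycles):
    """Return the fraction of cycles in which the core did useful work.

    Toy model (an assumption, not a real pipeline): every context computes
    for `compute_cycles`, then stalls on a cache miss for `miss_cycles`.
    """
    work_left = [compute_cycles] * num_threads   # cycles until the next miss
    stall_left = [0] * num_threads               # cycles of miss latency remaining
    busy = 0
    for _ in range(total_cycles):
        # Contexts that are not waiting on memory at the start of this cycle.
        ready = [i for i in range(num_threads) if stall_left[i] == 0]
        # Outstanding misses make progress whether or not the core is busy.
        for i in range(num_threads):
            if stall_left[i] > 0:
                stall_left[i] -= 1
        if ready:
            t = ready[0]                         # issue from one ready context
            busy += 1
            work_left[t] -= 1
            if work_left[t] == 0:                # this context just missed in cache
                stall_left[t] = miss_cycles
                work_left[t] = compute_cycles
    return busy / total_cycles


if __name__ == "__main__":
    print(simulate(1, 10, 10, 2000))  # one context: the core idles during every miss
    print(simulate(2, 10, 10, 2000))  # two contexts: one computes while the other waits
    print(simulate(2, 10, 30, 4000))  # long misses: two contexts are not enough
```

In this model a second context doubles utilization until the core saturates, and it stops helping once both contexts are stalled at the same time, which is the "hopefully doesn't happen" case in the comment.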
DNF (Score:2, Funny)
Re:DNF (Score:5, Funny)
You probably also want a user interface that does what you mean, not what you said.
The name is still dumb. (Score:2, Insightful)
That old question (Score:2, Funny)
Re: (Score:3, Informative)
only the super high desk tops have Quick Path and (Score:5, Interesting)
Only the super high-end desktops have QuickPath and triple-channel DDR3, and the bigger joke is that there will be 2 different single-CPU desktop sockets.
Also, the mobile parts will not have QuickPath.
All AMD CPUs use HyperTransport, all desktops will use the same socket, and the upcoming AM3 CPUs will work in the older AM2+ boards. Also, on AMD you can use more than one chipset, while with Intel it looks like you will be locked in to an Intel chipset.
Re:only the super high desk tops have Quick Path a (Score:5, Funny)
Re:only the super high desk tops have Quick Path a (Score:5, Interesting)
Re: (Score:3, Insightful)
Re:only the super high desk tops have Quick Path a (Score:5, Interesting)
Re: (Score:3, Informative)
Re: (Score:3, Interesting)
Problem being - if most people don't natively benefit from HT then aside from benchmarks or off-the-wall memory intensive apps, HT wouldn't be that impressive.
I've had a core2duo 6600 for over a year now - and from what I've been reading, Nehalem isn't really any large performance boost for the typical user over Penryn. Usually I'll buy new CPU/systems when the performance of mainstream games suffer due to the CPU being outdated; in fact, this e6600 is the first system I've had that I've actually upgraded
Power efficiency is the new "it" (Score:5, Interesting)
Nehalem is really the realization of what many slashdotters have claimed before - the typical user doesn't need that much more performance. Both datacenters and laptop users ask for the same thing - power efficiency - and Intel delivers. The Atom is another part of the strategy, even though it's currently coupled with a very inefficient chipset.
The thing is, today we have the knowledge and complexity to fire up kilowatt systems and more - but they're costly to run. Certainly there are the extreme hardcore gamers who won't mind running the hottest, most power-hungry quad crossfire system, but they're few and far between. Laptop users think battery life. Desktop users think electricity costs. The result is Nehalem, which promises to deliver a lot more performance per watt.
If the practise is as good as the theory, AMD is unfortunately in deep shit. They've always been good at delivering ok processors at an ok price, but power efficiency has really only been their strength compared to the Netburst (PIV) processors, not P3 or the Cores. If it amounts to "yeah your processors are cheaper but they cost more to operate" things will fall apart, which is sad since ATI is really doing fine. The 48xx series are kick-ass cards, I just hope they can keep up the competition against Intel...
Re:Power efficiency is the new "it" (Score:5, Insightful)
Re: (Score:3, Insightful)
Intel has money to burn, so they can afford prime-time TV commercials... The question is -- is the return on investment worth it? Your average Joe will buy whatever Dell/HP offers them in the right price range. The ones who are looking for a specific CPU are generally informed enough not to be swayed by TV commercials.
Re: (Score:2)
"The thing is, today we have the knowledge and complexity to fire up kilowatt systems and more - but they're costly running. Certainly there's the extreme hardcore gamers who won't mind running the hottest, most powerhungry quad crossfire system, but they're few and far between."
I think this is a misinformed statement personally, not intended as a slant against you, but gamers are one of the few groups driving the technology in many key areas of research: world simulation, A.I., etc. "Games" are misnomers f
Re:Power efficiency is the new "it" (Score:4, Interesting)
Putting the cringe-worthy PR tone aside (are you connected to intel in any way?), the lowest-clocked 'mainstream desktop' Bloomfield CPU (running at 2.66 GHz, 45nm, quad-core) has a TDP of 130W! Now, efficient or not, that is one hot-and-sweaty processor, making me wonder that if Nehalem truly does have '1.1x~1.25x / 1.2x~2x the single / multi-threaded performance of the latest Penryn ('Yorkfield', 2.66GHz, 45 nm, quad-core, 95W TDP) at the same power level', why wouldn't they let the efficiency gains carry the performance increase of Nehalem for the same TDP?
Look, I may or may not be missing something, but I have been reading plenty of (uncomfortably positive, perhaps bankrolled) material on Nehalem, yet I can't shake the perception that, with a huge TDP increase, the return of hyperthreading and the cannibalization of L2 cache for L3 cache, Nehalem seems far more Pentium 4 than Penryn.
Not on the desktop it isn't (Score:5, Informative)
> Desktop users think electricity costs.
Bullshit. The difference between a 130W Nehalem and a 65W Core2 is 65W, which is 11 cents per day (at 7c/kWh) or about $40/year if you run the computer 24/7. Most people turn the computer off when it's not in use, and 8 hours per day is more likely, or under 4 cents per day and maybe $13/year. I'd say the cost is entirely negligible, especially when you compare it to your $80/month Comcast bill.
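For anyone who wants to redo that arithmetic with their own numbers, here is a minimal sketch. The 65W difference and the 7 cents per kWh rate come from the comment above; the function name `extra_power_cost` is made up, and the model assumes the CPU draws its full TDP the whole time it is on, which overstates real usage.

```python
def extra_power_cost(extra_watts, hours_per_day, cents_per_kwh):
    """Return (cents per day, dollars per year) for a constant extra power draw."""
    kwh_per_day = extra_watts / 1000 * hours_per_day   # W -> kW, times hours on
    cents_per_day = kwh_per_day * cents_per_kwh
    dollars_per_year = cents_per_day * 365 / 100
    return cents_per_day, dollars_per_year


if __name__ == "__main__":
    for hours in (24, 8):
        cents, dollars = extra_power_cost(65, hours, 7)
        print(f"{hours}h/day: {cents:.1f} cents/day, ${dollars:.2f}/year")
```

Plugging in a higher local electricity rate scales both figures linearly, so even at several times 7 cents per kWh the yearly difference stays in the tens of dollars for an 8-hour day.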
Re:Power efficiency is the new "it" (Score:4, Funny)
Here we go, jumping the gun before we hear what Jerry has to say...
780G is also very power efficient (Score:5, Insightful)
See here [tomshardware.com]
I know it's a Tom's Hardware article, but the results are consistent with what people have been posting in the Silent PC Review forums. I do think with a better chipset and laptop-style power supply the Atom platform can go down to sub-20 watts, but for now Intel is not making those boards or even allowing Atom platforms to have fancy features like PCI-Express. In fact, with the older AMD 690G chipset, some people at Silent PC Review were able to build sub-30 watt systems.
Re:Power efficiency is the new "it" (Score:5, Interesting)
You are behind the times. ATI cards, as far as price vs performance, are spanking NVidia's cards with moon rocks. I think a big helping hand in that is that for whatever reason, AMD said to them, "make better drivers, or else!".
Also, AMD has gone the route of trying to be more open source friendly with their cards, more so than NVidia.
Right now, you just can't go wrong owning a current-generation Radeon card.
Slashdotted (Score:5, Informative)
Here we go again (Score:2, Interesting)
Hyperthreading. I thought I was getting an ultra-tech processor when I bought my Dell 8400 some years back, with its 3.2 GHz P4 hyperthreaded power-sucking processor. Once all the reviews and independent technical evaluations and benchmarks were in, it was revealed that outside of a few niche application areas, hyperthreading wasn't all that great.
It's a good sign Nehalem is also focusing on lowering power usage, the reason Intel had to finally abandon their Tejas plans (the old 8400 Coppermine P4 was a j
Re:Here we go again (Score:5, Interesting)
Re:Here we go again (Score:4, Interesting)
Hyperthreading can make a lot of sense in some circumstances. Sun pushed hyperthreading to its limits to achieve very impressive energy efficiency for certain niche workloads with its Niagara CPUs and derivatives. (IIRC, up to 128 threads per chip.)
Re: (Score:3, Informative)
8 threads per core in Niagara 2; you get up to 64 threads, as the chip is available with 4, 6 or 8 cores.
Re: (Score:3, Informative)
Unfortunately those are very, very, very, very, very niche workloads. Your workloads have to be insanely parallel and each thread very independent of others so that you have little that is blocking. In short, Niagara is just marketing.
Re:Here we go again (Score:5, Insightful)
It's really quite amazing how much the hardware has outstripped the ability of software to keep up.
It's not amazing at all. Most desktop applications are single-threaded because you, the operator, are single-threaded. MS Word could enter words on all 100 pages of your book simultaneously, but you aren't able to produce them. An audio player could decode and play 100 songs to you at the same time, but you want to listen to one song at a time...
I can see niche desktop applications where multiple threads are of use. For example, GIMP (or Paint.net or Photoshop) could apply your filter to 100 independent squares of the photo if you have 100 cores. However the gain would be tiny, the extra coding labor would be considerable, and you still need to stitch these squares... all to gain a second or two of a rare filter operation?
The most effective use of multiple cores today is either in servers, or in finite element modeling applications.
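The filter example is easy to sketch. The code below is a simplification under stated assumptions: it uses horizontal bands rather than squares, a plain list-of-lists grayscale image, and a per-pixel invert filter, and none of the function names come from GIMP, Paint.net or Photoshop.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_band(band):
    """Apply a per-pixel filter (here: invert an 8-bit grayscale value)."""
    return [[255 - pixel for pixel in row] for row in band]


def split_rows(image, tiles):
    """Split the image into at most `tiles` contiguous bands of rows."""
    step = -(-len(image) // tiles)               # ceiling division
    return [image[i:i + step] for i in range(0, len(image), step)]


def parallel_filter(image, tiles=4):
    """Filter each band in a worker, then stitch the bands back together."""
    if not image:
        return []
    bands = split_rows(image, tiles)
    with ThreadPoolExecutor(max_workers=tiles) as pool:
        filtered = list(pool.map(filter_band, bands))   # map() preserves band order
    return [row for band in filtered for row in band]   # the "stitching" step


if __name__ == "__main__":
    image = [[(x * y) % 256 for x in range(8)] for y in range(10)]
    print(parallel_filter(image, tiles=4) == filter_band(image))
```

For a per-pixel filter the stitching is trivial; a blur or any other filter that reads neighbouring pixels would need overlapping borders between bands, which is where the extra coding labor shows up. In CPython, threads doing pure-Python work will not actually run on several cores at once, so a real speedup would need something like `ProcessPoolExecutor` or a native image library.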
Re:Here we go again (Score:5, Insightful)
It's not amazing at all. Most desktop applications are single-threaded because you, the operator, are single-threaded....
That's a pretty simplistic view. Other than the obvious historical reasons, I believe that most applications are single threaded because the languages and tools for writing non-trivial robust multi-threaded applications are lagging far behind the capability to run them.
Re:Here we go again (Score:4, Insightful)
And multiple cores? Just the O.S. alone runs many things at once, then you've got your drivers, the applications, the widgets, the viruses (hey, they're processes too, just because some people have a bit of prejudice :)), the bittorrent running in the background, and the list goes on.
Mycroft
Re: (Score:2)
You've trotted out the same old arguments.
Games are in fact one of the ONLY things on consumer PCs that make heavy use of the hardware. Some people edit video also, or play HD video on their desktop. A small fraction do other 3D tasks. Of course these particular apps can use lots of CPU, but they always have.
The rest of it is trivial. In case you hadn't noticed, most modern OSes sit there using less than 1% of CPU most of the time. Sure, there are occasional bursts of activity but these are rare and usual
Re:Here we go again (Score:4, Insightful)
It's not amazing at all. Most desktop applications are single-threaded because you, the operator, are single-threaded. MS Word could enter words on all 100 pages of your book simultaneously, but you aren't able to produce them.
Absolute nonsense. Most applications have inherently parallel workloads that are implemented in sequential code because context switching on x86 is painfully expensive.
Consider your example of a word processor. It takes a stream of characters and commands. It runs a spelling, and possibly grammar, checker in the background. It runs a layout and pagination algorithm. Both of these can also be subdivided into parallel tasks. If you insert an image, it has to decode the image in the background. Then we get to the UI, updating the view of the document via scrolling and so on while the model is not modified.
Re: (Score:3, Informative)
Most applications have inherently parallel workloads that are implemented in sequential code because context switching on x86 is painfully expensive.
Context switching on x86 is dead cheap. It's probably the cheapest of all general purpose architectures available right now. We're talking a few hundred cycles cheap. Only the P4 is a bit behind, and Nehalem makes things faster, to the point where Intel almost catches up with AMD.
Windows manages to make process switches a lot more expensive than necessary, but thread switching isn't bad. With Linux it hardly matters whether you switch processes or threads, they're both fast.
Re: (Score:3, Interesting)
Take MS Word... You have grammar checking, but what about background googling to do FACT checking?
Exactly. There's a million things that a "simple" program like Word could do; instead, they just add on cosmetic crap that slows the program down. I haven't seen a significant advancement -- something that made the old program obsolete -- in Word in a decade.
As one example of a pathetic feature, Word has an option to "compare two documents". In theory, this would be a useful feature when someone extensively edits a document and hands it back to you. In reality, it's completely useless. If you take a d
Re:Here we go again (Score:5, Informative)
Re: (Score:2, Interesting)
I, for one, welcome our new automatic overclocking overlords.
how much is enough? (Score:5, Informative)
At this point, as long as I can watch HD video without any noticeable slowdowns, I'm good. A GPU or integrated video solution that can do that plus some energy efficient CPU is really all I'm interested now. The software issues with the 4500HD are disappointing, but hopefully it's *just* a software issue this time, and can be fixed soon enough.
Then again, that's just me; I'm not a gamer or video editor.
Re: (Score:2)
So pretty much you're saying that since you do stuff that can be done with relatively old hardware, there should be no more upgrading for more abilities?
Re:how much is enough? (Score:4, Insightful)
He's saying that there's no killer application for the general user to upgrade to the latest and greatest. Gamers, sure, but they're a SMALL minority of computer users. Multi-threading and more cores than we have now don't really do anything for the average person. Until they do, these updates will be received with lukewarm approval. It won't be like the original Pentium again.
Gene pool comment (Score:3, Interesting)
"completely new architecture either. Intel representatives disclosed that Nehalem 'shares a significant portion of the P6 gene pool,"
That's like saying equations share a significant portion of the numbers gene pool. It's all geometry when you get down to it. I mean really, there are going to be certain circuit geometries that are always good to use and that you can't totally get away from.
Re:Gene pool comment (Score:5, Insightful)
I'm not sure what you mean by geometries. SRAM arrays, flops, random logic, carry-lookahead adders, Wallace-tree multipliers (building blocks of processors) generally look similar across all high-performance ASICs over the past 15 years. Circuit geometries themselves have almost certainly changed completely since P6 days - 45nm is a hell of a lot smaller than 350nm, and the rules governing how close things can be have almost certainly changed.
I think what the article really means is that Nehalem shares a lot of the architectural concepts and style of the P6: similar number of pipe stages, similar number of execution units, similar decode/dispatch/execute/retire width (I think Core 2/Penryn/Nehalem are 4 and P6 was 3), similar microcode, etc. Of course enhancements and improvements have been made in things like the branch predictor, load-store unit, and obviously the interconnect/bus...but if you look at Nehalem closely enough, and indeed if you look at Pentium M, Core 2, Penryn too, you can see the architecture of the P6 as an ancestor.
Will OS X's Snow Leopard use HT more? (Score:5, Insightful)
Given how closely Apple has worked with Intel before and after the processor switch from PowerPC, I wonder how much more Hyper-Threading aware OS X 10.6 (AKA Snow Leopard) will be? After all, it's supposed to be a "tuning" release focused on full 64 bit performance across the OS, so it wouldn't surprise me to see OS X 10.6 get much greater speed gains from HT than Vista on Nehalem, especially given Anandtech's description of how Vista screws up Turbo mode [anandtech.com] on Penryn-based systems. (And of course, MS won't go back and put hyperthreading awareness in XP at all...)
Re: (Score:3, Informative)
Intel Will Regret This (Score:3, Interesting)
More than any other organization, Intel knows that multithreading is bad. Lots of smart people such as professor Edward Lee [berkeley.edu] (the head of U.C. Berkeley's Parallel Computing Lab) have warned Intel of the disaster down the road. It is time for Intel and everybody else to make a clean break with the old stuff. There is an infinitely better way to design and program parallel computers that does not involve the use of threads at all. Instead of the Penryn, Intel should have picked something similar to the Itanium, which has a superscalar architecture [wikipedia.org]. A sequential (scalar) core has no business doing anything in a parallel multicore processor. Intel will regret this. Sooner or later, a competitor will read the writing on the wall and do things right. Intel and the others will be left holding an empty bag. To find out the right way to design a multicore processor, read Transforming the TILE64 into a Kick-Ass Parallel Machine [blogspot.com].
Re: (Score:3, Insightful)
It's a great idea and all, but you and what market segment are going to buy hundreds of thousands of those chips to offset the R&D and production costs? The existing x86 architecture is universally supported. Many other better architectures have died on the side of the road because they couldn't get a market segment large enough to support their costs.
-Rick
QuickPath? HyperTransport? (Score:3, Interesting)
QuickPath sounds a lot like AMD's HyperTransport. 3 pairs per CPU and an integrated controller is exactly what AMD has been doing for a long, long time.
20-bit wide, 25.6 GB/s per link? HyperTransport was already capable of delivering 41.6 GB/s per link in 2006. (according to Wikipedia)
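Both headline numbers can be reproduced from link width and transfer rate. The sketch below rests on assumptions that are not in the comment: that QuickPath runs at 6.4 GT/s with 16 of its 20 lanes carrying data, that the HyperTransport figure is the 3.0 spec maximum of a 32-bit link at 2.6 GHz double data rate (5.2 GT/s), and that both figures add up the two directions of the link. The function name `link_bandwidth_gb_s` is made up for illustration.

```python
def link_bandwidth_gb_s(gigatransfers_per_sec, data_bits, directions=2):
    """Peak GB/s for a link: transfers/s times bytes per transfer times directions."""
    bytes_per_transfer = data_bits / 8
    return gigatransfers_per_sec * bytes_per_transfer * directions


if __name__ == "__main__":
    # Assumed: QuickPath at 6.4 GT/s, 16 data bits out of 20 lanes per direction.
    print(f"QuickPath:      {link_bandwidth_gb_s(6.4, 16):.1f} GB/s")
    # Assumed: HyperTransport 3.0 maximum, 32-bit link at 5.2 GT/s.
    print(f"HyperTransport: {link_bandwidth_gb_s(5.2, 32):.1f} GB/s")
    # Same width as QuickPath for a like-for-like comparison.
    print(f"HT, 16-bit:     {link_bandwidth_gb_s(5.2, 16):.1f} GB/s")
```

If those assumptions hold, the 41.6 GB/s figure depends on the widest link the spec allows, so the last line, with a 16-bit link on both sides, is probably the fairer comparison.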
Re: (Score:2)