Intel Launches Core i7-4960X Flagship CPU
MojoKid writes "Low-power parts for hand-held devices may be all the rage right now, but today Intel is taking the wraps off a new high-end desktop processor with the official unveiling of its Ivy Bridge-E microarchitecture. The Core i7-4960X Extreme Edition processor is the flagship product in Intel's initial line-up of Ivy Bridge-E based CPUs. The chip is manufactured using Intel's 22nm process node and features roughly 1.86 billion transistors, with a die size of approximately 257mm square. That's about 410 million fewer transistors and a 41 percent smaller die than Intel's previous gen Sandy Bridge-E CPU. The Ivy Bridge-E microarchitecture features up to 6 active execution cores that can each process two threads simultaneously, for support of a total of 12 threads, and they're designed for Intel's LGA 2011 socket. Intel's Core i7-4960X Extreme Edition processor has a base clock frequency of 3.6GHz with a maximum Turbo frequency of 4GHz. It is easily the fastest desktop processor Intel has released to date when tasked with highly-threaded workloads or when its massive amount of cache comes into play in applications like 3D rendering, ray tracing, and gaming. However, assuming similar clock speeds, Intel's newer Haswell microarchitecture employed in the recently released Core i7-4770K (and other 4th Gen Core processors) offers somewhat better single-core performance."
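As a rough cross-check of the summary's figures, the Sandy Bridge-E numbers they imply can be back-computed. This is only a sketch: it assumes the quoted die size means 257 square millimetres and that "41 percent smaller" refers to die area; the variable names are illustrative, not from any Intel document.

```python
# Back-compute the Sandy Bridge-E figures implied by the summary.
ivb_e_transistors = 1.86e9   # "roughly 1.86 billion transistors"
ivb_e_die_mm2 = 257.0        # assumption: the figure is an area in mm^2
fewer_transistors = 410e6    # "about 410 million fewer transistors"
die_shrink = 0.41            # "41 percent smaller die" (assumed to mean area)

# SB-E has the extra transistors that IVB-E dropped.
snb_e_transistors = ivb_e_transistors + fewer_transistors

# If the new die is 41% smaller, it is 59% of the old area.
snb_e_die_mm2 = ivb_e_die_mm2 / (1.0 - die_shrink)

print(f"Implied SB-E transistors: {snb_e_transistors / 1e9:.2f} billion")
print(f"Implied SB-E die area:    {snb_e_die_mm2:.0f} mm^2")
```

Under those assumptions the previous-generation part works out to roughly 2.27 billion transistors on a die of about 436 square millimetres.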
Die size? (Score:5, Informative)
I suspect that should be 257 square mm. A 257 mm square die couldn't even be covered by a standard sheet of paper (US:letter, EU:A4)
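To put numbers on the difference between the two readings, here is a small sketch. The sheet dimensions are the standard A4 and US Letter sizes; everything else is just arithmetic, and the helper name is made up for illustration.

```python
import math

# Reading 1: the figure is an area of 257 square millimetres.
area_mm2 = 257.0
side_if_area = math.sqrt(area_mm2)      # edge of a square die with that area

# Reading 2: the figure is literally a 257 mm x 257 mm square.
side_if_literal = 257.0
area_if_literal = side_if_literal ** 2

a4_mm = (210.0, 297.0)       # ISO A4
letter_mm = (215.9, 279.4)   # US Letter (8.5 in x 11 in)

def fits_on_sheet(side_mm, sheet_mm):
    """True if a square of the given edge fits within the sheet's shorter side."""
    return side_mm <= min(sheet_mm)

print(f"257 mm^2 die edge: {side_if_area:.2f} mm")
print(f"257 mm square area: {area_if_literal:.0f} mm^2")
print("Literal square fits on A4:", fits_on_sheet(side_if_literal, a4_mm))
print("Literal square fits on Letter:", fits_on_sheet(side_if_literal, letter_mm))
```

The area reading gives a die roughly 16 mm on a side; the literal reading gives a square wider than the short edge of either sheet.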
Re: (Score:3)
But I wanted one the size of a paper. It makes it easier to reverse engineer.
Re: (Score:2)
It also wouldn't fit on a 300mm (diameter) wafer...
Well... perhaps if you cut the ingot lengthwise instead of normal to the axis?
Re: (Score:2)
My impression (as a layman) is that getting fairly substantial amounts of silicon isn't a big deal, with difficulty increasing as your demands concerning purity, monocrystallinity, and dimensional accuracy go up; but that the cost of the entire chip fabrication process gets very big, very fast, if you want to work with larger wafers.
Re: (Score:3)
Skip the die size. What's the SPECint and SPECfp? Do processor makers submit these numbers anymore?
Any other metrics are secondary.
Re: (Score:2)
SPECint and SPECfp are a bit useless: they only test a single core, and with modern CPUs you cannot just multiply that number by the number of cores and get a meaningful result.
SPEC has attempted to fix that simply by running multiple copies of the benchmark and aggregating the result as "SPECrate". Whether that measures anything which is useful for actual workloads is debatable. It certainly does not reflect a modern multithreaded workload.
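For anyone unfamiliar with the "run multiple copies and aggregate" idea, here is a simplified sketch. It assumes a per-benchmark rate of copies × reference time / elapsed time, combined across benchmarks with a geometric mean, which is my understanding of how a rate-style metric is formed; the function names and the sample numbers are hypothetical and are not real SPEC results.

```python
from statistics import geometric_mean

def rate_ratio(copies, reference_seconds, elapsed_seconds):
    """Throughput ratio for one benchmark: N concurrent copies that all
    finish within elapsed_seconds, relative to a reference machine."""
    return copies * reference_seconds / elapsed_seconds

def overall_rate(results):
    """Combine per-benchmark ratios with a geometric mean.

    results is a list of (copies, reference_seconds, elapsed_seconds) tuples.
    """
    return geometric_mean([rate_ratio(*r) for r in results])

# Hypothetical numbers: 12 copies (one per hardware thread) of two benchmarks.
sample = [
    (12, 1000.0, 500.0),   # finishes in half the reference time
    (12, 2000.0, 1500.0),  # scales worse, e.g. memory-bound
]
print(f"Aggregate rate: {overall_rate(sample):.1f}")
```

Note that every copy here is an independent process, which is exactly why the comment above says it does not reflect a genuinely multithreaded workload with shared data and locking.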
Re: (Score:2)
Writing "square mm" is perfectly correct [nist.gov].
It's not perfectly correct. It's acceptable.
Example: meter per second squared (m/s²). The modifiers “square” or “cubic” may, however, be placed before the unit name in the case of area or volume.
Power consumption (Score:3)
Low-power parts for hand-held devices may be all the rage right now, but today Intel is taking the wraps off a new high-end desktop processor
Actually, I think that useful computation per joule is all the rage all over the device size scale. See? This one works everywhere.
Boring on the Desktop Great in Servers (Score:4, Informative)
These chips are slightly faster (given equal core counts) than their predecessors but not in any interesting way.
However, you have to remember that these are really server chips that are repurposed for high-end desktop use. The one vital metric where these chips shine is in their power consumption (or lack thereof): Techreport did a test where the 6-core 4960X running full-bore is using about the same amount of power as a desktop A10-6800K part ( http://techreport.com/review/25293/intel-core-i7-4960x-processor-reviewed/9 [techreport.com] )
That level of power efficiency will do wonders in the server world and these chips (and their 12-core bigger brothers) should do quite well in servers.
Re: (Score:2)
Considering that AMD is a gen or two behind, and their chips aren't currently known for their efficiency, I don't know how impressive that is.
Re: (Score:3)
That level of power efficiency will do wonders in the server world and these chips (and their 12-core bigger brothers) should do quite well in servers.
And later this year, when Atom goes to 22nm, it may also do quite well in mobile phones, given they've already developed a quality ARM emulator.
Sounds great for CAD (Score:2)
Re:Boring on the Desktop Great in Servers (Score:4, Informative)
Intel segments the desktop and server market with ECC functionality. Xeons support ECC, everything else does *not*. So unless this new chip supports ECC, you're off your rocker thinking this is a repurposed server chip.
The same die is used for both chips; it's just that the ECC functionality is fused off in the non-Xeon parts binned for desktop use.
By the way, it's not strictly true that Xeons are the only Intel parts that support ECC. Ivy Bridge Celerons and Pentiums have this feature as well (if you use a compatible server motherboard). It was fused off on the mainstream desktop quad core parts because they wanted people to buy Xeon E3's instead.
Re: (Score:2)
It pretty much is :)
Anywhere there isn't a Xeon equal, you stand a chance of finding ECC support. As the parent post noted, some celeron / pentiums have that - and so do a lot of Core i3 (dual core w/ hyperthreading) since there are no comparable Xeon E3 (all quad-cores).
Re: (Score:3)
AIUI Intel takes a handful of basic designs and cripples them in different ways to produce a wide variety of products which they then sell at different price points depending on what they think customers will be willing to pay.
Re: (Score:2)
Interestingly according to the die photo this time round it appears to have been designed as a 6 core rather than designed as an 8 core and then crippled to make a 6-core like it was with SB-E.
Re: (Score:2)
Intel segments shit like AES NI, Vt-d and "TSX" as well (Haswell?). Not available on your i3.
Re: (Score:2)
Yes, you are right... it is unrealistically favorable to AMD, that is. If you had bothered to look at the charts, you'd note that the benchmark was a CPU-only test, which gave AMD the advantage of being able to run the GPU at very low power (since it isn't being stressed) and redirect that power budget to the CPU...
Oh and they also tested with discrete GPUs that completely relieve the APU of having to expend any energy on the IGP at all.
Another marginal perf iteration of Core (Score:5, Insightful)
Re:Another marginal perf iteration of Core (Score:4, Insightful)
There are two reasons:
1) AMD is really behind after they reworked their architecture, hence no pressure on Intel.
2) Moore's Law has ended some time ago on a per-core basis and nobody noticed.
Re: (Score:2)
Speed vs Price is important when comparing similar speeds. Price doesn't matter if the speed isn't good enough, which is where Intel is winning.
Re: (Score:2)
That's a fair question. I can think of many things that I do and new features in programs that I love that would probably easily run on a very old computer. I used a 2003-era laptop until 2011 that met the vast majority of my needs. That's why I choose an i3 for my new desktop. It had excellent bang-for-the-buck and was so much faster.
Re: (Score:2)
And yet nobody is buying AMD products in the desktop or server space. AMD has consistently been below 10% for over a decade I believe.
Price/performance doesn't matter a whole lot when the difference in price on the chips is less than $100. If you're buying i3/i5/i7-class chips you're already looking at real world performance rather than budget.
Re: (Score:2)
There are two reasons:
1) AMD is really behind after they reworked their architecture, hence no pressure on Intel.
That's a really stupid thing to say, as if thousands of highly skilled engineers at Intel turn up every morning and just don't give a shit. If you've been paying attention, if there's any lacking on the desktop/server chips it's probably due to Intel going all out to take ARM's business in the mobile and tablet space.
Re: (Score:3)
"screw power" and either run the chip faster and invest a thousand dollars for a cooling system "
Water cooling systems are a LOT cheaper than that. Look at what overclockers are using today: you can get a good water cooling system to suck out a LOT of that heat for less than a couple hundred bucks.
Problem is that most guys undersize the heat dump radiator.
Re: (Score:2)
Tell me again how gamers aren't interested in how much power a stock CPU uses.
Re: (Score:2)
Anyway, in response to the original post, lower power means cheaper power components that can't handle as many watts so it actually limits the amount of power the CPU can use. They don't make a chip twice as efficient and then leave the s
Re: (Score:2)
Anyway, in response to the original post, lower power means cheaper power components that can't handle as many watts so it actually limits the amount of power the CPU can use.
Do you have any evidence of this? That sounds like pure conjecture to me.
Re: (Score:2)
Worse, IB-E has thermal compound between die and heatspreader, same as IB vs. SB.
Source? I thought a while back someone pulled the lid off a Ivy Bridge-E CPU engineering sample and found that it was soldered (the CPU was destroyed in the process). There were photos of this posted on a couple of sites.
Re: (Score:2)
AVX2 vs. AVX: 8 x 32-bit integers per instruction vs. 4 x 32-bit integers per instruction (Haswell vs. Ivy Bridge).
The performance gains certainly are there. As per usual, meaningless benchmarks are meaningless.
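The lane counts fall straight out of register width divided by element width. A tiny sketch, assuming 128-bit integer vectors for pre-AVX2 code and 256-bit integer vectors with AVX2; the helper names are made up for illustration, and this counts instructions only, not their throughput or latency.

```python
import math

def lanes(register_bits, element_bits):
    """How many elements of a given width fit in one vector register."""
    return register_bits // element_bits

def vector_ops_needed(element_count, lane_count):
    """Vector instructions needed to touch every element once."""
    return math.ceil(element_count / lane_count)

avx_int_lanes = lanes(128, 32)    # integer SIMD on Ivy Bridge: 128-bit
avx2_int_lanes = lanes(256, 32)   # integer SIMD on Haswell: 256-bit

n = 1000
print(f"Ivy Bridge: {avx_int_lanes} lanes, {vector_ops_needed(n, avx_int_lanes)} ops for {n} ints")
print(f"Haswell:    {avx2_int_lanes} lanes, {vector_ops_needed(n, avx2_int_lanes)} ops for {n} ints")
```

Halving the instruction count is the best case; whether it turns into a real speedup depends on the code actually filling all eight lanes and on memory bandwidth.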
Re: (Score:2)
It's not that clear cut. The throughput of the instructions counts as well. In many cases AVX isn't faster than SSE because the core can retire 2x the SSE instructions per cycle. Furthermore, it can be harder to get a x8 vector than a x4 one.
Think how useful 4x4 matrix operations are for 3d graphics. Then consider how to write optimal code using a x8 vector.
Now all that said, AVX(/2) can really win in some cases
Re: (Score:2)
Except GPU encoding sucks: it is fast but the quality is compromised. Some steps are very GPU-unfriendly (stuff with a lot of branching and random access is), so they're done in a simpler, quicker way or at a lower "profile". GPGPU encoding even becomes pointless if you have encoding hardware (Quicksync, NVENC) which gives you the same result but using a lot less energy, while CPU is where it's at if you want max quality for a given size.
GPU still have to take over a lot of tasks (why not some kinds of audio pro
Re:Another marginal perf iteration of Core (Score:4, Informative)
The only thing that's laughable, is that the desktop gamer thinks everything is about him and that his concerns add up to even 1% of the market.
Re: (Score:2)
If you aren't gaming, why buy a desktop? I suppose there is still AutoCAD and compiling, but that market seems even smaller than the gaming market.
Re: (Score:3)
You don't know desktop gamers very well.
The better per-thread performance of the competing Haswell part may keep them away, though (unless the increased cache makes up for it). Games make better use of additional cores than they used to; but they still don't tend to go as far in that direction as server or some workstation loads.
Some people are going to buy it just because it's the flagship, of course; but better performance on highly threaded tasks won't necessarily save it among gamers. (Especially if Intel prices it so as to discourage peopl
Re: (Score:2)
Actually, given that streaming is becoming more and more common among gamers, multiple cores/hyperthreading is becoming quite popular with gamers too.
1080p streaming in 60FPS is quite CPU intensive.
Re: (Score:2)
I have an i7 3660, and with 8 threads, I still have only found a single program that would thread onto more than four of the cores: VLC Media Player. It seems only the super techy, data-intensive, community-built software can keep up with the core wars? Am I just playing the wrong games?
Re: (Score:2)
There are several programs used to encode your stream for broadcast that support more than 4 threads, and can use them if the load is high enough. But there's also the fact that the game itself uses some CPU, then you have the streaming software running, which grabs the game output and encodes it.
Meaning that you really want multiple cores/hyperthreading.
OBS is one of those streaming programs. (You can also use QuickSync if you have that enabled.)
Re: (Score:2)
Why? Cause having threads left over makes other programs run nicely. Run a program eating 4 threads, and you still have response in the GUI and other programs. Unless you hit some other bottleneck (and you will).
Just having threads does not necessarily mean parallel processing. It becomes increasingly more difficult to add functional threads to an application without getting them locked down by mutexes.
My next machine will be an i7 with SSD, no big storage anymore, internet will keep my movies from now on :-)
How does the release of a new CPU have anything to do with you wasting internet bandwidth?
Re: (Score:2)
The better per-thread performance of the competing Haswell part may keep them away, though (unless the increased cache makes up for it). Games make better use of additional cores than they used to; but they still don't tend to go as far in that direction as server or some workstation loads.
Indeed.
Since Sandy Bridge, Intel has been releasing the high-end desktop parts very late compared to the mainstream parts. By the time SB-E came out, the mainstream desktop parts were very nearly on Ivy Bridge. This time it was even worse: not only did IB-E come out AFTER the mainstream desktop parts were on Haswell, but Haswell brings a more substantial improvement in IPC than Ivy did.
They try to hide it with misleading model numbers but I strongly suspect that most of the people who spend this much money
Re: (Score:2)
Maybe some gamers shut down everything on their desktop when they are playing. Personally, I leave all my apps open, sometimes even playing music or videos on another screen. So regardless of a game making full use of a CPU
Re: (Score:2)
The video game norm is to have 2 main threads and one GPU driver thread ("3 core utilization"). There'll be a whole bunch of secondary threads as well - but these consume negligible amounts of time (tops 5% or so totalling all of them), and many are only triggered in specific conditions - such as when the game needs to load new
Re: Another marginal perf iteration of Core (Score:2)
Is this the first multiplier unlocked Intel chip (K series) that I can buy without a crappy Intel IGPU? So it should be cheaper, right? I already have a high end discrete GPU.
Re: (Score:2)
Is this the first multiplier unlocked Intel chip (K series) that I can buy without a crappy Intel IGPU?
No.
Even counting just K suffix chips. There was the i7-875K back in 2010, though no one showed much interest because it was before Intel clamped down on multiplier overclocking. There was also the i7-3930K 6-core SB-E chip more recently. If you also count extreme edition (X suffix) chips then there were a lot more unlocked Intel chips without integrated GPUs.
So it should be cheaper, right?
No
While you are doing away with the GPU, yes, you are getting more memory channels, more PCIe lanes and possibly more cores (depending on which of the
Re: (Score:2)
before Intel clamped down on multiplier overclocking.
Brainfart, I mean before they cracked down on FSB/BCLK overclocking.
Re: (Score:2)
Actually, the i7 3820 is cheaper than 2600K and 3770K, the i7 4820K will be cheaper than 3770K and 4770K I think (even just 20 euros). This is more than offset by the cost of the motherboard though.
Re: Another marginal perf iteration of Core (Score:4, Funny)
i still dont understand their numbering system. it seems designed to generate the most confusion possible as to what is where in the hierarchy.
its got a higher number...
oh wait but its an older name...
no wait, it's got an -E...
but the box is Blue...
with an Eagle...
but its wings are folded...
Blah.
Re: (Score:2)
its got a higher number...
The number system is a mess and really does feel like it was designed by marketing.
Afaict the fundamental issue is that Intel has recently been focussing more on its laptop and mainstream desktop markets than the high end desktop and server markets. The result is that the high end desktop stuff is currently about a generation behind the mainstream desktop stuff.
Of course intel doesn't want to make that too obvious, so they have made the first digit of the part numbers on their last two generations of high
Re: (Score:2)
there IS a difference in box color? hfs. i was having fun with it at the end. i wasnt even aware they had a box color thing going on.
i havent actually bought a cpu in >5 years, and that was an AMD...last Intel I bought was a Pentium 3 at 500MHz, so that's what...14 years?
The confusion is everywhere inside Intel. (Score:2)
Funny story: I visited the Intel web site and was asked to complete a survey. I gave a few of the reasons why Intel CEO Paul Otellini [intel.com] should be fired, like paying $6 Billion for McAfee when Microsoft is giving away its Microsoft Security Essentials anti-virus software. A few months later Otellini left Intel; they didn't say why.
Re: (Score:2)
While I agree that is a good backup plan, if/when my GTX Titan dies I already have 2x GTX 560 Ti w/ 448 cores in 2 different machines. (The GTX Titan replaced one of them.)
I _don't_ want 2 sets of GPU drivers installed on my system, just one.
Can't wait for the server version (Score:2)
3.6GHz base clock is the fastest we've had since the last generation P4's, and with the obviously superior IPC of the IB this thing's going to be a monster for certain workloads where the code doesn't scale well to multiple cores. The only downside is it's not 8 cores/16 threads at those speeds which is a bummer for virtualization hosts. Oh well, the E5-2670's at 2.6GHz do a pretty good job =)
Re: (Score:2)
2 memory channels makes it not very useful for my purposes but it is in fact slightly faster =)
Re: (Score:2)
bah, why not a dual socket chip? Why can't I get the best single core performance in something that actually makes sense to license (i.e. VMware and MS both license based on two sockets per server minimum).
And still ineffective... (Score:3)
Because the only Multi Chip processors are still 4 years behind this. Why don't they just enable the ability for me to drop 4 of these on a single motherboard so I can have my 24 core monster for editing and rendering 4K video?
2CPU.COM (Score:2)
I still have an old Abit BP6 system sitting next to my desk gathering dust if you want it. I even have 4 extra celeron processors for it!
Back when men were men, and dual core meant two processors!
Sadly other than specialized software, most are still only designed for single core anyway, making the performance gains negligible for most people, which means other than an expensive marketing ploy to a small enthusiast market, not much of a market advantage for any company to do so...
Re: (Score:3)
I was about to mention that all of the things you talk about are more memory intensive than anything else, which of course is OS dependent, requiring 64-bit, which in addition to hardly anyone bothering to run multi-threaded software, no one bothers to write software optimized for 64bit systems either.
The main problem being is that relatively speaking single thread 32bit applications are what people are used to making is simple compared to writing a multi-threaded 64bit optimized application. Unless there
Does it support TSX? (Score:2)
I wanted to get a 4770K but Intel disabled TSX (Transactional Synchronization Extensions) on that CPU.
Re: (Score:2)
Anyhow, I'm wondering if the 'X' line supports TSX or not. I can't find docs or specs that answer one way or another right now.
Re: (Score:2)
It's almost an oxymoron if you are talking about a single-socket Intel cpu. You don't actually need the transactional extensions to make things go fast. It's only when you get to multi-socket where the cache management bandwidth (which is what the transactional extensions are able to avoid) becomes a big deal.
If the purpose is to test code performance then it is better to test without transaction support anyway since transaction support is not a replacement for proper algorithmic design. Or to put it ano
Re: (Score:2)
It's almost an oxymoron if you are talking about a single-socket Intel cpu. You don't actually need the transactional extensions to make things go fast
Not true... I've written an entire concurrency system including a lock free library and a multicore memory manager [gdcvault.com]. There are a number of places where TSX offers a large speed improvement even on a single core.
If the purpose is to test code performance then it is better to test without transaction support anyway since transaction support is not a replacement for proper algorithmic design. Or to put it another way... if you code SPECIFICALLY for one of the two intel transactional models that means you will probably wind up with very sloppy code (such as using global spinlocks more than you need to and assuming that the underlying transaction just won't conflict as much). The code might run fine on an Intel cpu but its performance value will not be portable.
Are you even familiar with how TSX works? Hardware Lock Elision is a very simple replacement for atomic locking. You can write a very simple user level mutex using atomic operations that has a fallback to an OS yielding construct. In fact that's what we do in my concurrency library. Uncontested
Re: (Score:2)
Well, for video games (or anything you sell to the consumer), you clearly do not want to rely on Intel's transactional extensions because doing so could significantly reduce or destroy performance on any customer systems that don't have them.
Basically the way the basic (the prefixed) transactional extension works is to avoid dirtying the cache line(s) associated with the spin lock or unlock operations, with the assumption that the operations which are run within the locked section are less likely to conflic
Re: (Score:2)
Then you add HLE extensions and it runs even faster on CPU's that support TSX and you get a rather large performance bonus for free as at that point a majority of your atomic operations become free.
It also allows you to substitute simple lock-free and non-blocking algorithms that rely on multiadd
Re: (Score:2)
Not to put too fine a point on it, but I've written hundreds of thousands of lines of SMP code on modern systems (and, frankly, I was doing SMP code with paired 8-bit CPUs over 28 years ago), so if you think you are somehow stating something in regards to my knowledge base, I would humbly suggest that you take your opinions and shove them down a toilet somewhere because you clearly have no clue whatsoever as to what I've been doing the last 20 years.
-Matt
Re: (Score:2)
We are already using fine-grained locking, striped locking, reader/writer locks, lock-free atomic SList, lock free allocators, etc. I am interes
Re: (Score:2)
So as far as game design goes, the transaction stuff is worse than worthless.
I want to feel you're not just trolling me because apparently you've been developing software since at least the Amiga days (we have that in common). However, I feel you are quite misguided on some of your assumptions here.
Not to say I may have a more informed opinion than you because I don't know your personal experience in game development, but I certainly feel that TSX isn't worthless for games and I've been writing performance code full time for games for over 20 years.
Re: (Score:2)
I don't think you actually bothered to read and understand what I wrote. Try again. This time read my responses (or at least the first two) a bit more carefully.
I'm not in the least saying that transactional hardware support is bad. I am saying that programming to Intel's transactional interface FIRST, as your primary programming model, particularly for consumer applications, can lead to very undesirable results on hardware that doesn't support it.
Intel tends to implement first-run features with very wei
Re: (Score:2)
Also, we are not trying to write specifically to HLE. We are trying to write stuff that runs well on multicore systems and then layer HLE on top of it for an added performance benefit for when we do have lock conflicts.
I agree that well written applications don't have nearly as many locking
Re: (Score:3)
It's safe to say it does not, as TSX is a Haswell feature and the 4960X is an Ivy Bridge CPU.
What you would need is the 4960X's successor, which is Haswell-E on a new socket called LGA 2011-3 with DDR4, and its server counterparts. Or get a vanilla 4770 or 4771.
TPM no more (Score:3)
What's a grand really worth? (Score:2)
So for $1000 I can get 1.5x the peak multithreaded performance over the $300 processor released three months ago. And if you run lightly threaded apps, the processor from earlier in the summer may still be faster. Wow...what a bargain. I'd say sign me up for two but, alas, Intel won't let you run multiple processors without paying the xeon tax.
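Spelling that out as performance per dollar, using only the round numbers from the comment above ($1000 for 1.5x the peak multithreaded performance of a $300 part). The dictionary layout and names are illustrative.

```python
# Relative multithreaded performance, normalised to the $300 mainstream part.
flagship = {"price_usd": 1000.0, "relative_mt_perf": 1.5}
mainstream = {"price_usd": 300.0, "relative_mt_perf": 1.0}

def perf_per_dollar(part):
    """Relative performance bought by each dollar of CPU price."""
    return part["relative_mt_perf"] / part["price_usd"]

# How much performance each flagship dollar buys compared with a mainstream dollar.
value_ratio = perf_per_dollar(flagship) / perf_per_dollar(mainstream)

print(f"Flagship value relative to mainstream: {value_ratio:.2f}x")
```

So each dollar spent on the flagship buys a bit under half the multithreaded performance of a dollar spent on the cheaper chip, before counting the pricier motherboard and memory.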
Lowest performance per price (Score:2)
Re: (Score:2)
General rule of thumb is that 2x hyperthreads is approximately equal to 1.5 real cores. Nobody is lying, Intel makes the thread/core distinction very clearly. The reason is primarily due to pipeline and memory stalls creating space which can be filled by the other thread.
Keep in mind that a modern superscalar CPU can have something like 160? (number not exact) instructions in-flight at any given moment, depending on how good the branch prediction is. Instruction execution is not really a matter of clock c
Re:wrong wrong wrong (Score:5, Informative)
Amazing. Everything you said about HT is completely wrong. Wherever did you get this information?
Intel's hyperthreading consists of two logical processors sharing the same compute resources. Each logical processor has its own register set but shares decoders, adders, shifters, cache, etc. as it goes about executing its assigned thread. The sharing process is vastly more complex and efficient than you seem to think -- there's no alternating of cycles. Once instructions are decoded into uops, they flow through the pipeline in a dynamic fashion that sometimes leads to one thread using most of the resources while the other one waits. In fact, this is a big advantage of the design -- when one thread stalls from a cache miss, the other one uses all the resources until the first thread's memory access completes. A much better plan than your scheme of using only even/odd cycles.
Managing this process is not simple, and steps must be taken to avoid both deadlocks and livelocks as the two threads compete for resources. But the process is dynamic -- the design allows one thread to run unimpeded when it makes sense to do so, while still preventing one thread from being starved at the other's expense. But this "every other cycle" notion of yours is pure nonsense. The core can retire up to four uops per cycle, and at times these all come from the same thread.
Re: (Score:2)
Your P4 at 4GHz can't do nearly as much as a single core on a newer processor. My 2.4GHz P4 converts DVD movies to low-res in 8 hours or so, my 2.8GHz i3 does the exact same thing in 20 minutes, 24x faster overall and 6x faster per thread.
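The arithmetic behind those two figures, as a quick sketch. The thread counts are an assumption on my part (one thread for the P4, four for the i3, i.e. two cores with Hyper-Threading), chosen because they reproduce the 6x per-thread number.

```python
p4_minutes = 8 * 60    # "8 hours or so"
i3_minutes = 20        # "20 minutes"

p4_threads = 1         # assumption: single-threaded P4
i3_threads = 4         # assumption: 2 cores x 2 threads on the i3

overall_speedup = p4_minutes / i3_minutes
per_thread_speedup = overall_speedup / (i3_threads / p4_threads)

print(f"Overall speedup:    {overall_speedup:.0f}x")
print(f"Per-thread speedup: {per_thread_speedup:.0f}x")
```

Clock speed only accounts for a factor of about 1.17 (2.8 GHz vs 2.4 GHz), so most of the per-thread gain has to come from doing more work per clock.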
Re: (Score:2)
That late generation P4 used a 31-stage pipeline to achieve those high clock speeds. The Ivy Bridge architecture uses a 14-stage pipeline, giving it higher IPC than the power-hungry NetBurst line could ever hope to achieve.