8-Core Intel Nehalem-EX To Launch This Month 186
MojoKid writes "What could you do with 8 physical cores of CPU processing power? Intel's upcoming 8-core Nehalem-EX is launching later this month, according to Intel Xeon Platform Director Shannon Poulin. The announcement puts to rest rumors that the 8-core part might be delayed, and makes good on a promise Intel made last year when the chip maker said it would release the chip in the first half of 2010. To quickly recap, Nehalem-EX boasts an extensive feature-set, including up to 8 cores per processor, up to 16 threads per processor with Intel Hyper-threading, scalability up to eight sockets via Intel's serial Quick Path Interconnect and more with third-party node controllers, and 24MB of shared cache."
March of the penguins (Score:4, Funny)
Ah! My dream of the day when I can boot up and see penguins taking up the entire screen is almost here.
Re: (Score:2)
Gentoo Penguins [komar.org] - King Penguins [komar.org] - Penguin being attacked by a Skua! [komar.org]
Re: (Score:2)
What are you talking about? I can finally run Vista with that! If I'm lucky, I might even get Aero!
March of the Macs (Score:2)
I already have an 8-core machine, a Mac Pro, built from two 4-core MPUs. And I do a lot with it.
Hopefully what this means is Apple will be releasing a 16-core Mac Pro. Yum! Some saving will be called for, though. [cough]
Re: (Score:2)
Meh? Many of us have already seen 8 and even 16 on a screen, but I'm talking about seeing 32 or 64 on a screen when you have dual or quad chip boards with these 8 core chips.
Probably some people out there have already seen this on non-intel hardware I guess? Like Sparc hardware?
minimum hardware required.... (Score:2, Funny)
Now we know what will be needed to run Win 8, I guess.
I better get started on my backyard fusion power plant....;-)
2006 called.. (Score:2)
..they want their joke back. Windows 7 runs perfectly fine on 6 year old machines. But MS is known for making shitty OSes with alternate versions so Windows 8 may still suck... though initial impressions are that not much will change from Windows 7.
Re: (Score:2)
..they want their joke back. Windows 7 runs perfectly fine on 6 year old machines. But MS is known for making shitty OSes with alternate versions so Windows 8 may still suck... though initial impressions are that not much will change from Windows 7.
Every other one sucks? I haven't seen a good Windows product at launch since Win2k. XP was OK towards the end of its lifetime. In my mind the jury is still out on 7. It's better than Vista, but that doesn't say much. :(
Re: (Score:2)
I am glad to see that I'm not a voice alone in the wilderness here on this issue. I also have to agree that the best operating system that Microsoft has ever released was Windows 2000 Professional. It was a significant enough improvement (where it counted.... in the OS core with stability and improved app security and other real OS features, not these crazy and fancy GUI features) over Windows NT 4 and the other operating systems like the MS-DOS based OSs of Win 95, 98, and ME that it really drew a line i
Re: (Score:2)
2002 called, it wants its rant back.
Re: (Score:2)
Seven is iffy, from a UI point of view. Some of the UI and management changes are nice. Some were just change for change's sake, and quite backward in usability.
Anyone else feel that way?
Re: (Score:2)
I don't think any of them were "change for the sake of change", but I do think some of them are definite steps backward for particular types of users (eg: heavy multitaskers like me).
Re: (Score:2)
Much like game DRM, if Win 8 does that idiotic "rent-a-feature" licensing, someone will come out with a full-featured hack before the thing is even released to manufacturing. Illegal? Sure. Better than the official pay-per-view way? Definitely. If we are to learn anything from the past, we should know that in most cases the hacked/pirated solution works better than the real thing, because the people designing and building DRM schemes are, by definition, not the sharpest tools in the shed.
Balance (Score:3, Interesting)
Re:Balance (Score:5, Funny)
Yes
When will I be able to actually buy a motherboard with 8 of these 8-core CPUs?
When you move out of your parents' garage.
Re: (Score:2)
Re:Balance (Score:5, Funny)
Re: (Score:2)
Re: (Score:2)
So what you're trying to say is "Never!"
Re:Balance (Score:4, Interesting)
These are targeted at the virtualization and specialized application space. You are not going to put these in your gaming rig, and you're not going to use the 4+ core models in your traditional standalone application server. You could get a much better dollar-to-performance ratio elsewhere if those are your intended applications.
Now slap two or more of these things on a Linux box with a ton of UMLs running, or on VMware ESX, load the system up with 128 gigs of RAM, and a medium business can probably run its entire datacenter on 2 boxen + an entry-level SAN.
Re: (Score:3, Insightful)
Re: (Score:2)
Re: (Score:2)
It's in the same place that unlimited upload/download 100mbit connections to everyone's home are.
Re: (Score:2)
UML? Where have you been for the last five years?
KVM or Xen [wikipedia.org] are where it's at on Linux.
Re: (Score:2, Interesting)
You are not going to put these in your gaming rig
I hear this a lot, but in a modern OS (e.g., one with a good scheduler) and with modern applications (ones that use either threading or cooperating processes), you can easily use a handful of processors, and yes, with normal desktop apps. Google Chrome, for instance, uses the cooperating process model, and for security reasons, I think you're going to start seeing [good] programmers divvy up their applications this way. Not only does it make application security a bit easier (separate address space for ea
Re:Actually... (Score:5, Interesting)
It will improve gaming performance if you happen to be running something like Quake Wars in ray tracing [wikipedia.org].
Intel put together a demo on a workstation system with two Nehalem quad-core CPUs getting about 15 - 20 fps.
Since ray tracing is embarrassingly parallel [wikipedia.org], all one needs to do to improve performance is to throw more cores at it.
Keep in mind ray tracing is much more cpu intensive than gpu intensive...
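The "embarrassingly parallel" point is easy to see in code: every pixel's ray is independent, so rows of the image can be shaded on any core with no coordination. Here's a minimal sketch of that idea in Python (the `shade` function is a toy stand-in for a real per-pixel ray computation, not anything from Intel's demo):

```python
from multiprocessing import Pool

WIDTH, HEIGHT = 64, 48  # tiny toy framebuffer

def shade(x, y):
    # stand-in for a real per-pixel ray computation
    return (x * 31 + y * 17) % 256

def render_row(y):
    # each row is independent, so rows can be shaded on any core
    return [shade(x, y) for x in range(WIDTH)]

if __name__ == "__main__":
    with Pool() as pool:  # one worker process per available core by default
        frame = pool.map(render_row, range(HEIGHT))
    print(len(frame), len(frame[0]))
```

Because there is no shared state between rows, adding cores (or a second socket) scales the frame rate almost linearly, which is exactly why Nehalem-EX-class parts look attractive for this workload.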
Re:Balance (Score:5, Insightful)
From TFS's mention of "up to 8 CPUs or more with third-party node controllers" I'm (perhaps optimistically) assuming that means all the RAM in an up-to-8-socket system wouldn't be more than one hop away from any core.
They almost certainly didn't go with 24MB of cache because their main memory situation is perfect; but Intel's bigger chips are substantially improved from the old "Hey, let's hang a bunch of super expensive Xeons off a dubiously adequate northbridge through a shared front-side bus, let them starve for memory access, and then get curb stomped by cheaper Opterons!" days.
Re: (Score:2)
From TFS's mention of "up to 8 CPUs or more with third-party node controllers" I'm (perhaps optimistically) assuming that means all the RAM in an up-to-8-socket system wouldn't be more than one hop away from any core.
The block diagram in TFA shows 4 QPI interfaces, so theoretically yes.
In practice almost certainly not, because that theoretical setup has no interfaces left to hook any I/O devices up to it and so is kinda useless. So at least one core has to dedicate at least one link to hook up a PCIe b
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
I wouldn't be so sure. Like previous incarnations of the Xeon MP series, this one will be much more expensive per FLOP than the 2 sockets-per-node machines that make up most x86 entries in the top500 list.
Anyway, for these big machines parallel scalability is mostly determined by the internode network, merely stuffing more cores per node does nothing. Or actually, if you don't increase network performance as you make the nodes fatter, parallel scalability will worsen as you have more cores sharing the netwo
Re: (Score:2)
I suppose it is possible to virtualize a CUDA processor in reverse and have, in effect, a low level Nvidia card. Output would be problematic...
Finally! (Score:5, Funny)
Re:Finally! (Score:4, Insightful)
Re: (Score:2)
It’s a bit misleading to say that it’s always 15 years. It’s exponential; the goal is growing exponentially. The 15 years it took to raise processing power to where you could run it on a phone might be a couple of weeks nowadays. That gives you a better feeling for it. ^^
Of course the 1TB in 15 years still fits.
Re: (Score:2)
> In 15 years you might have a 1TB database running on your personal communicator that fits in your pocket. (in keeping with the "15 years out" prediction theme of the day.
Hmm. Applying one of the Moore's Law variants to NAND flash, if storage size for the same price doubles every 18 months, 15 years is 10 generations. 2^10 = 1024. 4-8 GB of flash memory is already relatively cheap today, even in the form of a microSD card the size of a fingernail, so I'd be kind of disappointed if we didn't have 1 TB flas
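The back-of-envelope math above checks out; here is the same arithmetic as a sketch (assuming a clean doubling every 18 months, which real NAND pricing only roughly follows):

```python
# doublings in 15 years at one doubling per 18 months
years = 15
doublings = years * 12 // 18          # 180 months / 18 = 10 generations
growth = 2 ** doublings               # 2^10 = 1024x capacity at the same price

# today's cheap microSD sizes, in GB
for today_gb in (4, 8):
    future_tb = today_gb * growth / 1024
    print(f"{today_gb} GB -> {future_tb:.0f} TB")
```

So 4-8 GB today projects to 4-8 TB in 15 years under this model, comfortably past the 1 TB mark the parent predicted.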
Re: (Score:2)
Re: (Score:2)
Yes, but unfortunately it will only run Crysis on Windows XP. For Vista, you have to wait for the 16-core, I'm afraid. :-/
IBM Power7 also has 8 cores (Score:5, Interesting)
http://arstechnica.com/hardware/news/2009/09/ibms-8-core-power7-twice-the-muscle-half-the-transistors.ars [arstechnica.com]
Sun Ultrasparc T2 has 8 cores... and 64 threads (Score:5, Informative)
http://www.sun.com/processors/UltraSPARC-T2/ [sun.com]
And the future Ultrasparc T3 will have 16 cores and 8 threads per core for a total of 128 threads per chip
http://arstechnica.com/business/news/2010/02/two-billion-transistor-beasts-power7-and-niagara-3.ars?utm_source=rss&utm_medium=rss&utm_campaign=rss [arstechnica.com]
Re: (Score:2, Insightful)
Re: (Score:2)
What the T-series really needs is a boost to per-thread performance since it will otherwise remain a specialty processor only suitable for certain workloads.
The T2 core has more than enough parallelism for most apps out there. What isn't appreciated though is that it pushes the server *implementation details* all the way up to the app-developers, which causes them grief when they need to target different hardware or when they utilize "junior" developers. It also causes a lot longer performance tuning phases
Re: (Score:2)
Sun's UltraSPARC T1 has had 8 cores and a total of 32 concurrent threads since 4 years ago. Best of all, that CPU is very low-power. Even better, it's completely open-source. You can download everything:
ISA specification
Verilog RTL source code of the design
Verification environment, diagnostics tests and simulation images
Re: (Score:2)
But the specification does not mean anything, since the actual layout on the chip is much harder to get right than you might imagine.
So since it doesn’t help to get the spec for free: Where can I buy a version that will run my games? (The only resource-intensive processes I run.)
Re: (Score:2)
If EX delivers the expected performance, it will have the same performance per socket as Power7, but half the threads. I prefer EX's better single-thread performance over Power7's.
Re: (Score:2)
Your post is all sorts of confusing.
That makes it sound like more nanometers is better. Not in this case.
You have an extremely interesting definition of low end.
Re: (Score:2)
If it is a matter of core-war, IBM's latest Power7 also has 8 cores. It is actually based on 45nm technology compared to Intel's latest 32nm. What makes Power7 exciting is that it has an on-die 32MB L3 cache. They achieved this by introducing eDRAM (embedded DRAM) in the technology. Both Nehalem-EX and Power7 are targeting the low-end server market, so it should be an interesting battle.
http://arstechnica.com/hardware/news/2009/09/ibms-8-core-power7-twice-the-muscle-half-the-transistors.ars [arstechnica.com]
Since when is the "low-end server market" made up of 8-core 8-socket machines? Are you from the... future?
Licensed per Core (Score:2, Interesting)
Re: (Score:2)
That's a cool idea... until the motivation for efficient computing dwindles because the more "user time" the software burns, the more the vendor will charge.
In the case of VMWare, perhaps they could charge you based on the number of startup instances per time period. (But then every time you reboot a VM for maintenance, you get charged, so the incentive for security is diminished. Though one could try ksplice [ksplice.com].)
When will Moore's Law apply to Cores? (Score:3, Insightful)
So can we now expect a doubling of cores every 18 months?
Re: (Score:3, Insightful)
So can we now expect a doubling of cores every 18 months?
Moore's Law refers to transistor density, right? As long as programming makes the expected shift to massively parallel techniques that would justify a very large number of cores I think the answer to your question is yes.
Re: (Score:2)
Past a point, cores may simplify; I guess that's what we already have in billion-transistor GPUs with 1000+ stream processors etc. I would go so far as to say the
Re: (Score:2)
Re: (Score:2)
There are some tasks that are very much serial, and these will not benefit from the additional cores, but there are MANY things that can benefit from the additional cores. It is NOT easier or cheaper to throw more hardware at a problem when processors are not getting all that much faster at this point, but are moving to more cores. Really, it comes down to how much understanding of multi-threading the software designers have when they design an application. You will always have some people out ther
Re: (Score:3, Funny)
Re:When will Moore's Law apply to Cores? (Score:4, Insightful)
You just made me realize nobody named Cole (Ashley Cole, Cheryl Cole, Nat King Cole) will ever have a law named after them. Everyone will just snicker and it'll never catch on.
Re: (Score:2)
Re: (Score:2)
A core is theoretically a fixed number of transistors, and Moore's law (depending on which version you quote) essentially states that the number of transistors per chip will double every 18 months. Therefore, a corollary to Moore's Law is that the number of cores could double every 18 months. I say could because engineers may decide to put more transistors in each core, which would slow down the core increase.
Also, there is another caveat: many applications will never be able to take advantage of an insane
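The corollary in the comment above (fixed transistors per core, density doubling every 18 months) gives a simple upper-bound projection for core counts; the caveat about engineers spending transistors on fatter cores is exactly why it's only an upper bound. A sketch:

```python
def projected_cores(cores_now, years, doubling_period_years=1.5):
    # corollary of Moore's law: core count doubles each period,
    # assuming transistors-per-core stays fixed (it usually doesn't)
    return cores_now * 2 ** int(years / doubling_period_years)

print(projected_cores(8, 3))   # two doublings from today's 8 cores
print(projected_cores(8, 15))  # what 15 years would give if nothing else changed
```

Starting from an 8-core Nehalem-EX, that predicts 32 cores in 3 years and thousands in 15, which is why the second caveat (software that can actually use them) is the real limit.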
Hyperthreading (Score:2, Insightful)
Why are they still announcing hyperthreading? It was established long ago that it had no benefit. It's been off on any machine I've ever purchased.
No benefit? (Score:3, Informative)
It should also be noted that the applications that benefit are ones that would generally be used in Xeon (server and workstation) machines. Further, most of the applications that fail
Re: (Score:2)
Okay, perhaps "no benefit" was too strong a term. There are cases where it helps, and various replies have pointed out those cases. As a general rule though, if a feature helps 10% of the time, and hurts 10% of the time, I would rather it be off.
Ideally, the feature would be enabled for those apps that do benefit from it. Perhaps the implementation was the problem: the OS generally reports twice as many CPUs as it really has, which causes some apps/servers to spawn more threads than they should.
Re:Hyperthreading (Score:5, Informative)
Hyperthreading used to suck, but it works pretty well now. In the benchmarks I've done with my code I see about a 60% speedup.
http://www.vips.ecs.soton.ac.uk/index.php?title=Benchmarks#Results_summary [soton.ac.uk]
Re: (Score:2, Interesting)
According to my server metric graphs the additional threads are only useful for WIO CPU states.
For example, on an Intel 4-core i7 920 processor, enabling hyperthreading presents four additional virtual cores. But CPU utilization reported by metrics software shows that USR and SYS cpu times will only go up to 50%, and WIO will add another 12%. This corresponds to having a virtual core used for waiting on IO. The additional 3 virtual cores do not serve anything at all.
Re: (Score:2)
It can make a big difference to some applications, like 3D renderers. Sometimes it doesn't help, but disabling it without considering the typical load is unwise.
Well a few reasons (Score:2)
First off it went away for a long time. The P4s had hyper threading but the Pentium Ds and Core 2s (duos and quads) didn't. It didn't come back until the i7.
The other reason is that it is useful now. When HT first came out, it was pretty much for desktop chips and we were still very much a single core world. Ok well little was designed to truly take advantage of multiple threads in that environment. People noticed no real speedup. However now not only are things better using multiple cores, but the server m
Re: (Score:2)
Crapware (Score:2, Funny)
Now I can run all my crapware, viruses, trojans, malware, and other dubious software bits at FULL SPEED! Yay
Flash (Score:3, Funny)
Better than that, with a properly multi-threaded web browser we'll be able to display sixteen animated Flash ads simultaneously with no slowdown!
AMD's competitor, Socket G32 (Score:5, Interesting)
http://blogs.amd.com/work/2010/02/22/magny-cours-is-right-on-schedule-and-shipping-to-customers/ [amd.com]
Joy! I hope it comes with ... (Score:3, Funny)
... super cool looking white plastic mold which fits my socket and cool looking notepad!
Somebody's gotta ask... (Score:5, Funny)
So, how soon until newegg.com has the fake ones in stock?
Why the same thing we do every night... (Score:2)
try and take over the world!
Re: (Score:3, Informative)
This processor is meant for servers, because they're Xeon, and with all the Web 2.0 and Cloud computing going on, servers are always hungry for more power.
Re:It's obvious (Score:5, Insightful)
10 years ago if you had told me about an 8-core processor I would have imagined using it for kick-of-the-ass games, immersive virtual reality, editing 3D video and simulating newer, more deadly designs of chainsaw chain.
But noo, instead they are used to pump out inefficient JavaShit-based versions of the Desktop software we had in '93 with a shiny new rounded corner interface to web browsers around the world. Great.
Re: (Score:3, Insightful)
Yea, it really bugs me how 95% of a web site's load time and processing load is accounted for by a few pretty features like rounded corners and drop shadows.
How about we put those effects into CSS where they belong and not induce massive load by simulating them with 5mb of JavaScript?
Re: (Score:3, Insightful)
Web 1.0 can use plenty of cores, too, but generally your Web x.x requirements and your required server core count are orthogonal. Bandwidth and latency requirements for Web 2.0 are a different story, though. Those things tend to scale depending on how shitty your programmers are.
Re:It's obvious (Score:4, Funny)
Who is talking about servers? I'm thinking about my home machines, you know, where the client-side javascript runs...
Re: (Score:2)
Re: (Score:2)
MrNaz was quite clearly talking about clients. If you're new here then you might not have realized it yet, but sometimes discussions actually evolve over the course of several comments.
Tell it to MS? (Score:2)
Rounded corners are part of CSS3 and Webkit and Mozilla support it.
Oh and even the largest javascript libraries come in at 100kb, so where do you get 5mb from? Went trolling for it perhaps?
So, next time you complain, check your facts and use a real browser. Not the joke browser that came with your joke OS on your dad's Dell.
Re: (Score:2)
It sucks that the implementation is browser-specific, but hey! It's still there.
Re: (Score:2)
I thought the obvious answer would be octa-porn.
And you're free to interpret that any way you like.
This thing IS a Beowulf cluster (Score:2)
all by itself
Re: (Score:2)
Apache is popular. MySQL is popular. Pretty much any Web or DB server will eat these right up.
Re: (Score:2)
You're usually better off spending money on spindles and ram and raid controllers way before exotic multicore CPUs, on a database machine.
Re: (Score:3, Informative)
Not with an X25-E.
Re: (Score:2, Insightful)
I am sure there are plenty of applications out there that can take advantage of this new hardware. I run finite element and computational fluid dynamics software at work and both are capable of using the 8 cores in my work PC (dual quad core).
The really sad part though is that for the FEA software I can only use 2 cores because the vendor requires customers to buy a separate HPC license for every processor/core beyond 2.
Re: (Score:2, Insightful)
Don't know about games, but many types of numerical processing can easily take advantage of this. ATLAS and other high-performance linear algebra libraries already use all available cores (no, IO is often not the biggest bottleneck with these libraries, as they seem to squeeze out all possible advantages from the L1 / L2 caches). In other words, for my scientific computations, I would definitely notice a difference.
Also, OpenMP is becoming easier and easier to use with recent gcc releases, and it only tak
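The comment is cut off, but the gist (OpenMP makes parallelizing a numeric loop nearly a one-line change) can be mimicked in stock Python with a process pool over contiguous chunks, much like OpenMP's static schedule for a parallel for loop. This is an illustrative analog, not ATLAS's or OpenMP's actual API:

```python
from concurrent.futures import ProcessPoolExecutor
import math

def partial_dot(args):
    # dot product of one contiguous chunk of each vector
    xs, ys = args
    return sum(a * b for a, b in zip(xs, ys))

def parallel_dot(x, y, workers=4):
    # split the vectors into one chunk per worker, like a static schedule
    step = math.ceil(len(x) / workers)
    chunks = [(x[i:i + step], y[i:i + step]) for i in range(0, len(x), step)]
    with ProcessPoolExecutor(max_workers=workers) as ex:
        return sum(ex.map(partial_dot, chunks))

if __name__ == "__main__":
    v = list(range(1000))
    print(parallel_dot(v, v))  # same answer as the serial sum(a * a for a in v)
```

With 8 physical cores per socket, `workers=8` (or more across sockets) is where a chip like Nehalem-EX would actually pay off for this kind of numeric kernel, though in practice you'd use a tuned BLAS rather than hand-rolled chunking.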
Re: (Score:2)
Ever heard of a little thing called virtualization? :) Even in the single-host space, there is plenty of software with a high degree of parallelism and horizontal scalability. We have several types of servers that run hundreds (and in some cases thousands) of threads or processes.
Re: (Score:2)
How long before game makers write code that supports this new chip?
Answer: a very long time. This is a Xeon part designed for large database servers. It isn't intended for desktops. Some fool might try to put it into a gaming rig eventually, but that person really will be worthy of the title "fool". That would be like putting an engine designed for a freight train into a Ferrari.
When will other software vendors have code that supports this many cores?
Answer: they already do. the companies that write
Re: (Score:2)
Re: (Score:3, Insightful)
People have been arguing as you are that x86's bloated CISC instruction set was inferior to a cleaner RISC architecture for the last 20+ years. Nobody has ever proven that the elegance of the instruction set matters with hard data though.
What evidence we do have goes against that argument.
Apple machines used a cleaner RISC architecture for a while in the desktop space. They never performed any better than equivalent x86 based machines, and in the end Apple abandoned RISC and moved to x86.
Intel came out wi
Re: (Score:3, Insightful)
People have been arguing as you are that x86's bloated CISC instruction set was inferior to a cleaner RISC architecture for the last 20+ years. Nobody has ever proven that the elegance of the instruction set matters with hard data though.
What evidence we do have goes against that argument.
The only evidence that we have is that the benefits of commoditization and economies of scale often outweigh any architectural advantages. The fact that x86 incorporated many elements of RISC would also demonstrate its value.
Apple machines used a cleaner RISC architecture for a while in the desktop space. They never performed any better than equivalent x86 based machines, and in the end Apple abandoned RISC and moved to x86.
Manufacturing processes simply trumped architectural differences. PowerPC's have never been manufactured on anywhere near the scale of x86.
Intel came out with a cleaner RISC-based instruction set that the Itanium line uses. If x86 was really as bad as you say, Itanium chips would be running circles around the x86-based server chips provided by both Intel and AMD. That isn't happening.
Itanium is EPIC, not CISC. It is the exact opposite of RISC. It may not be running circles around x86, but that may be due to compilers not yet bein
Re: (Score:2)
I <3 x86 (Score:2)
Theoretically... EPIC > RISC > CISC... ...but it doesn't matter much in the real world. The small theoretical speed advantages get lost in the noise of compiler quality, product cycles, bus speed, programmer skill, etc. Underneath, the execution units are similar, and any brilliant breakthrough in EPIC compiler tech will just be copied into the x86 instruction decoder of x86 chips.
So basically, x86 is here to stay for a LOOOOOOOONG time.
Re: (Score:2)
The problem you are alluding to, as I see it, is not that the memory architecture does not serve the instruction sets. The problem is that latencies from the core to memory are very long, and writing code that is tolerant of those kinds of latencies is either impossible (or impossibly difficult) to do for the majority of typical (desktop in particular) applications. Any code that has a lot of branches (think: any software you write that has a lot of "if" or "switch" statements) invariably ends up reaching th
Re: (Score:2)
Re: (Score:2)
I don't see anyone talking about the cost of this, any ideas?
If you have to ask, you can't afford it.