Intel Launches 72-Core Knight's Landing Xeon Phi Supercomputer Chip (hothardware.com)
MojoKid writes: Intel announced a new version of their Xeon Phi line-up today, otherwise known as Knight's Landing. Whatever you want to call it, the pre-production chip is a 72-core coprocessor solution manufactured on a 14nm process with 3D Tri-Gate transistors. The family of coprocessors is built around Intel's MIC (Many Integrated Core) architecture, which itself is part of a larger PCI-E add-in card solution for supercomputing applications. Knight's Landing succeeds the current version of Xeon Phi, codenamed Knight's Corner, which has up to 61 cores. The new Knight's Landing chip ups the ante with double-precision performance exceeding 3 teraflops and over 8 teraflops of single-precision performance. It also has 16GB of on-package MCDRAM memory, which Intel says is five times as power efficient as GDDR5 and three times as dense.
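For a rough feel of where a "3 teraflops double-precision" figure could come from, here is a back-of-the-envelope sketch. Only the 72 cores and the 3 teraflops target come from the summary, and the 512-bit vector width is mentioned further down the thread. The two vector units per core and one fused multiply-add per lane per cycle are assumptions made for illustration, not numbers from the announcement.

```python
# Back-of-the-envelope peak-FLOPS arithmetic. Treat the assumptions as guesses.
CORES = 72                  # from the summary
VECTOR_BITS = 512           # vector width mentioned in the comments
VECTOR_UNITS_PER_CORE = 2   # ASSUMPTION, not stated in the post
FLOPS_PER_FMA = 2           # ASSUMPTION: one fused multiply-add = 2 flops

def dp_flops_per_cycle(cores=CORES, units=VECTOR_UNITS_PER_CORE,
                       vector_bits=VECTOR_BITS):
    """Double-precision flops the whole chip could issue in one clock cycle."""
    lanes = vector_bits // 64   # 64-bit doubles per vector
    return cores * units * lanes * FLOPS_PER_FMA

def clock_needed_ghz(target_tflops, flops_per_cycle):
    """Clock speed (GHz) required to reach a target peak in teraflops."""
    return target_tflops * 1e12 / flops_per_cycle / 1e9

per_cycle = dp_flops_per_cycle()
print(per_cycle, "DP flops per cycle")
print(round(clock_needed_ghz(3.0, per_cycle), 3), "GHz needed for 3 TFLOPS")
```

Under those assumptions the chip would need to run at a bit over 1.3 GHz to hit the quoted double-precision number, which is a theoretical peak rather than anything a real workload would sustain.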
LOL ... Crikey ... (Score:5, Insightful)
So, somewhere someone at AMD is going "fuck it, we're going to 128 cores".
Damn ... that's a crap pile of cores ... that's like, Skynet in a box or something.
The mind reels.
Re: (Score:3)
Yep, but it'll be 128 integer cores with 64 floating-point cores, and someone will take them to court over it... because... butthurt.
Re: (Score:2)
That should be enough for many entry level systems, and even some light gaming, well with the right graphics card...
NVIDIA GTX 750 Ti is the perfect match.
Re: (Score:2)
Damn ... that's a crap pile of cores
IMHO the only metrics that aren't subjective are transistor count and process tech...
Re: (Score:2)
How about the Mach 20?
https://www.youtube.com/watch?v=2FAP8o5ZEo0
Re: (Score:2, Insightful)
You are truly sick. Get some help. You, and the one who modded you insightful.
All I did was provide a useful link. And you go nuclear-fractal about it, exploding with invective and unsubstantiated speculation.
Re: (Score:2)
I'm pretty sure that Anonymous Coward's response was itself propagating some other kind of meme, but I'm not motivated enough to look it up.
Anyway, the thing is to take inappropriately over-the-top invective with a grain of salt, since it was probably intended to be tongue-in-cheek.
Re: (Score:2)
I am waiting for the Mach30TurboLazer. Call me a Luddite. :)
Re: (Score:2)
http://starwars.wikia.com/wiki... [wikia.com]
Are you sure you want to shave with one though, that might hurt...
Re: (Score:2)
It would certainly be a close shave. :)
Re: (Score:2)
It would give a whole new meaning to razor burn.
Re: (Score:2)
Liberal detected.
Re: (Score:2)
Um, I am not sure you understand the scale of this product. They are fitting 72 cores and 16GB of RAM into approximately 2 inches by half an inch. Think this:
http://www.newegg.com/Product/... [newegg.com]
Not the size of a blade. This is a tiny thing that could go in a laptop as a co processor for who knows what.
Cool! (Score:2)
Re: (Score:2)
Re: (Score:2)
Re:Cool! (Score:4, Funny)
Bah, turn on Flash and IE and it'll be 2 minutes and glowing white. ;-)
Re: (Score:2)
Don't worry, I read it has a power saving mode and switches to 2 core while on batteries making it more efficient than a celeron.
Imagine a Beowulf Cluster of these... (Score:2)
Just saying
Re:Imagine a Beowulf Cluster of these... (Score:5, Funny)
That chip is a Beowulf cluster.
Re: (Score:2)
Then, that would make a Beowulf cluster of Beowulf clusters.
Re: (Score:3)
A meta-beowulf.
The video is complete drivel (Score:3)
It is probably a good chip for its niche, so you would think they would have less bloviation in their intro video. If this were anyone else I would assume they were mostly trying to fleece more investors before they inevitably went belly up. It's so bad that major league sports style animation with a yelling pitchman and a pounding beat would be an improvement. That bad.
Re: (Score:2, Informative)
Better article [anandtech.com].
Also the summary is wrong: it's on the 14nm process (the previous gen one was 22).
Really the memory looks like the only interesting thing here.
Re: (Score:2)
Application? (Score:2)
Re: (Score:2)
They pretty much all are, Los Alamos, Blue Waters, Oak Ridge National Laboratory, all the medium to long term weather forecasting I know of, etc.
Generally 10,000 to 100,000+ nodes.
Re: (Score:2)
Re: (Score:2)
The Japanese K built by Fujitsu uses Sparc64.
Re: (Score:3)
raytrace version of wolfenstein, pretty much.
cost effectiveness for other uses.. well..
Re:Application? (Score:5, Funny)
Bitcoin mining, of course -- it may not be as fast as a similarly-priced GPU farm, but the coins it creates will be of the highest possible quality and workmanship.
Re: (Score:2)
For applications where the working dataset is small enough to fit on these cards they are apparently very good. If you need to shift a lot of stuff in from main memory frequently they are not, and the AMD systems hooked together with InfiniBand look a lot better. For things that benefit from a huge amount of shared memory (2TB plus onboard and 160 cor
Re: (Score:2)
Does it execute x86 code? Does it support virtualization? I guess you could use it then to host lots of Linux VMs.
Re: (Score:2)
It does execute x86. However, I'm pretty sure that the VM hypervisor would need to be tailored to use these, and your memory bandwidth is severely reduced because it sits on a PCI-e link.
These things are made for the same workloads that people use CUDA and OpenCL for. Seriously parallel processing with small-ish data sets.
Re: (Score:3)
So what exactly is the real world application of such a beast?
All of the things where you really, really wish that you could do GPU offloading, but can't because you have diverging flow control and the GPU version ends up coming nowhere near to the theoretical peak performance of the hardware. The Xeon Phi cores are pretty simple, but there are loads of them, they have real branch prediction and caches (so handle the same kind of workloads as normal CPU cores, just a bit slower) and have fairly beefy vector units (so when they're running in a straight line they're ac
Re: (Score:2)
The architecture isn't really that different from a GPU, whatever Intel might try to make you believe. It has 512 bit vectors, compared to 1024 bit vectors on NVIDIA, so it's slightly less hurt by divergent flow control, but only slightly. The theoretical maximum teraflops (8 for single, 3 for double), are pretty similar to what NVIDIA is claiming for the just announced M40.
And don't forget, Intel massively hyped the first generation of MIC, and it then turned out to be next to worthless. Hopefully they'
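To put a number on the "wider vectors get hurt more by divergent flow control" point, here is a toy model. It assumes 32-bit elements (so 512 bits is 16 lanes and 1024 bits is 32 lanes), that a vector unit has to run every branch path taken by any of its lanes with the other lanes masked off, and that each path costs the same. Real hardware is more subtle than this, so read it as a sketch and nothing more.

```python
def simd_efficiency(branches, width):
    """Fraction of issued lane-slots doing useful work in a toy masking model.

    branches: per-element branch outcome (any hashable value).
    width:    lanes per vector. A group of `width` elements executes once
              per distinct outcome in the group, with unused lanes masked off.
    """
    useful = len(branches)
    issued = 0
    for start in range(0, len(branches), width):
        group = branches[start:start + width]
        issued += len(set(group)) * width   # one full-width pass per path
    return useful / issued

# Hypothetical workload: the branch flips every 16 elements.
workload = ([0] * 16 + [1] * 16) * 64

print(simd_efficiency(workload, 16))   # 16 lanes: each vector takes one path
print(simd_efficiency(workload, 32))   # 32 lanes: every vector takes both paths
```

With this particular made-up pattern the 16-lane unit never diverges and the 32-lane unit wastes half its slots. A pattern that flips every element would hurt both equally, which is roughly the "only slightly" caveat above.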
Re: (Score:2)
So what exactly is the real world application of such a beast? Are there that many x64 based supercomputers out there?
The two fastest supercomputers in the world are x86_64 based, as are in fact all but three of the top ten.
Re: (Score:2)
" Are there that many x64 based supercomputers out there?"
Yes.
Including the number one and number two on the top 500 list.
3Tflops (Score:2)
I guess AMD gave up on HPC. If I read the wiki right, their top card does 0.1Tflops
Re: (Score:2)
You can't explain it?
You don't know?
Note to posters - please do not counter specific examples with a gut feeling - it makes you look like an idiot.
Re: (Score:2)
Did someone urinate in your cereal this morning?
Re: (Score:2)
So please explain why Transmeta didn't take off despite aiming directly for that metric
and good performance, there was no reason to buy Transmeta processors. Note that they're not actually dead: nVidia bought Transmeta and used their ideas in their Project Denver ARM SoCs, which are selling pretty well now, in a different market where performance-per-Watt does matter.
and why there are so many power hungry Xeons out there.
There aren't. Xeons are so popular precisely because they give you the best performance within a given power envelope that you can currently buy (unless you're willing to go with custom accelerators or less general cores such a
Re: (Score:2)
They are already popular despite that not happening yet. Wrong guess. Maybe try something other than a guess next time?
Re: (Score:2)
They are already popular despite that not happening yet. Wrong guess. Maybe try something other than a guess next time?
Where are you getting your numbers from? That's why we bought them, and it's why the companies that I talk to who buy them in lots of a thousand buy them. In the P4 days, we were buying Opterons almost exclusively. I think it's been five years since we last bought one.
Re: (Score:2)
It may matter a lot, but it's not the most important metric.
If you're building a supercomputer, you're building it to calculate shit, and to calculate it as accurately as possible, as fast as possible. You design the computer first, and then the facility to house it after the design is done.
Someone dropping tens of millions on a super computer isn't going to say "well, we already have this room here that can handle X watts of heat, so design your computer to simulate global weather patterns / thermonuclear
Re: (Score:2)
Re: (Score:2)
nvidia is pumping out 2.9 TFlop DP on their K80 (on paper). Of course on paper the numbers are as good as imaginary (across the board, Rpeak has been more and more a fantasy over time).
Re: (Score:2)
You must be looking at one of the low end NVIDIA GPUs. Tesla K40 gets 4.29 teraflops. Tesla M40 (just announced last week), supposedly gets 7.
Re: (Score:2)
McRAM? (Score:2)
McRAM?
Yes, I would like fries with that.
Re: (Score:2)
Yes, I would like fries with that.
Fries? I think a bad puppy like this could deep-fry a whole turkey for you.
CISC? (Score:3)
I've been asleep for 20 years so I guess CISC won?
Re:CISC? (Score:4, Informative)
Kind of. The advantages of RISC faded pretty fast. The difference in decoder footprint between something like x86 and, say, ARM is really not that much, and a decoder is just a small part of a core these days. Clock speed is an issue of thermal footprint. So, all the disadvantages of the x86 (and its x64 extensions) faded in the face of Intel's focus on process improvements. In the end, not even the Itanium could eke out enough of a win to dethrone the x86 architecture.
Re: (Score:2)
Neither CISC nor RISC won.
Data-driven design won out over faith-based instruction set architecture.
Re: (Score:2)
I would not call it faith ;) There were compelling reasons why all processors were once CISC.
Considering that x86 is the only major CISC processor left, and that it internally translates CISC instructions into sets of RISC instructions before they get executed, and considering that everything else bigger than an 8 or 16 bit microcontroller is RISC (ARM, MIPS etc.), I would say: RISC has won.
Re: (Score:2)
No, x86/x64 won.
Re: (Score:2)
The complexity of the instruction set matters very little when you can just cache the decoded instructions in the processor. Intel solved that with Pentium Pro in 1995. By using ever-decreasing fabrication processes, they have die space to heap tons of cache in there - I think the current Xeons are somewhere around 2MB/core of cache...
So your 20 year nap is just about right.
Waiting for "kilocore" to be a common measure... (Score:2)
...but I suppose 640 kilocores should be enough for anybody.
Knight's Landing, Knight's Corner (Score:2)
So how fast can it calculate a Knight's Tour [wikipedia.org]
Will it play Crysis? (Score:2)
Or be able to load Windows 11?
Excellent! (Score:2)
Still just 4 cores for the desktop... (Score:2)
I am still annoyed that Skylake still only comes with 4 meager cores and some lousy graphics I will never make use of, and anything beyond that is a hockey stick price increase. Taunting us with 72 is just cruel.
Re: (Score:2)
It still won't allow you to have a baby in a month.
Re: (Score:2)
Have you tried having nine wives?
Re: (Score:2)
They also don't plan to have a desktop / same-socket Intel Xeon chip without built-in video. For the last gen you can get a 4 core + HT chip for about $100 less than an i7.
Re: (Score:2)
I would imagine that the built-in video is actually wanted in the Xeon line, so you don't have to waste motherboard real estate adding a crappy video chip to the bill of materials.
Many, if not practically every, server uses on-board video. Unless they run completely headless.
Re: (Score:2)
But a 16-32MB video chip with its own RAM is better than eating system RAM, and shared video memory can be an issue in multi-CPU systems.
News for (computer architecture) nerds... (Score:4, Interesting)
While supercomputing is a very small section of the computing world, it's not that hard to understand.
First of all, this would make for a terrible graphics card. This (deliberately) sits between a CPU and GPU. Each core in a Phi has more branching support, memory space, more complex instructions, etc than a GPU core, but is still more limited than a Xeon core (but it has wider SIMD paths).
A GPU has many more cores with a much more limited set of operations, which is what is needed for rapid graphics rendering. But those limited sets of operations can also be very useful in scientific computing.
I haven't seen anybody try a three pronged approach (CPU/Phi/Nvidia Tesla), but I will admit I didn't look very hard. This is all in the name of solving really big problems.
If only the software... (Score:2)
I have eight (8) cores on my laptop. Frequently, a single multiprocessor-unaware application will hog an entire core, getting it hot, while asking nothing of the other seven (7). These applications are typically very expensive ones, so you might think that they would make use of them.
Oh, but no. Give me two cores, 100 cores, or anywhere in between. I, as a power-user, will actually never notice a difference.
Get the programmers to write MPA software. Only then will I think about believing the hype about
Re: (Score:3)
Re: (Score:2)
That is what happens when u use Windows.
Actually, this is what happens when you (I) use Adobe products.
The open-source Image-J is far more agile in processing my 100,000+ image-stacks.
Re: (Score:2)
Uhhhh, this is what happens when you use any application written in C, C++, Java, C#, PHP, Python or just about any programming language without adding threading code. The OS is irrelevant.
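For what "adding threading code" can look like in practice, here is a minimal sketch in Python. The workload (`busy_sum`) and the helper names are made up for illustration. It uses processes instead of threads because in CPython CPU-bound threads generally do not run on several cores at once.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def busy_sum(bounds):
    """Stand-in CPU-bound task: sum of squares over [lo, hi)."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def split(n, parts):
    """Cut range(n) into at most `parts` contiguous (lo, hi) chunks."""
    step = max(1, -(-n // parts))   # ceiling division, at least 1
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]

def parallel_sum(n, workers=None):
    """Same result as busy_sum((0, n)), but spread across worker processes."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(busy_sum, split(n, workers)))
```

Calling `parallel_sum(2_000_000)` from a script (under an `if __name__ == "__main__":` guard, which some platforms need for process pools) should load every core for a moment instead of pinning one. The catch is the one everybody in this thread is complaining about: somebody has to restructure the work into independent chunks first, and plenty of applications never get that treatment.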
Re: (Score:2)
While that is very annoying, at least the OS switches it over to another core every now and again to avoid overheating, as a process monitor like "gkrellm" will show you.
Give them a break, developers are only just getting their teeth into 64 bit and you want them to write stuff as if it's 1999? Please give them at least twenty years to get used to the hardware :)
Re: (Score:2)
The good news is that a Xeon Phi isn't ever going to be installed anywhere but a data center, so you don't have to worry about it. It will churn through data sets by running an application specifically written for it.
This isn't a high-volume product for Intel - they probably have a couple hundred customers that use these things. But when they do use them, they use a LOT of them because they are building supercomputers that have thousands of cores.
Re: (Score:2)
The imaginary cores do work, getting +30% out of them is ordinary even in games nowadays.
Re: (Score:2)
Re: (Score:2)
Not enough memory for Minecraft.
Not actually that impressive.... (Score:2)
So the product is Intel's not quite released compute accelerator, featuring new micro architecture, memory technology, and using the latest chip fab capabilities.
The most readily available competition with released numbers is an nVidia K80, a year-old product using 5-year-old memory technology and 5-year-old chip fab capabilities, set to be superseded by their refresh using state-of-the-art fab, memory, and microarchitecture, which would actually compete toe to toe with what Intel announced.
This *should* make
Re: (Score:2)
GPU floating point performance has been leading general purpose x86 CPU floating point performance by an order of magnitude - for many many years now. There's nothing new in what you are saying.
What is indeed new is that this is the first general purpose x86 based solution that gives you similar floating point performance as a graphics card. And you get all the advantages of the general purpose CPUs as well as all the x86 codebase you might want to support.
There must also be a reason why the number 1 superc
Re: (Score:2)
Yeah, Nvidia can compete toe-to-toe with their next-gen product, until a branch comes along. Branching on GPU compute is ridiculously expensive. This is not so with Xeon Phi.
That's where this product makes sense.
Re: (Score:2)
"WTF?"
Re: (Score:2)
If Windows is now fast enough, Windows 11 will be out and this CPU will be minimum requirements.
Re:Why Intel doesn't utilize the latest node on it (Score:5, Interesting)
Defects in the bleeding-edge process are the main reason to use the older process. When they make one of these insane multi-core parts the die size is very large (sometimes taking up a whole 26 by 32mm scanner field), so the yields are hit harder by defects. On a more consumer-level chip they may have 4 or more die in a scanner field. A single defect in this field will take out one of the four die, resulting in a yield of 75% for that field. However, in the case of a single die for the whole field, the yield would be zero with the exact same number of defects per mm^2. I am sure they have a greater understanding of where their defects come from on the older 22nm process these days and can ensure good yields even with a huge die size.
An additional reason they would use the older process is that a chip of this level of complexity probably requires tighter overlay and critical dimension (CD) control than the "standard" 22nm process to work well. Having a well-defined process makes tuning all of these factors much easier, and it also helps decouple whether it was the process or possibly an issue in the design when initial silicon runs do not work exactly as intended.
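The one-defect-per-field arithmetic above can be written out as a small sketch. The first function is just the counting argument from the comment. The second is the textbook Poisson yield model, Y = exp(-D * A), swapped in to show the same effect with a defect density. The density used is a made-up illustrative number, not anything from a real fab.

```python
import math

def field_yield(dies_per_field, defects_in_field):
    """Worst-case good-die fraction when each defect kills a different die."""
    killed = min(dies_per_field, defects_in_field)
    return (dies_per_field - killed) / dies_per_field

def poisson_yield(die_area_mm2, defects_per_mm2):
    """Textbook Poisson model: probability that a die has zero defects."""
    return math.exp(-defects_per_mm2 * die_area_mm2)

FIELD_AREA = 26 * 32        # mm^2, full scanner field from the comment
DEFECT_DENSITY = 0.001      # defects per mm^2, HYPOTHETICAL value

print(field_yield(4, 1))    # four die per field, one defect -> 0.75
print(field_yield(1, 1))    # one die per field, one defect  -> 0.0
print(poisson_yield(FIELD_AREA, DEFECT_DENSITY))       # one huge die
print(poisson_yield(FIELD_AREA / 4, DEFECT_DENSITY))   # quarter-field die
```

In the Poisson model the full-field die yield is the quarter-field yield raised to the fourth power, so the penalty for a giant die grows quickly as defect density goes up. That is why a mature process with a lower defect density matters so much for a part this size.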
Re: (Score:2)
Yeah, my first thought on seeing that monster chunk of silicon: "Defectivity is going to make that thing expensive as hell."
Re: (Score:2)
Are they not using the trick of selling chips with a defective core as a lower core count (like the old Phenom X3)? I assumed that was why you get strange numbers like 61 cores.
Re: (Score:3)
Re: (Score:2)
Price for performance?
The chips, no matter how small their transistors are, still need to fit on a standard die.
To get these chips to run faster you can add more transistors and/or better optimize them for their use.
When you focus on the latter there is less of a case of running out of space.
If we buy a bigger home we don't buy bigger furniture just more of it.
Re: (Score:3)
Society is getting to the point where technology is literally climbing up our asses,
So you are saying that "big money" has next generation butt plugs?
Re: (Score:2)
> if you aren't a liberal when you're very young you don't have a heart, if you're still a liberal after you grow up you don't have a fucking brain.
*High Five*
I feel exonerated of my youthful days !!
Re: (Score:2)
Non ironic Beowulf clusters.