Larrabee Based On a Bundle of Old Pentium Chips
arcticstoat writes "Intel's Pat Gelsinger recently revealed that Larrabee's 32 IA cores will in fact be based on Intel's ancient P54C architecture, which was last seen in the original Pentium chips, such as the Pentium 75, in the early 1990s. The chip will feature 32 of these cores, which will each feature a 512-bit wide SIMD (single input, multiple data) vector processing unit."
Re:What the hell is Larrabee? (Score:5, Informative)
According to Wikipedia http://en.wikipedia.org/wiki/Larrabee_(GPU) [wikipedia.org]
Re:What the hell is Larrabee? (Score:4, Informative)
SIMD = Single Instruction, Multiple Data (Score:5, Informative)
Get your acronyms right....
Re:I'm no expert but (Score:5, Informative)
Power can come from multiple sources. In this case, you have a 150W power connector (probably an 8-pin PCIe one) and another 75W one (a 6-pin PCIe). The remaining 75W comes from the PCIe slot itself.
Nothing terribly unusual - a number of cards are coming out in configurations like this, and 300W for a video card is starting to become the norm, depressing as it is.
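If you want to sanity-check the math, here's a trivial sketch of the budget, assuming the usual PCIe limits (75W from the slot, 75W from a 6-pin, 150W from an 8-pin); the exact connector mix on a Larrabee board is just a guess.

/* Back-of-the-envelope check of the ~300W board power budget described
 * above, using the standard PCIe power-delivery limits. The exact
 * connector mix on Larrabee boards is an assumption here. */
#include <stdio.h>

int main(void)
{
    int slot_w      = 75;   /* PCIe x16 slot itself */
    int six_pin_w   = 75;   /* 6-pin PCIe auxiliary connector */
    int eight_pin_w = 150;  /* 8-pin PCIe auxiliary connector */

    printf("Total board budget: %d W\n", slot_w + six_pin_w + eight_pin_w);
    return 0;
}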
Re:What the hell is Larrabee? (Score:5, Informative)
Not quite...
Larrabee is a general-purpose number cruncher with a high degree of parallelism.
NVIDIA/ATI are moving towards making their graphics cards capable of running general purpose code. Intel is coming from the other side, moving a general purpose parallel-compute engine towards doing graphics.
Yes it's a subtle difference and yes they'll meet in the middle, it's just a question of angles.
Intel wants the parallel compute market more than it wants the graphics card market so that's who it's pitching this at.
Re:Marko DeBeeste (Score:4, Informative)
I can't believe it took this long for someone to find the "Get Smart!" reference.
Would you believe.... 39 posts?
How about 20?
How about one FRIST POST and an In Soviet Russia?
Re:What the hell is Larrabee? (Score:1, Informative)
Uh, I think you're talking about LARAMIE, not Larrabee.
Re:Pentium 75? (Score:5, Informative)
I don't care if you're a C64 fanboi, Pentiums made mistakes. Apple had nothing to do with it. Read here [wikipedia.org].
And this also from the same source... "In June 1994, Intel engineers discovered a flaw in the floating-point math subsection of the Pentium microprocessor. Under certain data dependent conditions, low order bits of the result of floating-point division operations would be incorrect, an error that can quickly compound in floating-point operations to much larger errors in subsequent calculations. Intel corrected the error in a future chip revision, but nonetheless declined to disclose it."
Re:The "Core" chips were based on the Pentium III (Score:4, Informative)
Re:I doubt it (Score:5, Informative)
The original Pentium (which went to 166MHz at the end, not just 75MHz) used U and V execution pipes. No translation to micro-ops, and no "out of order". Indeed, there shouldn't be a need for that in Larrabee anyway, given the number of cores. It would almost be better to get rid of the V pipe and add SIMD instead.
Your comments on CISC are a bit off-base; the idea is to execute shaders in x86 machine code. They can be simple (limited flow control), or complex (general CPU/GPU).
"out-of-order" (ei. Pentium Pro and better) is not so good with that many cores doing that kind of work. It would get the hardware into a lot of trouble. Better to keep it simple, and add more cores.
A better starting point would probably have been ARM, but that would lose the compatibility edge. If Larrabee works, it will take the GP-GPU market by storm. It needs:
1 - to publish itself as a NUMA CPU (add a bit to tell the OS what it is for)
2 - compiler optimizations for the particular CPU architecture, preferably broken into two pieces:
2a - "straight line" shader code
2b - branching code
3 - a guide to the new NUMA characteristics.
With that in place, a standard (BSD/Linux) OS will be able to use it for regular jobs. Or, for those special "I need the SIMD unit" jobs. The biggest hassle is trying to split control of those new CPU units between OpenGL and the regular scheduler (this is a kernel hack that Intel will have to make). It would be easier to jam this into OpenSolaris, but that isn't anywhere near popular enough.
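To make point 1 a bit more concrete, here's a minimal user-space sketch of what "tell the OS what it is for" could look like on Linux with the existing libnuma API. That Larrabee's cores and memory would show up as an extra NUMA node is pure speculation on my part, not anything Intel has said.

/* Minimal sketch: if Larrabee's cores and memory were exposed to the OS as
 * an extra NUMA node (speculation only), ordinary libnuma calls could pin
 * a worker thread and its buffers there.
 * Build with: gcc numa_sketch.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "No NUMA support on this system\n");
        return 1;
    }

    int larrabee_node = numa_max_node();   /* assume it shows up as the last node */

    /* Run this thread on the target node and allocate memory local to it. */
    numa_run_on_node(larrabee_node);
    size_t len = 1 << 20;
    float *buf = numa_alloc_onnode(len * sizeof *buf, larrabee_node);
    if (!buf)
        return 1;

    for (size_t i = 0; i < len; i++)        /* the "SIMD job" would go here */
        buf[i] = (float)i;

    printf("Worked on node %d\n", larrabee_node);
    numa_free(buf, len * sizeof *buf);
    return 0;
}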
Don't you want your video card to assist compiling large source when not gaming/modeling? Why not?
And, a few "extra" points
- Intel already has an optimizing compiler for the P54C architecture, and we have gcc.
- The architecture, including the U/V pipelines, used only 3.1 million transistors.
- A GeForce 7800 GTX has 302 million transistors -- 100x the number of the original Pentium processor.
So I would think that using 32 shrunken "Pentium Classic" cores would be quite feasible -- you need some (lots of) logic to ensure that they can all access their respective memories. The general SIMD implementation will take quite a bit of real estate as well. There is probably a budget of around 600M transistors for Larrabee (a wild-ass guess, derived from power consumption estimates).
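The rough arithmetic behind that, using the 3.1M-transistor figure above and my wild-ass 600M budget:

/* Rough arithmetic behind the budget guess above: 32 P54C-class cores at
 * the ~3.1M transistors quoted earlier, against an assumed ~600M total.
 * The 600M figure is a guess, not an Intel number. */
#include <stdio.h>

int main(void)
{
    double per_core_m = 3.1;      /* millions of transistors, original P54C */
    int    cores      = 32;
    double budget_m   = 600.0;    /* assumed total budget, in millions */

    double scalar_m = per_core_m * cores;
    printf("Scalar cores: %.1fM transistors\n", scalar_m);
    printf("Left for SIMD units, caches, interconnect: %.1fM of %.1fM\n",
           budget_m - scalar_m, budget_m);
    return 0;
}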
The gate-size shrink should result in higher speeds. There may be a danger in the complex instruction interpretation routines, but these can be corrected. The single-cycle instructions are already a (more or less) synchronous design, and should scale trivially.
Anything I am missing?
I, for one, am looking forward to buying a desktop super-computer with Larrabee.
Re:What the hell is Larrabee? (Score:5, Informative)
Meanwhile...
32 x ???MHz (unknown, but likely 900+ to be competitive with current designs) x 3+ MIPS/MHz + 32 x 512-bit SIMD units = OMGWTFHAX @ 300W.
Seriously. The "Pentium" base of this design is damned near irrelevant. At this point, all it's doing there is scheduling execution on the SIMD units. If you've seen any modern GPU designs, they're basically hugely parallel cores attached to a few "director" cores which put everything where it needs to go. The original Pentium is probably the most powerful CPU with the least complicated design on the process, with the least amount of legacy MMX cruft.
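If you want a number for the OMGWTFHAX part: assuming the rumoured 2GHz clock, 16 single-precision lanes per 512-bit unit, and one multiply-add per cycle (all assumptions, nothing confirmed), the peak works out to the 2 TFLOPS figure being quoted elsewhere in this thread.

/* Quick peak-FLOPS estimate from the numbers in this thread: 32 cores,
 * ~2 GHz, one 512-bit SIMD unit per core (16 single-precision lanes),
 * and an assumed multiply-add per cycle counting as 2 FLOPs. */
#include <stdio.h>

int main(void)
{
    double cores     = 32;
    double clock_ghz = 2.0;               /* rumoured, not confirmed */
    double sp_lanes  = 512.0 / 32.0;      /* 16 floats per 512-bit vector */
    double flops_per_lane_per_cycle = 2.0;  /* assumes fused multiply-add */

    double tflops = cores * clock_ghz * sp_lanes * flops_per_lane_per_cycle / 1000.0;
    printf("Peak: %.1f TFLOPS single precision\n", tflops);   /* -> ~2.0 */
    return 0;
}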
Re:I'm no expert but (Score:3, Informative)
...and 300W for a video card is starting to become the norm, depressing as it is.
Not really; die shrinks have actually been driving down power consumption. If you look at this page: http://www.guru3d.com/article/radeon-hd-4850-and--4870-crossfirex-performance/3 [guru3d.com] you can see that the latest-generation Radeon 4850 and 4870 consume much less power than the power-hungry peaks set by the 2900XT. The 4850 system uses less than 300W at full load. That's pretty damn impressive considering the ridiculous amount of performance it puts out.
Re:Uh, isn't that true of the Core CPUs too? (Score:2, Informative)
The Intel Core is derived from the P6 architecture, which debuted with the Pentium Pro, not the Pentium. Its history goes: Pentium Pro, Pentium II/Pentium II Celeron/P2 Xeon, Pentium III/Pentium III Celeron/P3 Xeon, skip the Pentium 4 (Netburst architecture), Pentium M, Intel Core. So, this is still interesting news.
Internet telephone game run amok, Slashdot helping (Score:5, Informative)
http://babelfish.yahoo.com/translate_url?doit=done&tt=url&intl=1&fr=bf-home&trurl=http%3A%2F%2Fwww.heise.de%2Fct%2F08%2F15%2F022%2F&lp=de_en&btnTrUrl=Translate [yahoo.com]
Actually, they got the "Gelsinger said so" remark from Expreview, itself a Chinese site:
http://en.expreview.com/2008/07/07/larrabee-unleashes-2-tflops-capacity [expreview.com] (note they courteously attached the Larrabee board diagram leaked a while back):
"Gelsinger said the Larrabee will be a 45nm product featuring SIMD technique, 64-bit address. Besides, 32 of cores runing at 2.00 GHz will unleash 2 TFLOPS capacity, twice as much as the RV770XT."
But did Gelsinger really SAY those things?
Here is the Google translation of the same Heise article: http://translate.google.com/translate?u=http%3A%2F%2Fwww.heise.de%2Fct%2F08%2F15%2F022%2F&hl=en&ie=UTF8&sl=de&tl=en [google.com]
No matter which crappily translated version of the German article one looks at, it appears that Gelsinger said no such thing... The part about Larrabee containing P54C cores was clearly in a separate paragraph, written after a speculative question.
So I guess Expreview THOUGHT Pat said something after taking too short a look at the Heise article, after which CustomPC sensationalized the whole thing, not really bothering to actually read even the translated link it posted. Now some random Slashdotter is doing the same courtesy.
There you go, folks- Internet reporting.
Re:I doubt it (Score:3, Informative)
I know what you were saying, but for the benefit of the general audience:
That works better if all the geometries scale linearly (line separation, aspect ratios, layer thicknesses, etc.). As a general rule those change only slightly from one generation to the next, but there are often significant exceptions.
And going from 0.6u to 0.35u to 0.25u to 0.18u to 0.13u to 90 nm to 65 nm to 45 nm is a few too many steps for that assumption to work....
Particularly given that modern chip photomasks use a completely different phase-shift technology than the older ones. You couldn't shrink older masks down to a new process at all.
Back to your main point, on why use the P54 anyway... My guess is that they really want to kickstart their many-many-core work with this, and walked back along their product line until they came to something with enough features, few enough transistors, and a modern enough logic model (HDL/Verilog code) that they could have a fair chance of translating and resynthesizing it rapidly.
But that's stretching the available leak knowledge a ways. Someone will eventually go on record with the real details.
Re:Compare with Niagara 2 and 3, and Cell (Score:3, Informative)
The big architectural difference with the Cell SPUs is that the SPUs really are not meant to access system memory directly. Each SPU has a very limited local memory buffer it can access directly. System memory can be modelled as a RAM disk, and accesses to system memory go through DMA, which can be considered the equivalent of an asynchronous file read/write in the RAM-disk analogy.
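Here's a plain-C sketch of that access pattern, with memcpy standing in for the real asynchronous mfc_get/mfc_put DMA calls; it's the analogy only, not actual SPU code.

/* Illustration of the access pattern described above: a small local buffer
 * is the only memory the compute code touches directly, and system memory
 * is staged in and out in chunks. memcpy stands in for the Cell's
 * asynchronous mfc_get/mfc_put DMA transfers. */
#include <stdio.h>
#include <string.h>

#define LOCAL_STORE 256                 /* tiny "local store", in floats */

static float system_memory[4096];       /* stands in for main RAM */
static float local_store[LOCAL_STORE];  /* the only thing the "SPU" touches */

int main(void)
{
    for (int i = 0; i < 4096; i++)
        system_memory[i] = (float)i;

    for (int base = 0; base < 4096; base += LOCAL_STORE) {
        /* "DMA in": like an async file read in the RAM-disk analogy */
        memcpy(local_store, &system_memory[base], sizeof local_store);

        for (int i = 0; i < LOCAL_STORE; i++)   /* compute on local data only */
            local_store[i] *= 2.0f;

        /* "DMA out": write the chunk back to system memory */
        memcpy(&system_memory[base], local_store, sizeof local_store);
    }

    printf("system_memory[100] = %f\n", system_memory[100]);  /* 200.0 */
    return 0;
}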
Re:What the hell is Larrabee? (Score:3, Informative)
Re:Pentium 75? (Score:5, Informative)
It wasn't every time you divided. It only affected floating-point operations, and Intel claimed that only 1 in every 8.77 billion random divisions would show the error; those familiar with the bug agree that Intel's analysis is more or less correct. That would explain how it got through Intel's initial testing and why the bug wasn't noticed for a while by the general computing public. The whole thing was more of a PR disaster on Intel's part than anything else.
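To put that 1-in-8.77-billion figure in perspective (the divide rate below is an arbitrary assumption, just to show the scale):

/* How long until a user hits the FDIV bug at the quoted odds of
 * 1 in 8.77 billion random divides? The rate is an arbitrary assumption. */
#include <stdio.h>

int main(void)
{
    double odds       = 8.77e9;   /* divides per expected bad result */
    double per_second = 1000.0;   /* assumed random FP divides per second */

    double days = odds / per_second / 86400.0;
    printf("At %.0f random divides/s: one bad result every %.0f days\n",
           per_second, days);     /* roughly 100 days */
    return 0;
}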
Re:What the hell is Larrabee? (Score:3, Informative)
I don't know why you'd suspect a Dvorak keyboard. The # sign isn't moved at all, and it's really not close to the apostrophe.
For a Dvorak keyboard, you look for words spelled correctly, but which make no sense in context... Happens a LOT, since all vowels are directly adjacent.
ie. "It's very hat outside"
Re:Pentium 75? (Score:3, Informative)
Re:What the hell is Larrabee? (Score:3, Informative)
Another possibility, since this kind of chip is generally running the same program on all of the cores, is to have a single decoder and a shared instruction cache that caches micro-ops.
Re:I doubt it (Score:4, Informative)
don't forget that the Pentium M and, subsequently, Core line of processors was based on Pentium III Coppermine, whereas the Pentium 4 Netburst architecture developed in the meantime was abandoned completely
This keeps being repeated, but it's simply not true. The Core 2 is a completely new microarchitecture, so it doesn't count in this discussion, while the Core 1 is essentially identical to the Pentium M. The Pentium M, however, is not just a tweaked P3 with Netburst completely abandoned. It has a slightly longer pipeline than the P3, and it takes several important features from the Netburst architecture, including (but not limited to) the floating-point and vector pipelines and the branch predictor. The Pentium M took the best parts from the P3 and P4 architectures - it didn't just throw one away.
Re:What the hell is Larrabee? (Score:3, Informative)
No it wasn't. The later Pentium Pro [wikipedia.org] was the first Intel processor to use this method. The NexGen Nx586 was the first ever (for x86 at least). AMD later bought NexGen, and their design became the basis of the K6 (the K5, launched slightly after the PPro, was AMD's own in-house design).
On which scale.... (Score:4, Informative)
It's mainly a question of "on which scale are we comparing chips".
Yes, the x86 instruction set is utterly ugly and horribly contrived compared to nice contemporary architectures like the 68k. Computing would probably involve fewer hoops had IBM decided to go with Motorola for its PC (as a lot of other home computers, arcade machines, and home consoles did).
*BUT*
if we place GPUs on the same scale, suddenly the x86 shines: it doesn't completely suck at branching, it has an actual stack that can be used to call subroutines, it has interrupts, etc.
It is an architecture able to run an OS.
NVIDIA's CUDA machines, on the other hand, mainly use SIMD masking for conditional operations, aren't really brilliant when it comes to branching, and completely lack any way to do subroutine calls. Those chips have loads of registers, but instead of using them for register windows and RISC-style subroutine calls, they use the registers to keep more threads in flight.
It definitely makes a lot of sense from a functional point of view (those are GPUs; they are made to process fuck-loads of pixels per second), but this makes them unable to run Linux.
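For anyone who hasn't seen SIMD masking: here's the idea in plain scalar C. Every lane computes both sides of the conditional and a mask picks the answer, which is how the GPU keeps all lanes in lock-step instead of branching.

/* Plain-C illustration of SIMD masking as described above: rather than
 * branching per element (which would let lanes diverge), every lane
 * computes both sides of the conditional and a mask picks the result. */
#include <stdio.h>

#define N 8

int main(void)
{
    float x[N] = { -3, 1, -2, 5, 6, -7, 4, -1 };
    float out[N];

    for (int i = 0; i < N; i++) {
        /* One "lane" of the vector: build a mask instead of branching. */
        float if_true  = x[i] * 2.0f;           /* then-side, always computed */
        float if_false = -x[i];                 /* else-side, always computed */
        float mask     = (x[i] > 0.0f) ? 1.0f : 0.0f;

        out[i] = mask * if_true + (1.0f - mask) * if_false;
    }

    for (int i = 0; i < N; i++)
        printf("%g ", out[i]);                  /* 3 2 2 10 12 7 8 1 */
    printf("\n");
    return 0;
}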
On that scale, having x86 on a GPU suddenly makes it a lot more interesting for uses outside the usual "draw triangles very fast". Even if x86 sucks to begin with.
And for the record: there's hardly any way the 68k architecture could ever have prevailed. It's a good one, but IBM never saw its PC as anything better than a glorified terminal. For that kind of machine, they were of course going to go for the cheapest possible chip.
Given a choice between a half-assed chip from Intel, with a 16-bit extension quickly tacked onto a design inherited from early 8-bit chips (the 8008, the 8080, and the contemporary Z80 - most assembler code can be directly recompiled for the 8088 after a little register renaming), AND a very nice chip from Motorola, redesigned from the ground up as a nice, clean 16/32-bit architecture built for future expansion:
Of course they would pick the Intel. It's cheaper, and there's no need for a future-proof 32-bit processor in a fucking "Terminal Deluxe".
And of course, because of the (relatively) low cost, the (very strong) brand recognition, the (somewhat) open platform that enabled clones (open in the sense that it was documented; Phoenix still had to completely rewrite the BIOS because of copyright restrictions, but IBM considered big iron its main product and didn't mind such clones), and because it was entering a relatively uncrowded market (most home computers were aimed at homes, schools, and small shops - PCs were marketed to corporations):
The PC was bound to take over the market very quickly - *with* its bad design (almost *because* of it). And it was bound to set the standard, as bad as that standard is.
And by then, it was too late for IBM to switch to a better architecture and produce a "Terminal Deluxe Pro Mark-III" with a clean 68k chip.
Of course, had the PC had a less crippled OS, designed to be slightly more extensible and to make fewer assumptions about the architecture than MS-DOS did (you know, Mr. Gates's "we laid everything out around 1MiB and thought it would last for at least 10 years"), perhaps a switch to a different, better architecture could have been less painful, and a cleaner architecture could have blessed the PC world sooner.