Larrabee Based On a Bundle of Old Pentium Chips
arcticstoat writes "Intel's Pat Gelsinger recently revealed that Larrabee's 32 IA cores will in fact be based on Intel's ancient P54C architecture, which was last seen in the original Pentium chips, such as the Pentium 75, in the early 1990s. The chip will feature 32 of these cores, which will each feature a 512-bit wide SIMD (single instruction, multiple data) vector processing unit."
I doubt it (Score:5, Interesting)
I doubt it. Maybe they mentioned the Pentium as an example to explain an in-order superscalar architecture, as opposed to more modern CPUs.
-There is a lot of overhead in the P54C for executing complex CISC operations that are completely useless for graphics acceleration.
-The P54C was manufactured in a 0.6 micron BiCMOS process. Shrinking this to 0.045 micron CMOS (more than 100x smaller in area!) would require a serious redesign up to the RTL level. Circuit design has evolved with process technology.
-a lot more...
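For a sense of scale, here is a quick back-of-the-envelope check of that shrink. It is only a sketch: the two feature sizes come from the comment above, the function name is made up for illustration, and it assumes ideal scaling where area goes with the square of the feature size.

```python
# Feature sizes in microns, as quoted in the comment above.
OLD_NODE_UM = 0.6
NEW_NODE_UM = 0.045

def shrink_factors(old_um, new_um):
    """Return (linear, area) shrink factors, assuming ideal scaling."""
    linear = old_um / new_um
    area = linear ** 2  # area scales with the square of the linear dimension
    return linear, area

linear, area = shrink_factors(OLD_NODE_UM, NEW_NODE_UM)
print(f"linear shrink: {linear:.1f}x, area shrink: {area:.0f}x")
```

So the "more than 100x" figure only holds for area; in linear terms the shrink is a bit over 13x.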
Manycore GPU (Score:5, Interesting)
Larrabee [wikipedia.org] is going to be Intel's next creation in the GPU world: a many-core GPU with the following peculiarities:
- fully compatible with the x86 instruction set (whereas other GPUs use different architectures, and often instruction sets that aren't as well suited to general computing).
Thus, the Larrabee could *also* be used as a many-core main processor (if popped into a QuickPath socket) and used to run a good multicore OS. That's not achievable with any current GPU (both ATI's and nVidia's completely lack some control structures: both are unable to use subroutines, and everything must be inlined at compile time).
- unlike most current Intel x86 CPUs, it features a shallow pipeline that executes instructions in order. Hence the Larrabee (and the Silverthorne, which also has these characteristics) has regularly been compared with old Pentiums (which share them too), from the initial announcement through to TFA.
- features more cores with narrower SIMD: 32 cores, each able to handle 16 32-bit floats simultaneously. Whereas, for example, nVidia's CUDA-compatible GPUs have up to 16 cores only, but each is able to execute 32 threads over 4 cycles and keep up to 768 threads in flight.
This enables Larrabee to cope with slightly more divergent code than traditional GPUs and makes it a good candidate to run stuff like GPU-accelerated ray tracing.
Hence all the recent technical demos running Quake 4 with ray tracing mentioned on /.
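The "16 32-bit floats per core" figure follows directly from the 512-bit vector width in the summary. A minimal sketch of that arithmetic, with illustrative names; the chip-wide total assumes every core issues one full-width vector operation per cycle, which nothing in TFA confirms:

```python
SIMD_WIDTH_BITS = 512   # vector unit width from the story summary
FLOAT_BITS = 32         # single-precision float
CORES = 32              # core count from the story summary

def simd_lanes(width_bits, element_bits):
    """Number of elements that fit in one vector register."""
    return width_bits // element_bits

lanes = simd_lanes(SIMD_WIDTH_BITS, FLOAT_BITS)
# Assumption: one full-width vector op per core per cycle.
floats_per_cycle = CORES * lanes
print(f"{lanes} lanes per core, {floats_per_cycle} floats per cycle chip-wide")
```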
That's what Intel tells you.
Now the old and experienced geek will also notice that Intel has so far only put out press releases and technical demos running on plain regular multi-chip multi-core Intel Cores (just promising that the real chip will be even better than the demoed stuff).
Meanwhile, ATI and nVidia are churning out new "half"-generations every 6 months.
And the whole Larrabee thing is starting to sound like big vaporware.
Re:I doubt it (Score:4, Interesting)
It's unlikely but not impossible - don't forget that the Pentium M and, subsequently, Core line of processors was based on Pentium III Coppermine, whereas the Pentium 4 Netburst architecture developed in the meantime was abandoned completely. Going back to Pentium I would be a bit on the extreme, but it's possible that they meant some basic design principles of Pentium I, not the whole core as it was. Maybe they will make something from scratch, but keep it similar to the original Pentium's inner RISC core, or maybe redo it as a vector processor or hell knows what. It was a citation from a translated interview with some press monkey, so you can expect anything.
Interesting choice... (Score:3, Interesting)
If anyone remembers those old original Pentiums, their 16-bit processing sucked - so much that a similarly clocked 486 could outperform them. I guess that it would be reasonably trivial for Intel to slice off the 16bit microcode on this old chip to make a 'pure' 32-bit only processor. I am sure that they will be using the designs with a working FPU... but for many visual operations, occasional maths errors would largely go unnoticed. Remember when some graphics chip vendors were cheating on benchmarks by reducing the quality ... and how long it took for people to notice?
Although, if I had Intel's resources and was designing a 32-core CPU, I would probably choose the core from the later 486 chips... I don't think a graphics pipeline processor would benefit much from the Pentium's dual instruction pipelines, and I doubt that it would be worth the silicon real estate. The 486 has all the same important instructions useful for multi-core work - the CMPXCHG instruction debuted on the 486.
Re:I doubt it (Score:4, Interesting)
One does not "shrink" a chip by taking photomasks and shrinkenating. One redoes the design / layout process, generally. The P5 series went from 0.8 um to 0.25 um over its lifetime (through Tillamook), stepping through 0.6, 0.35, and finally 0.25 um.
It was 148 mm^2 at 0.6 um, so the process shrink should bring it down to a floorplan of around a square millimeter or so a core. Not sure how big the die will be for Larrabee, but the extra space will probably support the simple wide data unit per core and more cache. If the SIMD is simple it could be another 3-4 million transistors / 1 square mm or so. For a 100 mm^2 chip that gives you another 30 mm^2 or so for I/O and cache (either shared, or parceled out to the cores).
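A rough check of that area budget, assuming ideal scaling (area shrinks with the square of the feature size), which real designs only approximate. The 1 mm^2 per core, 1 mm^2 per SIMD unit and 100 mm^2 die are the guesses from the paragraph above; the names are illustrative.

```python
def scaled_area_mm2(area_mm2, old_um, new_um):
    """Ideal-scaling estimate: area shrinks with the square of the feature size."""
    return area_mm2 * (new_um / old_um) ** 2

# 148 mm^2 P54C at 0.6 um, moved to 0.045 um.
core_mm2 = scaled_area_mm2(148.0, 0.6, 0.045)

CORES = 32
CORE_BUDGET_MM2 = 1.0   # "around a square millimeter or so a core"
SIMD_BUDGET_MM2 = 1.0   # guessed size of the wide SIMD unit
DIE_MM2 = 100.0         # hypothetical die size used in the comment

used_mm2 = CORES * (CORE_BUDGET_MM2 + SIMD_BUDGET_MM2)
left_mm2 = DIE_MM2 - used_mm2

print(f"scaled P54C core: {core_mm2:.2f} mm^2")
print(f"cores + SIMD: {used_mm2:.0f} mm^2, left for I/O and cache: {left_mm2:.0f} mm^2")
```

The ideal-scaling figure comes out a little under 1 mm^2 per core, and the rounded budget leaves about 36 mm^2, in line with the "30 mm^2 or so" estimate.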
Compare with Niagara 2 and 3, and Cell (Score:3, Interesting)
Right. It clearly isn't using the Pentium design, but a Pentium-like design.
To that, they will have added SMT, because (a) in-order designs adapt to SMT well because they have a lot of pipeline bubbles and (b) there will be a lot of latency in the memory system and SMT helps hide that. I would assume 4 way SMT, but maybe 8. Larrabee will therefore support 128 or 256 hardware threads. nVidia's GT280 supports 768.
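The thread counts above are just cores times SMT ways. A tiny sketch, where the 4-way and 8-way figures are the poster's guesses rather than anything Intel has announced:

```python
CORES = 32

def hardware_threads(cores, smt_ways):
    """Total hardware threads for a chip with the given SMT width per core."""
    return cores * smt_ways

# 4-way and 8-way SMT are speculation from the comment above.
for ways in (4, 8):
    print(f"{ways}-way SMT: {hardware_threads(CORES, ways)} hardware threads")
```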
The closest chip I can think of right now is Sun's Niagara and Niagara 2 processors, except with a really beefy SIMD unit on each core, and a large number of cores on the die because of 45nm. I think Niagara 3 is going to be a 16 core device with 8 threads/core, can anyone confirm?
Note that this is pretty much what Sony wanted with Cell, but Cell was 2 process shrinks too early. 45nm PowerXCell32 will have 32 SPUs and 2 PPUs (whereas Larrabee looks like it is matching an equivalent of a weak-PPU with each SPU equivalent). It could run at 5GHz too... power/cooling notwithstanding.
I already thought of this.. (Score:3, Interesting)
At least 20 years ago, I thought, hey, with the density and speed of transistors these days, and with RISC being popular, why not go all the way and make a chip with literally hundreds of (wait for it..) Z80 CPUs?
Of course I and others dismissed the idea as being just slightly ludicrous. But then, at the time, I also thought eventually there would be Amiga emulators and interpreted versions of C language, for which I was also called crazy to think...
Re:I doubt it (Score:2, Interesting)
Actually, I used to work at Intel (around the time of 0.6um) and one could, and indeed, did sometimes shrink chips just by "shrinkenating", or perhaps shrinkenating followed by a design rule check. The result was a chip that was cheaper to manufacture, and in most cases, ran faster.
Of course, to really take advantage of the smaller process node, one could revisit the cell library, circuit design, and logic, or any subset of the above, depending on what you were after. Often, time was of the essence, so you didn't do everything possible.
I was not on the Pentium team, but I'd guess the P54 logic model was written in iHDL, which would mean that getting it through a modern synthesizer like Physical Compiler would require first converting it to Verilog. (They probably have a translator now.) But to get an efficient result, some serious changes to the RTL would almost certainly be required. Because wire delay is much more important at 0.045um than at 0.6um, the analysis of what work can be done "close by" or "far away" within a clock cycle will be quite a bit different.
The real question is, if this core is going to spend 98% of its time cracking away with its super-sexy SIMD FP unit, why are they bothering with x86 cores anyway rather than something slimmer? It's not like they need to boot Windows -- I hope.
Yes, but not as badly as you might think (Score:4, Interesting)
According to the diagram in the article, the Larrabee has 8 GDDR memory interfaces, which will supply rather a lot of bandwidth. Presumably, those are GDDR4 or GDDR5 interfaces, so that's 4.5 Gb/s * 8 = 36 Gb/s, or 4.5 GB/s of bandwidth.
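The unit conversion in that estimate can be checked with a short sketch. It takes the comment's reading at face value, treating 4.5 Gb/s as the rate of a whole interface; if that number is really a per-pin rate, the total would also scale with the interface width, which the comment does not give. The names are illustrative.

```python
BITS_PER_BYTE = 8

def aggregate_bandwidth(gbit_per_interface, interfaces):
    """Return (total Gb/s, total GB/s) across all memory interfaces."""
    total_gbit = gbit_per_interface * interfaces
    total_gbyte = total_gbit / BITS_PER_BYTE
    return total_gbit, total_gbyte

# Assumption from the comment: 8 interfaces at 4.5 Gb/s each.
total_gbit, total_gbyte = aggregate_bandwidth(4.5, 8)
print(f"{total_gbit:.0f} Gb/s = {total_gbyte:.1f} GB/s")
```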
Getting data onto and off the board will still be a challenge - you're limited by PCI Express transfers.
Re:What the hell is Larrabee? (Score:3, Interesting)
What I'm confused about: I believe around 40% of the original Pentium was the x86 translation layer; it was the first chip to use a RISC-like internal setup. Nowadays that percentage is way lower, since the rest of the chip has gotten all the new transistors. Is this chip going to have 32 x86 translation units?
Re:I already thought of this.. (Score:3, Interesting)
You know, I was actually going to note that in my post. Yep, the Z80 was probably the antithesis of RISC at the time. It had a lot of instructions for its day. I don't think any instruction took less than 4 clock cycles, and many or most took 2 or more of these 4-clock cycles (for 8 or more total clock ticks). If I remember right.
Much more RISC-like would have been the 6502 or something. But then it had few internal registers, where the Z80 had lots... and I think RISC designs all have lots of registers.
I figured the Z80 would work better in such an extremely high core count device for that reason. The 6502 needed a lot more memory accesses to get things done.
Of course, the final conclusion to this line of thinking was, how simple of a core could you possibly make? And then how many could you get into a modern chip?
I don't remember the name, but there were some people who made a many-core CPU out of processors that ran Forth as their language. That was interesting...
Re:I doubt it (Score:2, Interesting)
He was talking about the Processor, not the Process. While it's nice to know Intel is resuscitating an old processor from the boneyards, the process to be used will be nothing like the original process. Nowadays we're printing at 45nm equivalent gatewidths.
The interesting part is that Intel is going to be doing a mashup of a grunch of old processors for parallel processing. Each of these sub-processors is going to make an Atom look massive, but collectively (with appropriate programming) they should be quite cool.