Intel Hardware

Larrabee Based On a Bundle of Old Pentium Chips (286 comments)

Posted by ScuttleMonkey
from the making-old-new-again dept.
arcticstoat writes "Intel's Pat Gelsinger recently revealed that Larrabee's 32 IA cores will in fact be based on Intel's ancient P54C architecture, which was last seen in the original Pentium chips, such as the Pentium 75, in the early 1990s. The chip will feature 32 of these cores, which will each feature a 512-bit wide SIMD (single input, multiple data) vector processing unit."

  • Pentium 75? (Score:5, Funny)

    by Anonymous Coward on Monday July 07, 2008 @05:33PM (#24089949)
    Ah the dreams of the past, a Beowulf cluster of old computers come to life :)
    • by Divebus (860563) on Monday July 07, 2008 @05:46PM (#24090177)

      Making math errors at blazing speeds...

      • by BUL2294 (1081735) on Monday July 07, 2008 @06:06PM (#24090435)

        Oh, don't worry about that. Games will just be more interesting. For example, that 3D monster you're trying to hack to death with a chainsaw will now suddenly shift to a different part of the screen... Or maybe you'll get a cool color-cycling effect from some incorrectly calculated values...

        "Intel Graphics Inside--it's all in good fun!"

      • Re:Pentium 75? (Score:4, Insightful)

        by ArcherB (796902) on Monday July 07, 2008 @09:11PM (#24092829) Journal

        Making math errors at blazing speeds...

        To err is human.

        To really screw up, you need the aid of a computer.

  • by vondo (303621) on Monday July 07, 2008 @05:33PM (#24089953)

    A little context might help. This isn't the Inquirer for god's sake.

    • by Darkness404 (1287218) on Monday July 07, 2008 @05:35PM (#24089985)
      Larrabee is the codename for a discrete graphics processing unit (GPU) chip that Intel is developing as a revolutionary successor to its current line of graphics accelerators. The video card containing Larrabee is expected to compete with the GeForce and Radeon lines of video cards from NVIDIA and AMD/ATI respectively. More than just a graphics chip, Intel is also positioning Larrabee for the GPGPU and high-performance computing markets, where NVIDIA and AMD are currently releasing products (NVIDIA Tesla, AMD FireStream) which threaten to displace Intel's CPUs for some tasks. Intel plans to have engineering samples of Larrabee ready by the end of 2008, with public release in late 2009 or 2010.[1]

      According to Wikipedia http://en.wikipedia.org/wiki/Larrabee_(GPU) [wikipedia.org]
      • by TransEurope (889206) <eniac@uni-koblenz . d e> on Monday July 07, 2008 @05:58PM (#24090337)
        Also interesting is that Intel expects a maximum power consumption of 300 Watts. I personally expect nothing from that thing. The ancient technology of the cores and the prospect of building a system that has to power and cool a 300 Watt hotspot don't make these cards my favourite choice yet. I#m very sceptic about Intes try of making a high end graphic board. I really can't imagine that old first-generation Pentium cores will be able to compete with modern stream processing units. I'm surprised Intel didn't at least choose some RISC design, maybe the i960.
        • by lorenzo.boccaccia (1263310) on Monday July 07, 2008 @06:16PM (#24090561)
          also considering that at least Intel's last three attempts at building a high-end graphics board failed miserably, and are now almost a running joke.

          sorry for the drunken english
        • by Joe Snipe (224958) on Monday July 07, 2008 @07:49PM (#24091931) Homepage Journal

          I#m very sceptic about Intes

          Cool, proof of Dvorak keyboard use in the wild

        • Re: (Score:3, Interesting)

          by JebusIsLord (566856)

          What I'm confused about: Around 40% I believe of the original Pentium was x86 translation layer.. it was the first chip to use a RISC-like internal setup. Nowadays that percentage is way lower since the rest of the chip has gotten all the new transistors. Is this chip going to have 32 x86 translation units?

          • Re: (Score:3, Informative)

            by TheRaven64 (641858)
            The percentage will be a lot lower if each core has a 512-bit vector unit - that alone will likely double the core size. The P54C used a very simplistic branch predictor, which makes sense for graphics applications, where branches are relatively uncommon and the misprediction penalty on its short pipeline is much lower (it's insane on modern chips - I got a 25% speedup [on AMD, only around 15% on Intel] on some code the other day just by removing a couple of if statements that were almost always taken). Since it's intended as
          • Re: (Score:3, Informative)

            by imroy (755)

            Around 40% I believe of the original Pentium was x86 translation layer.. it was the first chip to use a RISC-like internal setup.

            No it wasn't. The later Pentium Pro [wikipedia.org] was the first Intel processor to use this method. The NexGen Nx586 was the first ever (for x86 at least). AMD bought NexGen and used its follow-on design to create the K6; AMD's own K5 (launched slightly after the PPro) also decoded x86 into RISC-like operations.

    • by jandrese (485) <kensama@vt.edu> on Monday July 07, 2008 @05:37PM (#24090005) Homepage Journal
      According to TFA, it's a graphics card that Intel is making to compete with Intel and ATI. I'm guessing it's going to be highly optimized for ray tracing, given Intel's statements in the past. Total power consumption estimates are jaw-dropping; TFA estimates around 300W.
      • by jandrese (485)
        Obviously they're competing with nVidia and ATI, not Intel and ATI. Geez, even mandatory previews don't always work.
      • by poetmatt (793785) on Monday July 07, 2008 @05:59PM (#24090349) Journal

        Not only is the power retarded, but ATI can already do 100% native ray tracing [techpowerup.com], which crushed Intel big time.

        I welcome Intel trying to push for market share, but it's going to be many generations before Intel can play catch-up on graphics cards. Even when we get around to 32+ GB of RAM and you can afford a couple of gigs for graphics (at which point we'll probably need 4+ gigs for graphics), the performance of an integrated solution will still be lacking. For anything graphics-intensive, graphics bandwidth and needs grow far faster than processing needs, by definition (currently).

        • Re: (Score:3, Informative)

          by trigeek (662294)
          Where in the article did it say that Larrabee was an integrated solution? Did you not see the picture of the card in the article?
      • by Joce640k (829181) on Monday July 07, 2008 @06:01PM (#24090369) Homepage

        Not quite...

        Larrabee is a general-purpose number cruncher with a high degree of parallelism.

        NVIDIA/ATI are moving towards making their graphics cards capable of running general purpose code. Intel is coming from the other side, moving a general purpose parallel-compute engine towards doing graphics.

        Yes it's a subtle difference and yes they'll meet in the middle, it's just a question of angles.

        Intel wants the parallel compute market more than it wants the graphics card market so that's who it's pitching this at.

    • by clampolo (1159617)

      A little context might help. This isn't the Inquirer for god's sake.

      It's Intel's graphics chip for competing with nvidia. They are moving into this turf because nvidia is attempting to use their CUDA technology to make the CPU less important.

      So it's only natural that Intel is fighting back.

    • by KlomDark (6370) on Monday July 07, 2008 @05:42PM (#24090095) Homepage Journal

      It's one of the larger cities in Wyoming. Get with it. ;)

    • Manycore GPU (Score:5, Interesting)

      by DrYak (748999) on Monday July 07, 2008 @05:54PM (#24090293) Homepage

      Larrabee [wikipedia.org] is going to be Intel's next creation in the GPU world: a many-core GPU with the following peculiarities:

      - fully compatible with the x86 instruction set (whereas other GPUs use different architectures, and often instruction sets that aren't as well adapted to running general computing).
      Thus, Larrabee could *also* be used as a many-core main processor (if popped into a QuickPath socket) and used to run a good multicore OS. That's not achievable with any current GPU (both ATI's and nVidia's completely lack some control structures: both are unable to use subroutines, and everything must be inlined at compile time).

      - unlike most current Intel x86 CPUs, it features a shallow pipeline and executes instructions in order. Hence Larrabee (and Silverthorne, which has similar characteristics) has regularly been compared with old Pentiums (which share those characteristics) since the initial announcement, including in TFA.

      - it features more cores with narrower SIMD: 32 cores, each able to handle 16 32-bit floats simultaneously. By comparison, nVidia's CUDA-compatible GPUs have only up to 16 cores, but each can execute 32 threads over 4 cycles and keep up to 768 threads in flight.
      This enables Larrabee to cope with somewhat more divergent code than traditional GPUs and makes it a good candidate to run stuff like GPU-accelerated ray tracing.

      Hence all the recent technical demos of Quake 4 running with ray tracing that were mentioned on /.

      That's what Intel tells you.

      Now the old and experienced geek will also notice that Intel has only kept putting out press releases and technical demos running on plain regular multi-chip, multi-core Intel Cores (just promising that the real chip will be even better than the demoed stuff).

      Meanwhile, ATI and nVidia are churning out new "half"-generations every 6 months.

      And the whole Larrabee thing is starting to sound like big vaporware.
       

      • - fully compatible with the x86 instruction set (whereas other GPUs use different architectures, and often instruction sets that aren't as well adapted to running general computing).

        I was about to ask "Since when is the x86 instruction set optimized to run general computing?"

        Then I noticed that the word was "adapted". Yeah, that's fair...

        Seriously: The x86 (inspired by the hardware driving Datapoint's early smart terminals and previous chips for building hand calculators) was contemporary with Motorola's 68x (insp

        • On which scale.... (Score:4, Informative)

          by DrYak (748999) on Tuesday July 08, 2008 @06:21AM (#24097569) Homepage

          It's mainly a question of "on which scale are we comparing chips".

          Yes, the x86 instruction set is utterly ugly and horribly contrived compared to nice contemporary architectures like the 68k. Computing would probably involve fewer hoops had IBM decided to go with Motorola for their PCs (as a lot of other home computers, arcade machines and home consoles did).

          *BUT*

          if we place GPUs on the same scale, suddenly the x86 shines: it doesn't completely suck at branching, it has an actual stack that can be used to call subprocedures, it has interrupts, etc.
          It is an architecture able to run an OS.
          nVidia's CUDA machines, on the other hand, mainly use SIMD masking for most conditional operations, aren't really brilliant when it comes to branching, and completely lack any way to do subprocedures. Those chips have loads of registers. But instead of using them for register windows and RISC-style subroutine calls, they use the registers to keep more threads in flight.
          It definitely makes a lot of sense from a functional point of view (those are GPUs, they are made to process fuck-loads of pixels per second), but it makes them unable to run Linux.
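To make the SIMD-masking point concrete, here is a minimal Python sketch of how a masked vector unit handles an `if`: every lane evaluates both sides, and a per-lane mask picks the result. The 16-lane width and the helper names `masked_select` and `simd_if` are illustrative assumptions, and Python lists only model what real hardware does in a single vector instruction.

```python
# Hypothetical model of SIMD masking; 16 lanes = 16 x 32-bit floats in 512 bits.
LANES = 16

def masked_select(mask, if_true, if_false):
    """Per-lane select: keep if_true where the mask is set, else if_false."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]

def simd_if(xs):
    """Masked form of the scalar code `y = x * 2 if x > 0 else -x`."""
    assert len(xs) <= LANES, "model one vector register's worth of data"
    mask = [x > 0 for x in xs]        # the branch condition, one bit per lane
    then_path = [x * 2 for x in xs]   # evaluated in every lane, taken or not
    else_path = [-x for x in xs]      # also evaluated in every lane
    return masked_select(mask, then_path, else_path)

result = simd_if([3, -1, 0, 5])
print(result)  # [6, 1, 0, 10]
```

Both paths cost time whatever the mask says, which is why heavily divergent code runs poorly on this kind of hardware.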

          On that scale, having x86 on a GPU suddenly makes it a lot more interesting for uses outside the usual "draw triangles very fast", even if x86 sucks to begin with.

          And for the record: there's hardly a way the 68k architecture could ever have prevailed. It's a good one. But IBM never saw its PC as anything better than a glorified terminal. For that kind of machine, they were of course going for the cheapest possible chip.
          Given a choice between a half-assed chip from Intel, with a 16-bit extension quickly tacked onto a design inherited from early 8-bit chips (8008, 8080 and the competing Z80 - most assembly code could be directly reassembled for the 8088 after a bit of register renaming), AND a very nice chip from Motorola, redesigned from the ground up to be a nice and clean 16/32-bit architecture designed for future expansion:
          Of course they would pick the Intel. It's cheaper, and there's no need for a future-proof 32-bit processor in a fucking "Terminal Deluxe".

          And of course, because of the (relatively) low cost, because of the (very strong) brand recognition, because of the (somewhat) openness of the platform enabling clones (in the sense that it was documented; of course, Phoenix had to completely rewrite the BIOS because of copyright restrictions, but IBM considered big iron its main product and didn't mind such clones), and because they were taking on a relatively uncrowded market (most home computers were for homes, schools, and small shops; PCs were marketed to corporations):
          The PC was bound to take over the market very quickly - *with* its bad design (almost *because* of it). And it was bound to set the standard, however bad that standard is.
          And by then, it was too late for IBM to adopt a better architecture and produce a "Terminal Deluxe Pro Mark-III" with a clean 68k chip.

          Of course, had the PC had a less crippled OS, designed to be slightly more extensible and to make fewer assumptions about the architecture than MS-DOS (you know, the "we laid everything out around 1 MiB and thought it would last for at least 10 years" by Mr. Gates), perhaps a switch to a different, better architecture could have been less painful, and a cleaner architecture could have blessed the PC world sooner.

  • by Anonymous Coward

    Sounds great, as long as you don't plan on doing any floating point math [wikipedia.org] on it!

    • by h4rm0ny (722443) on Monday July 07, 2008 @06:02PM (#24090391) Journal

      Hey, only Intel provides you with a floating point that really floats - why, you never know where it's going to end up! Now that's floating! :D
    • Re: (Score:2, Funny)

      by Anonymous Coward

      Intel, Intel, give me your answer do,
      Going hazy, can't divide three by two.
      My answers I can't see 'em,
      They're stuck in my Pent-i-um,
      So you'd look great
      If you would make
      A functional FPU.

      (best sung by mid-'90s speech synthesisers)

  • "Stone knives and bearskins"

  • Pentiums? (Score:4, Funny)

    by h4rm0ny (722443) on Monday July 07, 2008 @05:37PM (#24090003) Journal

    This is just unbelievably good news. After all this time, I get to start telling Pentium jokes again! I never thought I would!
    • by Anonymous Coward on Monday July 07, 2008 @05:43PM (#24090117)

      Intel... where quality is job 0.9995675!

    • Re: (Score:3, Funny)

      by Red Flayer (890720)

      This is just unbelievably good news. After all this time, I get to start telling Pentium jokes again! I never thought I would!

      This is slashdot. You didn't need something like this to beat the Pentium dead horse... or for that matter, any dead horse.

      In other words,

      In Soviet Russia, floating-point arithmetic messes up Pentium

      Netcraft confirms, Pentium is undead. Brainssss!

      Imagine a Beowulf cluster of these.

      Et cetera, ad infinitum.

    • Re: (Score:3, Funny)

      by CAIMLAS (41445)

      It's all about the Pentiums, baby.

  • by Joce640k (829181) on Monday July 07, 2008 @05:37PM (#24090011) Homepage

    Get your acronyms right.... SIMD is "single instruction, multiple data", not "single input".

  • by Gat0r30y (957941) on Monday July 07, 2008 @05:40PM (#24090065) Homepage Journal

    The card features one 150W power connector, as well as a 75W connector. Heise deduces that this results in a total power consumption of 300W,

    Um, that just doesn't seem to quite add up to me.

    • Re:I'm no expert but (Score:5, Informative)

      by tlhIngan (30335) <<ten.frow> <ta> <todhsals>> on Monday July 07, 2008 @05:44PM (#24090129)

      The card features one 150W power connector, as well as a 75W connector. Heise deduces that this results in a total power consumption of 300W,

      Um, that just doesn't seem to quite add up to me.

      Power can come from multiple sources. In this case, you have a 150W power connector (probably an 8-pin PCIe one) and another 75W one (a 6-pin PCIe). The remaining 75W comes from the PCIe slot itself.

      Nothing terribly unusual - a number of cards are coming out in configurations like this, and 300W for a video card is starting to become the norm, depressing as it is.
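The arithmetic behind the 300W figure can be sketched in Python. This assumes the standard PCI Express ratings of 75W for the x16 slot, 75W for a 6-pin auxiliary connector and 150W for an 8-pin one; `board_power_limit` is a hypothetical helper, and the result is the most the card is allowed to draw, not a measured consumption.

```python
# Nominal PCI Express power limits in watts (standard ratings, not measurements)
SLOT_W = 75        # the x16 slot itself
SIX_PIN_W = 75     # one 6-pin auxiliary connector
EIGHT_PIN_W = 150  # one 8-pin auxiliary connector

def board_power_limit(six_pin: int = 0, eight_pin: int = 0) -> int:
    """Upper bound on board power: slot plus every auxiliary connector."""
    return SLOT_W + six_pin * SIX_PIN_W + eight_pin * EIGHT_PIN_W

larrabee_limit = board_power_limit(six_pin=1, eight_pin=1)
print(larrabee_limit)  # 300
```

The 150W + 75W in the article only adds up to 300W once the slot's own 75W is counted.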

      • by Gat0r30y (957941)
        Thanks for clarifying, and you are right, 300W is out of control for a graphics card. On the upside, maybe I won't game so much anymore because of the electricity bill.
      • by Yvan256 (722131)

        My Core 2 Duo Mac mini + ViewSonic VP171s are both listed at 30-35W average.

        Hearing about videocards requiring power connectors AND wasting 300W of power just seems insane to me.

        Not to mention the power for the CPU, RAM, hard drives, LCD, etc. And since all of this crap generates heat, some of you are also paying double/triple since you run the AC to counter the heat.

      • Re: (Score:3, Informative)

        ...and 300W for a video card is starting to become the norm, depressing as it is.

        Not really; die shrinks have actually been driving down power consumption. If you look at this page: http://www.guru3d.com/article/radeon-hd-4850-and--4870-crossfirex-performance/3 [guru3d.com] you can see that the latest-generation Radeon 4850 and 4870 consume much less power than the power-hungry peak set by the 2900XT. The 4850 system uses less than 300W at full load. That's pretty damn impressive considering the ridiculous amount of performance it puts out.

      • Re: (Score:3, Insightful)

        by jandrese (485)
        The good news is that video card manufacturers have heard the plea and are trying to reduce the power consumption on their newer cards. nVidia's newest GTX series cards draw less power when idle than pretty much anything they've made outside of their Mobile line in years, although they are voracious when running full tilt. As long as you spend most of your time not gaming (which is true of most people) they won't inflate your power bill nearly as much as their maximum power draw might suggest.
    • by h4rm0ny (722443) on Monday July 07, 2008 @05:56PM (#24090319) Journal

      Um, that just doesn't seem to quite add up to me.

      It does if you work it out on a Pentium I [wikipedia.org] :D

    • Re: (Score:3, Funny)

      by Chyeld (713439)

      The card features one 150W power connector, as well as a 75W connector. Heise deduces that this results in a total power consumption of 300W

      Um, that just doesn't seem to quite add up to me.

      Seeing as it's based on a cluster of Pentiums, did you really expect it to add up?

  • It really is all about the Pentiums.

  • good. (Score:5, Insightful)

    by apodyopsis (1048476) on Monday July 07, 2008 @05:45PM (#24090153)
    good. sounds like a sensible engineering decision.

    on the basis that..
    the design is well known, understood and has had rigorous testing in the field
    they will no doubt fix any understood errors first
    it limits the RnD to the multicore section

    as long as the chip performs well for the silicon overhead then they should feel free to cram as many in as they want.

    seems perfectly sensible to me.
  • by Anonymous Coward

    Core 1: 4195835/3145727 = 1.33382
    Core 2: 4195835/3145727 = 1.33382
    Core 3: 4195835/3145727 = 1.33382
    Core 4: 4195835/3145727 = 1.33382
    .
    .
    .
    Core 31: 4195835/3145727 = 1.33382
    Core 32: 4195835/3145727 = mmm... 1.33374? Oh, f*ck!
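For anyone who missed the joke, these are the operands of the classic Pentium FDIV test case. A minimal Python sketch of the check people ran back then; the flawed quotient is the widely published value returned by affected chips, hard-coded here because no working FPU reproduces it.

```python
# Classic Pentium FDIV test operands
X = 4195835
Y = 3145727

# Quotient reported by affected Pentium FPUs (published value, hard-coded)
FLAWED_QUOTIENT = 1.333739068902037589

correct = X / Y
print(f"correct: {correct:.5f}")          # correct: 1.33382
print(f"flawed:  {FLAWED_QUOTIENT:.5f}")  # flawed:  1.33374

# x - (x / y) * y should be (almost exactly) zero on a working FPU
residual_ok = X - correct * Y
residual_bad = X - FLAWED_QUOTIENT * Y
print(round(residual_bad))  # 256

# Relative error of the flawed result, roughly 6e-5
relative_error = (correct - FLAWED_QUOTIENT) / correct
```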

  • I doubt it (Score:5, Interesting)

    by Bender_ (179208) on Monday July 07, 2008 @05:49PM (#24090219) Journal

    I doubt it. Maybe they mentioned the Pentium as an example to explain an in-order superscalar architecture, as opposed to more modern CPUs.

    -There is a lot of overhead in the P54C to execute complex CISC operations that are completely useless for graphics acceleration.

    -The P54C was manufactured in a 0.6 micron BiCMOS process. Shrinking this to 0.045 micron CMOS (more than 100x smaller!) would require a serious redesign up to the RTL level. Circuit design has evolved along with process technology.

    -a lot more...

    • ...and the Pentium III was basically the same as the Pentium Pro.

      If Intel is going backwards then why not go all the way back to the original Pentium? Makes sense to me.

    • It's more likely that they are taking basic design concepts. It says 'based on', not 'clone of'. They can optimize away some of the overhead you mention with more modern architectural techniques that both keep it simple and capitalize on modern optimizations.

    • Re:I doubt it (Score:4, Interesting)

      by Enleth (947766) <enleth@enleth.com> on Monday July 07, 2008 @06:12PM (#24090493) Homepage

      It's unlikely but not impossible - don't forget that the Pentium M and, subsequently, Core line of processors was based on Pentium III Coppermine, whereas the Pentium 4 Netburst architecture developed in the meantime was abandoned completely. Going back to Pentium I would be a bit on the extreme, but it's possible that they meant some basic design principles of Pentium I, not the whole core as it was. Maybe they will make something from scratch, but keep it similar to the original Pentium's inner RISC core, or maybe redo it as a vector processor or hell knows what. It was a citation from a translated interview with some press monkey, so you can expect anything.

      • Re:I doubt it (Score:4, Informative)

        by TheRaven64 (641858) on Tuesday July 08, 2008 @04:44AM (#24096933) Journal

        don't forget that the Pentium M and, subsequently, Core line of processors was based on Pentium III Coppermine, whereas the Pentium 4 Netburst architecture developed in the meantime was abandoned completely

        This keeps being repeated, but is simply not true. The Core 2 is a completely new microarchitecture, and so doesn't count in this discussion, while the Core 1 is essentially almost identical to the Pentium M. The Pentium M, however, is not just a tweaked P3 with Netburst completely abandoned. It has a slightly longer pipeline than the P3, and it takes several important features from the Netburst architecture, including (but not limited to) the floating point and vector pipelines and the branch predictor. The Pentium M took the best parts from the P3 and P4 architectures - it didn't just throw one away.

    • It's only 13x smaller. :)
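Both numbers in this exchange can be right, depending on whether you compare feature size or area. A quick Python check, assuming ideal scaling in which area shrinks with the square of the feature size (real shrinks never quite manage that):

```python
OLD_NODE_UM = 0.6    # P54C process node, from the comment above
NEW_NODE_UM = 0.045  # 45 nm

linear_shrink = OLD_NODE_UM / NEW_NODE_UM  # feature-size ratio
area_shrink = linear_shrink ** 2           # ideal area ratio

print(f"linear: {linear_shrink:.1f}x, area: {area_shrink:.0f}x")  # linear: 13.3x, area: 178x
```

So "13x smaller" is the linear figure, while "more than 100x smaller" holds for area.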

    • Re:I doubt it (Score:4, Interesting)

      by Chip Eater (1285212) on Monday July 07, 2008 @06:17PM (#24090567)
      A process shrink, even a deep one like .6 um to 45 nm, shouldn't require too many RTL changes if the design was done right. But I don't think they are using "soft" or RTL cores. Most likely this P54C was a custom design. Shrinking a custom design is a lot more tedious, which might help explain why they chose such an old, small core.
    • by mbessey (304651) on Monday July 07, 2008 @06:23PM (#24090709) Homepage Journal

      Obviously they're not just going to slap a bunch of Pentium cores on there and call it good. But the high-level design can probably start off with the P54, and just rip out stuff that doesn't need to be supported, possibly including:

      Scalar floating-point, 16-bit protected mode, real mode, operand size overrides, segment registers, the whole v86 mode, the i/o address space, BCD arithmetic, virtual memory, interrupts, #LOCK, etc, etc.

      Once you've done that, you'll have a much simpler model to synthesize down to an implementation. And with a slightly-modified compiler spec, you can crank out code for it with existing compilers, like ICC and GCC.

      • Re: (Score:3, Funny)

        by TheRaven64 (641858)
        You might want to keep the lock prefix for this kind of application. And the P54C's FPU didn't have BCD arithmetic - it had BCD load and store operations which translated to binary floats internally. You got the precision of binary floating point arithmetic and the storage density of BCD. Something only an Intel engineer could invent.
    • Re:I doubt it (Score:4, Interesting)

      by georgewilliamherbert (211790) on Monday July 07, 2008 @06:24PM (#24090739)

      One does not "shrink" a chip by taking photomasks and shrinkenating. One redoes the design / layout process, generally. The P5 series went from 0.8 um to 0.25 um over its lifetime (through Tillamook), stepping through 0.6, 0.35, and finally 0.25 um.

      It was 148 mm^2 at 0.6 um, so the process shrink should bring it down to a floorplan of around a square millimeter or so per core. Not sure how big the die will be for Larrabee, but the extra space will probably support the simple wide data unit per core and more cache. If the SIMD unit is simple, it could be another 3-4 million transistors / 1 square mm or so. For a 100 mm^2 chip that gives you another 30 mm^2 or so for I/O and cache (either shared, or parceled out to the cores).
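The floorplan estimate above can be reproduced in a few lines of Python. The 148 mm^2 die, the roughly 1 mm^2 SIMD unit and the 100 mm^2 chip are the comment's own figures; ideal area scaling and the helper name `area_left_for_cache_and_io` are assumptions of this sketch.

```python
P54C_DIE_MM2 = 148.0              # P54C die at 0.6 um, per the comment
AREA_SCALE = (0.045 / 0.6) ** 2   # ideal area scaling from 0.6 um to 45 nm

def area_left_for_cache_and_io(core_mm2, simd_mm2=1.0, cores=32, die_mm2=100.0):
    """Die area remaining after placing every core and its SIMD unit."""
    return die_mm2 - cores * (core_mm2 + simd_mm2)

ideal_core_mm2 = P54C_DIE_MM2 * AREA_SCALE
print(f"{ideal_core_mm2:.2f} mm^2 per core")                # 0.83 mm^2 per core
print(f"{area_left_for_cache_and_io(ideal_core_mm2):.0f}")  # 41
print(f"{area_left_for_cache_and_io(1.0):.0f}")             # 36
```

With ideal scaling about 41 mm^2 remains; rounding each core up to 1 mm^2, as the comment does, leaves about 36 mm^2, the same ballpark as the "30 mm^2 or so" quoted.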

      • by DragonHawk (21256) on Monday July 07, 2008 @06:46PM (#24091069) Homepage Journal

        One does not "shrink" a chip by taking photomasks and shrinkenating.

        'course not. You use a transmogrifier. In the industry, it is known as the "Bill Watterson" process.

        It can also be used to turn photomasks into elephants, which, while less profitable, is immensely entertaining if the operator didn't see you change the setting.

    • You do realize they have automated tools to take Verilog source (or whatever they use) and throw it on to silicon. Sure, it probably won't run at the clock frequency that you would get with hand-tuned circuits, but it'll work.
    • Re:I doubt it (Score:5, Informative)

      by ratboy666 (104074) <fred_weigel@hotmail . c om> on Monday July 07, 2008 @06:50PM (#24091153) Homepage Journal

      The original Pentium (which went up to 200 MHz by the end, not just 75 MHz) used U and V execution pipes. No translation to micro-ops, and no out-of-order execution. Indeed, there shouldn't be a need for that in Larrabee anyway, given the number of cores. It would almost be better to get rid of the V pipe and add SIMD instead.
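A toy Python model of what the U and V pipes buy an in-order core: two adjacent instructions issue together only if both are "simple" and the second does not depend on the first. These pairing rules are a heavy simplification of the real P5 rules (which have more conditions and exceptions), `Insn`, `can_pair` and `issue_cycles` are hypothetical names, and the model ignores latencies and memory stalls.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insn:
    dest: str             # register written
    srcs: tuple           # registers read
    simple: bool = True   # stand-in for the P5 "pairable instruction" class

def can_pair(u: Insn, v: Insn) -> bool:
    """Simplified check: both simple, and V neither reads nor writes U's result."""
    return u.simple and v.simple and u.dest not in v.srcs and u.dest != v.dest

def issue_cycles(program: list) -> int:
    """Issue cycles for an in-order, dual-pipe core, ignoring all stalls."""
    cycles, i = 0, 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            i += 2  # U and V pipes both issue this cycle
        else:
            i += 1  # only the U pipe issues
        cycles += 1
    return cycles

independent = [Insn("eax", ("ebx",)), Insn("ecx", ("edx",)),
               Insn("esi", ("edi",)), Insn("ebp", ("esp",))]
dependent = [Insn("eax", ("ebx",)), Insn("ecx", ("eax",)),
             Insn("edx", ("ecx",)), Insn("esi", ("edx",))]
print(issue_cycles(independent), issue_cycles(dependent))  # 2 4
```

A dependency chain leaves the V pipe idle, which is why instruction scheduling by the compiler matters so much on this kind of core.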

      Your comments on CISC are a bit off-base; the idea is to execute shaders in x86 machine code. They can be simple (limited flow control) or complex (general CPU/GPU).

      "Out-of-order" (i.e., Pentium Pro and later) is not so good with that many cores doing that kind of work. It would get the hardware into a lot of trouble. Better to keep it simple and add more cores.

      A better start point would probably have been ARM, but that would lose the compatibility edge. If Larrabee works, it will take the GP-GPU market by storm. It needs:

      1 - to publish itself as a NUMA CPU (add a bit to tell the OS what it is for)
      2 - compiler optimizations for the particular CPU architecture, preferably broken into two pieces:
      2a - "straight line" shader code
      2b - branching code
      3 - a guide to the new NUMA characteristics.

      With that in place, a standard (BSD/Linux) OS will be able to use it for regular jobs, or for those special "I need the SIMD unit" jobs. The biggest hassle is splitting control of those new CPU units between OpenGL and the regular scheduler (this is a kernel hack that Intel will have to make). It would be easier to jam this into OpenSolaris, but that isn't anywhere near popular enough.

      Don't you want your video card to assist compiling large source when not gaming/modeling? Why not?

      And, a few "extra" points

      - Intel already has an optimizing compiler for the P54C architecture, and we have gcc.
      - The architecture, including U/V pipelines only used 3.1 million transistors.
      - A GeForce 7800 GTX has 302 million transistors -- 100x the number of the original Pentium processor.

      So, I would think that using 32 reduced "Pentium Classic" cores would be quite feasible -- you need some (lots of) logic to ensure that they can all access their respective memories. The general SIMD implementation will take quite a bit of real estate as well. There is probably a budget of 600M transistors for Larrabee (wild-ass guess), derived from power consumption estimates.
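The transistor arithmetic above, as a small Python sketch. The 3.1M and 302M counts come from the comment, and the 600M total is explicitly the commenter's own guess, so the remainder is only as good as that guess.

```python
P5_TRANSISTORS = 3.1e6          # original Pentium, per the comment
GEFORCE_7800GTX = 302e6         # per the comment
LARRABEE_BUDGET_GUESS = 600e6   # the commenter's "wild-ass guess"
CORES = 32

scalar_cores = CORES * P5_TRANSISTORS           # unmodified P5 logic for 32 cores
gpu_ratio = GEFORCE_7800GTX / P5_TRANSISTORS    # roughly the "100x" quoted
left_for_simd_cache_memory = LARRABEE_BUDGET_GUESS - scalar_cores

print(f"{scalar_cores / 1e6:.1f}M for 32 P5 cores")          # 99.2M for 32 P5 cores
print(f"{gpu_ratio:.0f}x")                                   # 97x
print(f"{left_for_simd_cache_memory / 1e6:.1f}M left over")  # 500.8M left over
```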

      The gate size shrink should result in higher speeds. There may be a danger in the complex instruction interpretation routines, but these can be corrected. The single cycle instructions are already a (more than less) synchronous design, and should scale trivially.

      Anything I am missing?

      I, for one, am looking forward to buying a desktop super-computer with Larrabee.

      • Re: (Score:3, Insightful)

        by Anonymous Coward

        Intel's basically doing here what Sun did with the Niagara series, but without concurrent threading. I suppose it wouldn't be too tough to add it in, though. The cores in the Niagara are really simple 6- or 7-stage pipelines. They don't do any forwarding, and they stall at pretty much every hazard they hit. Instead of adding all the complicated circuitry needed for advanced pipeline tricks (like forwarding, OoO, etc.), they just defer execution to a new thread. All the threading is in the cores themselves

  • by Marko DeBeeste (761376) on Monday July 07, 2008 @05:56PM (#24090321)
    Larrabee is the Chief's cousin
  • by Antony T Curtis (89990) on Monday July 07, 2008 @06:18PM (#24090593) Homepage Journal

    If anyone remembers those old original Pentiums, their 16-bit processing sucked - so much that a similarly clocked 486 could outperform them. I guess that it would be reasonably trivial for Intel to slice off the 16bit microcode on this old chip to make a 'pure' 32-bit only processor. I am sure that they will be using the designs with a working FPU... but for many visual operations, occasional maths errors would largely go unnoticed. Remember when some graphics chip vendors were cheating on benchmarks by reducing the quality ... and how long it took for people to notice?

    Although, if I had Intel's resources and was designing a 32-core CPU, I would probably choose the core from the later 486 chips... I don't think a graphics pipeline processor would benefit much from the Pentium's dual instruction pipelines, and I doubt they would be worth the silicon real estate. The 486 has all the same important instructions useful for multi-core work - the CMPXCHG instruction debuted on the 486.
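Why CMPXCHG matters for multi-core work: it lets each core update shared data with a compare-and-swap retry loop instead of holding a lock. Python has no user-level CAS instruction, so this sketch emulates the atomic step with a `threading.Lock`; `AtomicCell` and `atomic_increment` are hypothetical names, and only the retry-loop pattern is the point.

```python
import threading

class AtomicCell:
    """Toy memory word with CMPXCHG-like compare-and-swap semantics."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # stands in for the hardware's atomicity

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the word still holds `expected`; report success."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

def atomic_increment(cell):
    # Read, compute, try to publish; if another thread got there first, retry.
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + 1):
            return old + 1

def worker(cell, n):
    for _ in range(n):
        atomic_increment(cell)

counter = AtomicCell()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.load())  # 8000
```

The real instruction reports the value it found rather than a boolean, but the retry loop has the same shape.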

    • by Pinback (80041)

      Yup, it's confirmed. We're getting 32 i960 cores in one chip. Dust off those floating-point-on-integer libraries.

      That isn't a graphics card, it's 32 LaserJet brains on one card.

  • Marketing Math (Score:3, Insightful)

    by fpgaprogrammer (1086859) on Monday July 07, 2008 @06:19PM (#24090617) Homepage
    From TFA "Heise also claims that the cores will feature a 512-bit wide SIMD (single input, multiple data) vector processing unit. The site calculates that 32 such cores at 2GHz could make for a massive total of 2TFLOPS of processing power."

    I don't see how they get to 2 TFLops.

    512-bit = 64 bit * 8-way SIMD or 32 bit * 16-way SIMD. Let's go with the bigger of these two and say we are performing 16 single-precision floating-point operations per clock cycle per core. 16 operations per clock per core * 32 cores * 2 billion clocks per second = 1024 single-precision GFlops. It looks more like 512 double-precision GFlops for 300 Watts, which means a DP teraflop on Larrabee will cost you 513 dollars a year [google.com] at 10 cents/kWh. If we're considering single precision, we can cut this in half to 257 dollars per year per single-precision teraflop.

    Compare this with ClearSpeed, which offers 66 DP GFLOPS at 25 watts, costing 332 dollars [google.com] to sustain a DP teraflop for a year.

    Even the NVIDIA Tesla is more power-efficient at single precision: you can buy 4 SP TFLOPS consuming only 700 W, or 5.7 GFLOPS/W, for an annual power budget of 153 dollars [google.com] per SP teraflop.
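The arithmetic above can be reproduced in a few lines. This is a sketch under the post's own assumptions: 300 W for Larrabee, one floating-point operation per SIMD lane per cycle, 10 cents/kWh, and the part running flat out all year. `peak_gflops` and `annual_cost_per_tflop` are illustrative helpers, not anything from Intel or the linked calculations.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 h; ignores leap years
PRICE_PER_KWH = 0.10       # dollars per kWh, the poster's assumption


def peak_gflops(lanes: int, cores: int, clock_ghz: float) -> float:
    """Peak GFLOPS assuming one floating-point op per SIMD lane per cycle."""
    return lanes * cores * clock_ghz


def annual_cost_per_tflop(gflops: float, watts: float,
                          price_per_kwh: float = PRICE_PER_KWH) -> float:
    """Dollars of electricity per year to sustain 1 TFLOPS with a part that
    delivers `gflops` at `watts`, assuming it runs continuously."""
    watts_per_tflop = watts * 1000.0 / gflops
    kwh_per_year = watts_per_tflop * HOURS_PER_YEAR / 1000.0
    return kwh_per_year * price_per_kwh


if __name__ == "__main__":
    sp = peak_gflops(lanes=16, cores=32, clock_ghz=2.0)  # 512 bits / 32-bit floats
    dp = peak_gflops(lanes=8, cores=32, clock_ghz=2.0)   # 512 bits / 64-bit doubles
    print(sp, dp)                                               # 1024.0 512.0
    print(f"Larrabee DP: ${annual_cost_per_tflop(dp, 300):.0f}")    # $513
    print(f"Larrabee SP: ${annual_cost_per_tflop(sp, 300):.0f}")    # $257
    print(f"ClearSpeed DP: ${annual_cost_per_tflop(66, 25):.0f}")   # $332
    print(f"Tesla SP: ${annual_cost_per_tflop(4000, 700):.0f}")     # $153
    print(f"Tesla SP: {4000 / 700:.1f} GFLOPS/W")                   # 5.7
```

Note that the cost depends only on GFLOPS per watt, so halving the watts per teraflop (SP vs DP on the same hardware) halves the bill.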
    • Re: (Score:2, Insightful)

      by David Greene (463)

      I don't see how they get to 2 TFLOPS. 512-bit = 64 bit * 8 way SIMD or 32 bit * 16 way SIMD. Let's go with the bigger of these two and say we are performing 16 single floating-point operations per clock cycle per core. 16 operations per clock per core * 32 cores * 2 billion clocks per second = 1024 single-precision GFLOPS.

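      Most likely there is a multiply-add (muladd) unit. A multiply-add counts as two floating-point operations per lane per cycle, which would double the peak to about 2 TFLOPS.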
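A quick check of the multiply-add explanation, assuming (as the articles do, without confirmation from Intel) 32 cores at 2 GHz, a 512-bit vector unit, and one vector instruction issued per core per cycle. `peak_tflops` is an illustrative helper.

```python
def peak_tflops(vector_bits: int, element_bits: int, cores: int,
                clock_ghz: float, fma: bool) -> float:
    """Theoretical peak, assuming one vector instruction per core per cycle."""
    lanes = vector_bits // element_bits  # 16 lanes of 32-bit floats in 512 bits
    flops_per_lane = 2 if fma else 1     # multiply-add = one multiply + one add
    gflops = lanes * flops_per_lane * cores * clock_ghz
    return gflops / 1000.0


if __name__ == "__main__":
    print(peak_tflops(512, 32, 32, 2.0, fma=False))  # 1.024, the parent's figure
    print(peak_tflops(512, 32, 32, 2.0, fma=True))   # 2.048, the "2 TFLOPS" headline
    print(peak_tflops(512, 64, 32, 2.0, fma=True))   # 1.024 in double precision
```

Counting a multiply-add as two FLOPs is the usual convention for quoted GPU peaks, which is why headline numbers assume code that is nothing but multiply-adds.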

  • First the Core architecture was based on the pre-NetBurst design, and now this. In five years Intel will announce a 4096-core 80386 for your sound card or something. ;P
    • That's because NetBurst was architecturally inferior to even the original P5 Pentium. If it were possible to overclock a 486 to 3+ GHz, it would perform about the same as a NetBurst chip.

      The older technology was better in every way.

  • by hattig (47930) on Monday July 07, 2008 @06:49PM (#24091137) Journal

    Right. It clearly isn't using the Pentium design, but a Pentium-like design.

    To that they will have added SMT, because (a) in-order designs adapt well to SMT since they have a lot of pipeline bubbles, and (b) there will be a lot of latency in the memory system and SMT helps hide it. I would assume 4-way SMT, but maybe 8-way. Larrabee will therefore support 128 or 256 hardware threads. By comparison, nVidia's GT200 (GeForce GTX 280) can keep up to 1,024 threads resident on each of its 30 multiprocessors, 30,720 in total.
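The latency-hiding argument can be made concrete with a deliberately simple model, which is an assumption for illustration and not anything Intel has described: each thread does `run_cycles` of useful work, then waits `stall_cycles` on memory, and the in-order core switches to any ready thread at zero cost. The workload numbers below are hypothetical.

```python
def hardware_threads(cores: int, smt_ways: int) -> int:
    """Total hardware thread contexts on the chip."""
    return cores * smt_ways


def pipeline_utilization(threads: int, run_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles an in-order core issues work under the simple model:
    each thread is ready run/(run+stall) of the time, and the core can never
    be busier than 100%."""
    busy_fraction_per_thread = run_cycles / (run_cycles + stall_cycles)
    return min(1.0, threads * busy_fraction_per_thread)


if __name__ == "__main__":
    for ways in (4, 8):
        print(ways, hardware_threads(32, ways))  # 128 and 256 threads
    # Hypothetical workload: 10 cycles of work, then a 30-cycle memory stall.
    for n in (1, 2, 4, 8):
        print(n, pipeline_utilization(n, run_cycles=10, stall_cycles=30))
```

In this model one thread keeps the pipeline only 25% busy, while four threads fill it completely; longer stalls need more threads, which is the case for 8-way over 4-way.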

    The closest chips I can think of right now are Sun's Niagara and Niagara 2 processors, except with a really beefy SIMD unit on each core, and a large number of cores on the die because of 45nm. I think Niagara 3 is going to be a 16-core device with 8 threads/core; can anyone confirm?

    Note that this is pretty much what Sony wanted with Cell, but Cell was two process shrinks too early. The 45nm PowerXCell32 will have 32 SPUs and 2 PPUs (whereas Larrabee looks like it pairs the equivalent of a weak PPU with each SPU-like unit). It could run at 5GHz too... power and cooling notwithstanding.

    • Re: (Score:3, Informative)

      by adisakp (705706)
      Niagara has direct access to memory AFAIK.

      The big architectural difference with the Cell SPUs is that SPUs are not really meant to access system memory directly. Each SPU has a small local store (256 KB) that it can access directly. System memory can be modelled as a RAM disk, and accesses to it go through DMA transfers, which in the RAM disk analogy are the equivalent of asynchronous file reads and writes.
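Here is a rough Python model of the RAM-disk analogy: the compute loop only ever touches data in its local buffer, and main memory is reached through explicit get/put transfers, with the next chunk prefetched while the current one is processed (double buffering, a common Cell technique). `dma_get`, `dma_put`, `scale_in_place` and the chunk size are hypothetical stand-ins, not the Cell SDK's DMA API.

```python
from concurrent.futures import ThreadPoolExecutor

LOCAL_STORE_BYTES = 256 * 1024  # each Cell SPU has a 256 KB local store
ELEMENT_BYTES = 4               # assume 32-bit floats
# Use a quarter of the local store per buffer so two buffers (double
# buffering) plus code and stack fit; the exact split is an arbitrary choice.
CHUNK_ELEMS = LOCAL_STORE_BYTES // 4 // ELEMENT_BYTES


def dma_get(main_memory: list, start: int, count: int) -> list:
    """Hypothetical stand-in for a DMA 'get': copy a block into local store."""
    return main_memory[start:start + count]  # slicing makes a copy


def dma_put(main_memory: list, start: int, local_buf: list) -> None:
    """Hypothetical stand-in for a DMA 'put': copy results back to main memory."""
    main_memory[start:start + len(local_buf)] = local_buf


def scale_in_place(main_memory: list, factor: float) -> None:
    """Multiply every element by `factor`, touching main memory only via 'DMA'."""
    starts = list(range(0, len(main_memory), CHUNK_ELEMS))
    if not starts:
        return
    # A single worker thread plays the role of the asynchronous DMA engine.
    with ThreadPoolExecutor(max_workers=1) as dma_engine:
        pending = dma_engine.submit(dma_get, main_memory, starts[0], CHUNK_ELEMS)
        for i, start in enumerate(starts):
            local = pending.result()  # block until this chunk has "arrived"
            if i + 1 < len(starts):
                # Prefetch the next chunk while computing on this one.
                pending = dma_engine.submit(dma_get, main_memory, starts[i + 1], CHUNK_ELEMS)
            local = [x * factor for x in local]  # compute on local data only
            dma_put(main_memory, start, local)
```

Python's threads won't actually overlap the copy with the computation the way an SPU's DMA engine does; the sketch only shows the structure of the code, where every main-memory access is an explicit, asynchronous transfer.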
  • by greywire (78262) on Monday July 07, 2008 @06:57PM (#24091247) Homepage

    At least 20 years ago I thought: hey, with the density and speed of transistors these days, and with RISC being popular, why not go all the way and make a chip with literally hundreds of (wait for it...) Z80 CPUs?

    Of course, I and others dismissed the idea as just slightly ludicrous. But then, at the time, I also thought there would eventually be Amiga emulators and interpreted versions of the C language, and I was called crazy for thinking that too...

  • Why not 486 cores? Then you could put 4X as many of them on your die. They already include an integrated FPU and execute most instructions in a single cycle.
  • Ha! Anyone remember the F00F bug [wikipedia.org]?

    I learned how to embed machine code into C and ran amok halting university systems with that for a little while.

    Or that floating-point division (FDIV) bug [wikipedia.org]?
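The classic FDIV test takes the widely circulated operands 4195835 and 3145727 and computes x - (x / y) * y. On a correctly rounding IEEE-754 double-precision FPU the result is exactly 0; the flawed Pentium divider was widely reported to give 256. A minimal sketch (the function name is illustrative, and passing this one test says nothing about other operand pairs):

```python
def fdiv_residual(x: float = 4195835.0, y: float = 3145727.0) -> float:
    """Return x - (x / y) * y using the machine's floating-point divide.

    With correctly rounded doubles, x / y is within half an ulp of the true
    quotient, so multiplying back by y lands within half an ulp of x and
    rounds to x exactly: the residual is 0.0 for these operands.
    """
    return x - (x / y) * y


if __name__ == "__main__":
    r = fdiv_residual()
    print("FDIV bug present" if r != 0.0 else "divider OK", r)
```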

  • by jharel (1201307) on Monday July 07, 2008 @07:48PM (#24091923)
    Hmm... Let's see where they got this from. They claim they got it from a Babelfish translation of Heise, a German site (yeah, start wincing now...).

    http://babelfish.yahoo.com/translate_url?doit=done&tt=url&intl=1&fr=bf-home&trurl=http%3A%2F%2Fwww.heise.de%2Fct%2F08%2F15%2F022%2F&lp=de_en&btnTrUrl=Translate [yahoo.com]

    Actually, they got the "Gelsinger said so" remark from Expreview, itself a Chinese site:

    http://en.expreview.com/2008/07/07/larrabee-unleashes-2-tflops-capacity [expreview.com] (note they courteously attached the Larrabee board diagram leaked a while back):

    "Gelsinger said the Larrabee will be a 45nm product featuring SIMD technique, 64-bit address. Besides, 32 of cores runing at 2.00 GHz will unleash 2 TFLOPS capacity, twice as much as the RV770XT."

    But did Gelsinger really SAY those things?

    Here is the Google translation of the same Heise article: http://translate.google.com/translate?u=http%3A%2F%2Fwww.heise.de%2Fct%2F08%2F15%2F022%2F&hl=en&ie=UTF8&sl=de&tl=en [google.com]

    No matter which crappily translated version of the German article one looks at, it appears that Gelsinger said no such thing... The part about Larrabee containing P54C cores was clearly in a separate paragraph, written after a speculative question.

    So I guess Expreview THOUGHT Pat said something after taking too short a look at the Heise article, after which CustomPC sensationalized the whole thing, not really bothering to actually read even the translated link it posted. Now some random Slashdotter is doing the same courtesy.

    There you go, folks- Internet reporting.
