Intel Supercomputing Hardware

Intel Squeezes 1.8 TFlops Out of One Processor

Jagdeep Poonian writes "It appears as though Intel has been able to squeeze 1.8 TFlops out of one processor and with a power consumption of 62 watts." The AP version of the story is mostly the same; a more technical examination of TeraScale is also available.
This discussion has been archived. No new comments can be posted.

  • Oblig. (Score:2, Funny)

    by Anonymous Coward
    Imagine a Beowulf cluster of those!!
    • Re:Oblig. (Score:5, Interesting)

      by niconorsk ( 787297 ) on Monday February 12, 2007 @10:28AM (#17982300)
      It's quite fun to consider that when the original joke was made, the processing power of that Beowulf cluster would probably have been quite close to the processing power of the processor discussed in the article.
    • They did RTFA:

      "However, considering the fact that just 202 of these 80-core processors could replicate the floating point performance of today's highest performing supercomputer, those power consumption numbers appear even more convincing: The Department of Energy's BlueGene/L system, rated at a peak performance of 367 TFlops, houses 65,536 dual core processors."
      • Re: (Score:2, Interesting)

        by Anonymous Coward
        It is entirely not true that you could replace today's fastest computer with this kind of technology and get the same performance. These new Intel CPUs are really difficult to program efficiently. You would only get good performance on certain problem sets.
        • Re: (Score:3, Interesting)

          by PitaBred ( 632671 )
          Because it doesn't take special problem sets and programming on the current supercomputers?
        • by adam31 ( 817930 )
          Certain problems like transforming vertices and shading pixels, where programming efficiently is easily achieved with HLSL.

          These aren't meant for supercomputers. Those aren't DP flops they are talking about, and it doesn't seem like it is Intel's intent to change that. Of course, there are already GPUs doing SP teraflop. Sony bragged that RSX in PS3 was 1.8 TFlop, and newer GPUs are even faster. But hotter, probably.

  • by tomstdenis ( 446163 ) on Monday February 12, 2007 @10:21AM (#17982226) Homepage
    The trick, as with SPEs, is finding ways to use them efficiently in as many tasks as possible.

    I'm glad to see Intel is using their size for more than x86 core production though.

    • I read an article in the morning paper (probably AP) where they said it might not make it out of the development stage. As I understand it, what they have done is add high-k to the gate stack - greatly reducing power consumption. So - it might still be x86 architecture, but it will run a lot for very little power - at standard (3 GHz) frequencies.
    • 99% is exaggerated (Score:4, Interesting)

      by Anonymous Coward on Monday February 12, 2007 @10:45AM (#17982496)
      The first thing that jumped out at me was the presence of MACs. They are the heart of any DSP. So, this chip is good for computation although not necessarily processing. As other posters have pointed out, this chip could become a very cool GPU. It should also be awesome for encryption and compression. Given that the processor is already an array, it should be a natural for spreadsheets and math programs such as Matlab and Scilab. Having a chip like this in my computer just might obviate the need for a Beowulf cluster. :-)
    • So finally the tile processor architecture makes it to the industry. People in the comp arch group at MIT envisioned and prototyped something pretty similar to this years ago as the RAW processor.
  • by xoyoboxoyobo ( 945657 ) on Monday February 12, 2007 @10:28AM (#17982302)
    That's not 62 watts at 1.8 teraflops. That's 62 watts at 3.16 GHz. FTFA: "Intel claims that it can scale the voltage and clock speed of the processor to gain even more floating point performance. For example, at 5.1 GHz, the chip reaches 1.63 TFlops (2.61 Tb/s) and at 5.7 GHz the processor hits 1.81 TFlops (2.91 Tb/s). However, power consumption rises quickly as well: Intel measured 175 watts at 5.1 GHz and 265 watts at 5.7 GHz. However, considering the fact that just 202 of these 80-core processors could replicate the floating point performance of today's highest performing supercomputer, those power consumption numbers appear even more convincing: The Department of Energy's BlueGene/L system, rated at a peak performance of 367 TFlops, houses 65,536 dual core processors."
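A quick check of the figures quoted above, as a minimal sketch. It uses only the clock, TFlops and wattage numbers from the quoted article and derives GFlops per watt plus the chip count needed to match the 367 TFlops peak of BlueGene/L. The variable and function names are made up for illustration.

```python
# Figures copied from the article quote: clock (GHz) -> (TFlops, watts).
measured = {5.1: (1.63, 175), 5.7: (1.81, 265)}
BLUEGENE_PEAK_TFLOPS = 367  # peak rating quoted for BlueGene/L


def gflops_per_watt(tflops, watts):
    """Convert TFlops to GFlops and divide by the measured power draw."""
    return tflops * 1000 / watts


efficiency = {ghz: gflops_per_watt(t, w) for ghz, (t, w) in measured.items()}

# Chips needed to reach the BlueGene/L peak at the 5.7 GHz operating point.
chips_needed = BLUEGENE_PEAK_TFLOPS / measured[5.7][0]

for ghz, eff in efficiency.items():
    print(f"{ghz} GHz: {eff:.1f} GFlops/W")
print(f"chips to match BlueGene/L peak: {chips_needed:.1f}")
```

This is a peak-versus-peak comparison only. On these numbers the last 0.6 GHz buys roughly 11% more throughput for roughly 51% more power, and the article's "202" is the quotient rounded down.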
    • by StressGuy ( 472374 ) on Monday February 12, 2007 @10:45AM (#17982502)
      Get the bugs worked out by Xmas and you could sell a 1.81 TFlop Easy-Bake oven

      {...I need more sleep...}
    • Furthermore, I think it's kind of weird to say that it's "one processor". It may be one chip, but is a processor defined by its die? Since it's an 80-core chip, isn't it more accurate to say that it's 80 CPUs on one die, just as a dual-core chip is rather two CPUs on one die? It's not as if it isn't impressive, but I think it's kind of misleading to say that it's just one processor.
      • I wonder the same whenever some marketing genius mentions a dual-core processor. Of course, processors didn't have cores until Intel innovated the Core architecture ;)
        • by afidel ( 530433 )
          Uh, processors had cores WAY before Intel came out with the Core architecture. In fact you could buy a MIPS, ARM, etc. core for your system-on-chip design as far back as the early 90s that I'm familiar with, and probably further back than that. Just because something has been recycled by marketing doesn't mean it didn't start out in the technical realm =)
  • Just imagine (Score:2, Insightful)

    by andyck ( 924707 )
    "Intel" "Introducing the NEW CORE 80, personal laptop supercomputer running Windows waste my ram and cpu cycles SP2 edition" But seriously this looks interesting for the future. Now we just need software to fully utilize multicore processors.
    • by TheUni ( 1007895 )
      Core 80? Psh. I'm waiting for Core2 80...

      Though i'm tempted to wait for Core-Quad 80 extreme.... 320 cores!
  • by DoofusOfDeath ( 636671 ) on Monday February 12, 2007 @10:42AM (#17982462)
    Does this permit the practical use of any truly breakthrough apps?

    Does it suddenly make previously crappy technologies worthwhile? I.e., does image recognition or untrained speech recognition become a mainstream technology with this new processing power?
    • by truthsearch ( 249536 ) on Monday February 12, 2007 @10:50AM (#17982554) Homepage Journal
      Does it suddenly make previously crappy technologies worthwhile?


      (Sorry, couldn't resist.)
    • by Frumious Wombat ( 845680 ) on Monday February 12, 2007 @10:58AM (#17982650)
      Atomistic simulations of biomolecules. Chain a bunch of those together, and you begin to simulate systems on realistic time scales. Higher-resolution weather models, or faster and better processing of seismic data for exploration. Same reason that we perked up when the R8000 came out with its (for the time) aggressive FPU. 125 MFlops/proc@75MHz was nothing to sneeze at 15 years ago. If they can get this chip into production in usable quantities, and if it has the throughput, then they're on to something this time.

      Of course, this could just be a single-chip CM2; blazingly fast but almost impossible to program.
    • Re: (Score:3, Interesting)

      by Intron ( 870560 )
      Realtime, photorealistic animation and speech processing? Too bad AI software still sucks or this could pass a Turing test where you look at someone on a screen and don't know whether they are real or not.
      • I "seriously" doubt that this could be used to pass a Turing test. The noise and heat from the fan sink keeping a 250+ watt processor cool would be a dead giveaway. If I recall correctly, though, I don't think you need a fancy avatar for the robot/computer/whatever to pass the Turing test. It's more of a black box approach where all that matters is what the box says, not how it says it.
        • by Intron ( 870560 ) on Monday February 12, 2007 @11:51AM (#17983368)
          Sorry, your post made me realize that a sophisticated processor is unnecessary. It's already difficult to tell whether a message is from a human or just a randomly generated string of nonsense.
    • Re: (Score:3, Insightful)

      by vertinox ( 846076 )
      Does this permit the practical use of any truly breakthrough apps?

      From my understanding perhaps with that many cores, the OS could simply allocate one application per core.

      But the OS has to support that feature or have applications that know how to call unused cores.

      From my understanding Parallels for OS X only uses one core and picks the second core to run on for the best performance.

      Of course then there are applications that could be programmed to use all the cores at once if they needed to do scientific
    • It will be interesting when the ability to merge and analyze multiple images becomes possible, even better if it can be done in real-time.

      "Vision" can give computers the ability to correct themselves. With visual feedback, suddenly robotic arms don't have to be told what to do via a long stream of coordinates, you could pretty much point.

      It could also enable a new form of GUI control where the camera just watches your hand--eliminating the need for a mouse.

      Pointing a single camera out the side window of a
  • I gotta get me one of these. This lends new credence to the Staples Red Button of major scientific and engineering problems. "That was easy!"
  • by rwyoder ( 759998 ) on Monday February 12, 2007 @10:45AM (#17982500)
    64 cores should be enough for anybody.
    • Build me a home computer that supports five to forty times the memory of all its competitors and then make fun of the PC.

      Seriously, when IBM and Microsoft released the IBM 5150 PC and MS-DOS/PC-DOS, the Apple II+ had 48k expandable to 64k, the Atari 600XL had 16k and the 800XL had 64k, Commodore hadn't yet released the Commodore 64 leaving them with the 5k VIC-20, and the Tandy Color Computer 1 had 32k. Most of these systems have 6800-series processors in the one megahertz range. The IBM had a processor whi
    • Once your software can take advantage of about 8 cores, it is probably scalable enough to take advantage of core increases almost as well as clock speed increases.
    • by julesh ( 229690 )
      Seriously, though, I have been wondering about this. With a design where each core connects only to its neighbours, surely a square array (i.e. either 64 or 128 cores) makes much more sense than the rectangular 8x10 array that this chip appears to be based on. Anyone?
  • by cciRRus ( 889392 ) on Monday February 12, 2007 @10:52AM (#17982598)
    Gonna get one of these. That should bump up my Vista Experience score.
  • by Dr. Spork ( 142693 ) on Monday February 12, 2007 @11:04AM (#17982728)
    When I read about this I didn't get all worked up, since I imagine that it will be almost impossible for realistic applications to keep all 80 cores busy and get the teraflop benefits. But then I read about the possibility of using this for real-time ray tracing, and got very intrigued!

    Ray tracing is embarrassingly parallelizable, and while I'm no expert, two teraflops might just be enough calculating power to do a pretty good job at scene rendering, maybe even in real time. To think this performance would be available from a standard 65nm die that uses 65 watts... that really could make a difference to gamers!

    • Re: (Score:2, Interesting)

      by Vigile ( 99919 )
      Yep, that's one of the things that got me excited about it as well. Did you also read this article on ray tracing on the same site by a German guy that made a Quake 4 ray tracing engine?
      • Re: (Score:3, Interesting)

        by Dr. Spork ( 142693 )
        I'd heard about the Quake3 thing somewhere else. It's pretty cool with Quake4. What really impressed me, though, is that when they multiplied the number of polygons in the scene by several orders of magnitude, rendering performance fell only 60% or so. This makes it seem like an increase in processing power will accommodate an exponential improvement in scene detail. This confirms my suspicion that real-time ray tracing is the future of game graphics.

        The fact that ray-traced Quake3 works OK in real time on

    • by tcas ( 691110 )
      I'm sorry, but this comment is really crazy:

      Firstly, there are hundreds of computation-intensive applications that can keep 80 cores busy: environmental modeling, protein folding... anything that currently uses a supercomputer.

      Secondly, why is the parallelizable nature of ray tracing embarrassing?! It's parallelizable exactly because each ray is computed independently of other rays - I don't see what is embarrassing or surprising about that.

      Finally, talking about the application to consumer gaming
      • by ispeters ( 621097 ) <ispeters&alumni,uwaterloo,ca> on Monday February 12, 2007 @11:45AM (#17983282)

        Secondly, why is the parallelizable nature of ray tracing embarrassing?! It's parallelizable exactly because each ray is computed independently of other rays - I don't see what is embarrassing or surprising about that.

        It's embarrassing because "Embarrassingly parallel" is the technical term for problems like ray tracing. It's a parallelizable problem wherein the concurrently-executing threads don't need to communicate with each other in order to complete their tasks so the performance of a parallel solution scales almost perfectly linearly with the number of processors that you throw at the problem.
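To make the "no communication needed" point concrete, here is a toy sketch in Python. It is not a real ray tracer: `trace_pixel` (a hypothetical helper) only tests whether the ray through a pixel hits one hard-coded sphere. What matters is that each row depends on nothing but its own coordinates, so rows can be handed to workers in any order and the result matches the serial render. Threads are used only to keep the example self-contained; in CPython a process pool would be the swap for an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 32, 16


def trace_pixel(x, y):
    """Return 1.0 if the ray through pixel (x, y) hits a unit sphere at z=3."""
    # Ray starts at the origin and passes through an image plane at z=1.
    dx = (x + 0.5) / WIDTH * 2 - 1
    dy = (y + 0.5) / HEIGHT * 2 - 1
    dz = 1.0
    # Ray/sphere intersection: a*t^2 + b*t + c = 0 for centre (0, 0, 3), radius 1.
    a = dx * dx + dy * dy + dz * dz
    b = -2 * 3.0 * dz
    c = 3.0 * 3.0 - 1.0
    return 1.0 if b * b - 4 * a * c >= 0 else 0.0


def render_row(y):
    # A row needs no data from any other row: no locks, no messages.
    return [trace_pixel(x, y) for x in range(WIDTH)]


serial = [render_row(y) for y in range(HEIGHT)]

with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(render_row, range(HEIGHT)))

print(parallel == serial)
```

Because the work items are independent, adding workers (or cores) only changes how the rows are dealt out, never the answer.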


      • by VE3MTM ( 635378 )
        "Embarrassingly parallel" is a term for such problems, where each step or component is independent and requires no communication.
      • by fitten ( 521191 )

        It's parallelizable exactly because each ray is computed independently of other rays - I don't see what is embarrassing or surprising about that.

        As others have said, "embarrassingly parallel" isn't a derogatory term any more than "greedy algorithm" is.
        • Ooh look - three replies in a row - parallel!! - explaining the definition of a term related to parallel processing.

          Is something going to explode now?

    • I'd just like to point out, that yes, it would be great to do real-time raytracing with such powerful processors. Last week I was up until 6 in the morning waiting for a 2+ hour render of a reasonably simple scene to finish. Yeah, these procs would be great... if someone could just write a parallelizable version of POV-ray for Linux. Before someone jumps in to point to the few ports out there, let me head you off:

      A distributed version of POV-ray exists using the MPI library, but it's based on the pretty
      • Tachyon

        Site may be down so look for mirrors.
      • Well, if I hadn't posted in this thread I would have modded you informative.

        So did I understand correctly that POV-ray at this point doesn't support parallel processing? If that's so, it would be a shame and it must really limit its usefulness in big projects.

        It would be cool if, just as the routines got more sophisticated, they'd get a consumer-grade processor that could run them in real-time.

  • I hope they can get them back in.
  • by doomy ( 7461 ) on Monday February 12, 2007 @11:08AM (#17982788) Homepage Journal
    33 of these CPUs should be more than enough to construct Lt. Cmdr Data.
    • If I follow the wikilink, most of the "information" there seems to come from an episode from 1989. Apart from the sad thing that some people actually treat these data as real, the fun thing is that apparently the scriptwriters who made up these fictional data did a pretty good job of making up computer specifications that would still be out of reach for normal PCs 20 years later.
  • exaflop computers? (Score:3, Insightful)

    by peter303 ( 12292 ) on Monday February 12, 2007 @11:20AM (#17982956)
    Since petaflops are likely by the end of the decade, it's time to imagine exaflops in 2020.
  • by Joe The Dragon ( 967727 ) on Monday February 12, 2007 @11:29AM (#17983072)
    The FSB will be a big bottleneck, even more so with the CPU needing to use it to get to RAM. You would need about 3-4 FSBs with 1-2 MB per core of L2 to make it fast.
    • What is the point of commenting on an article you haven't read?
    • Umm, guh? This chip is an experimental chip and won't see the light of day for years. The FSB doesn't have years left. Ergo, this is a non sequitur - FSB has nothing to do with this chip.
    • by julesh ( 229690 )
      RTFA. The cores have onboard RAM. There isn't an FSB, only a network between cores. They're working on a 3D interconnect to stack memory on top of the cores (i.e., 80 FSBs -- or perhaps TSBs).
  • by UPZ ( 947916 )
    and here i was about to buy a core2duo p.o.s
  • Yep. The only way to really use this effectively is to load it up with lots of bloatware. Imagine the tons of ads one can finally get with this type of CPU! would seriously love this.

    People still effectively use processing power equivalent to that of an 800 MHz Pentium 3 for basic stuff (and I'm just talking about word processing, email, internet, no gaming) on average. Why would someone need a quad core CPU, and a crappy videocard just for surfing the net, typing, etc?

    In reality, that is wh
    • Yes, people will really tolerate random popups and keyloggers that steal passwords/credit card information. What?
    • by Anonymous Coward
      This clearly isn't for CPUs. It's for building GPUs and more importantly for Intel to get a part of the huge growing market demand for general purpose programming on GPUs. We'll have to call them something other than GPUs in 5-10 years as they'll do all sorts of other jobs too.

      IBM saw this coming and went with the Cell, AMD saw this coming and bought ATi, NVidia already has a card that has all these shader units. Intel would be stupid not to respond. They've already admitted a discrete GPU part is on t
  • Narrow Minded (Score:4, Insightful)

    by Deltronica ( 1063232 ) on Monday February 12, 2007 @11:51AM (#17983364) Journal
    Many comments on this post are centered around the processor's use as a personal computing solution. There is much more to computing than PCs! When viewed alongside specialized programming technology, bioinformatics, neurology, and psychology, this (rather large) leap in processing power brings AI to yet another level, and continues the law of accelerated returns. I'm not saying "oh wow now we can have human-like AI", I'm just saying that the ability to process 1.8 Tflops is nothing to scoff at. Personal computing is inane and almost moot when compared to the other applications that new processors may pave the way for. Know your facts, but use your imagination.
  • by Dekortage ( 697532 ) on Monday February 12, 2007 @11:58AM (#17983450) Homepage

    They've already allocated 40 cores to the RIAA and MPAA for DRM processing, 30 cores to NSA/Homeland Security surveillance of all your computing activities, and 6 cores to combat spam and phishing. In the end, there is no net gain in performance over today's processors. Sorry.

    (tongue firmly planted in cheek)

    • Sounds more like a Lord of the Rings parallel

      We need the One Core! Kudos for someone to write the poem from LoTR. :-)

  • by nadamucho ( 1063238 ) on Monday February 12, 2007 @12:04PM (#17983530)
    Looks like Intel finally put the "80" in 80x86.
  • I was reading an article about this on the BBC
    From that article:

    There are already specialist chips with multiple cores - such as those used in router hardware and graphics cards - but Dr Mark Bull, at the Edinburgh Parallel Computing Centre, said multi-core chips were forcing a sea-change in the programming of desktop applications.

    How is this done?
    Take an RTS game like Starcraft for example.
    Would there be one core assigned to AI path-finding, one for collisi

    • by Splab ( 574204 )
      Easy, you use something like CSP where just about everything is a thread.
    • There exists a moderately sized computing world outside of games. 80 cores, as you have pointed out, are clearly not directed towards gamers, or even personal computing at the moment. I would personally love one of these for my simulations, but I can use up absolutely any number of cores without too much trouble. If you want to extend it to games, it isn't very hard to imagine. As someone else mentioned, with a handful of cores you could probably do real-time ray tracing, which is naively parallel and can e
    • with compilers/tools meant for programming it. before virtual memory programmers had to program for their machine's RAM size and manually manage their memory using "overlays" (or so i've read), but now this concept seems horrid to younger programmers. a generation from now, programmers will read about how computers used to only have one logical core and think it ludicrous.

      my uninformed, amateur guess is that functional languages will become more popular for programming massively multi-core machines (this co
    • by S3D ( 745318 )

      I'm not the best programmer in the world, but how the heck would you utilize 80 cores?

      OpenMP hides multithreading from the developer and makes parallelization almost completely transparent. A couple of OpenMP directives can parallelize a complex loop with very little effort from the developer. That is especially easy in physical simulation and AI.
    • by radish ( 98371 )
      Thread based programming really isn't that hard, particularly where you have a problem space which can be split up into discrete chunks of work. Example - a photoshop blur filter. Just divide the image up into (overlapping) chunks and blur each piece on a different thread. Another example - digital audio. Put each VST instrument on its own thread. Once your apps are well threaded (and in many cases they already are) you can simply rely on the OS to schedule them over however many cores are available. For
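The overlapping-chunks idea for the blur can be sketched as follows. This is a hedged illustration in Python on a 1-D signal with a simple box blur rather than a real Photoshop filter; `box_blur`, `blur_chunk` and `parallel_blur` are names invented here. Each chunk is padded with `radius` extra samples on both sides (the overlap), blurred independently, and then trimmed back so the stitched result equals the blur of the whole signal.

```python
from concurrent.futures import ThreadPoolExecutor


def box_blur(values, radius):
    """Average each sample with its neighbours, clipping the window at the ends."""
    n = len(values)
    out = []
    for i in range(n):
        window = values[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def blur_chunk(values, start, end, radius):
    """Blur values[start:end] using `radius` samples of overlap on each side."""
    lo = max(0, start - radius)
    hi = min(len(values), end + radius)
    blurred = box_blur(values[lo:hi], radius)
    # Drop the overlap so only this chunk's own samples are returned.
    return blurred[start - lo:start - lo + (end - start)]


def parallel_blur(values, radius, workers=4):
    n = len(values)
    if n == 0:
        return []
    step = -(-n // workers)  # ceiling division
    bounds = [(s, min(n, s + step)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: blur_chunk(values, b[0], b[1], radius), bounds))
    return [v for part in parts for v in part]


data = [0, 0, 10, 0, 0, 5, 5, 5, 0, 20, 0, 0, 1]
print(parallel_blur(data, radius=2, workers=3) == box_blur(data, radius=2))
```

The overlap must be at least as wide as the filter radius; with less, samples near a chunk boundary would be averaged over a truncated window and visible seams would appear.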
  • I want something that will do 1.8 trillion integer operations per second (single threaded). This simulation is taking 5 hours per run with this A64 3200+. Gimme 1.8 TIOPS and I'll be listening.
  • You can't say this is useless, and support nVidia or ATI's stream computing, they are the same thing.

    This is the future of CPUs: everyone is doing it, and with GFX manufacturers heading down this path, it proves to be a very interesting future.

    • Sadly, this is the only possible future for CPUs. Massively parallel single cores with support for simultaneous multi-threading will replace complex cores with out-of-order execution, it's just a matter of time.

      Three reasons why we're reversing a 15-year trend toward more complex CPUs:

      1. Single-thread performance using current processes and clock speeds is "good enough" for most desktop applications, even when you take away all the out-of-order execution goodies.

      2. Programmers are beginning to understand SMT,
  • Others have built large scale parallelism in the past such as Thinking Machines and MasPar. They were not fully general CPUs, i.e. floating point. Plus the companies could only develop new generations on a 3-5 year time scale, so the general purpose workstations and clusters almost caught up by then. Having a "major" back large scale parallelism may finally lift the curse.
    • or maybe it's actually really hard, or arguably impossible, to use data-parallel machines for general purpose computing?
  • One year later, and /. has updated their Intel logo to the new one?
  • please use a power of two for the number of cores. Base 10 sucks.

    Sincerely, /. nerds
  • Did anyone else see that?

    "Even more impressive, this chip is able to achieve incredibly high clock speeds on modest power usage. Running on a 1.0v current at 110 degrees C the tile maximum frequency is 3.13 GHz while at 1.2v the tiles can run at 4.0 GHz."

    That would be about 230°F; would Peltier coolers be mandatory?
  • As the article points out, this is a VLIW (Very Long Instruction Word) design -- in effect, each instruction word will be broken up into chunks, with a chunk going to each processor. This means that you can end up with some bizarre situations -- what happens, for example, if one processor needs to jump to one location in memory and the other 79 don't? Effectively, your compiler would need to be able to realize this, and have the instructions at that memory location for the 79 processors be the same. (In
    • The biggest problem that this technology has is that it is expensive when compared with a compute cluster, which can scale easily and can be more easily programmed. The main time the cluster won't do better are the instances where each core needs results from other cores so frequently that the overhead in message passing is too high.
      Surely you're joking? A single box constructed with this processor will be vastly less expensive than a compute cluster. Even modern quad core DPs would still require 10 node
    • Tell me, what does 2+2 add up to on your world? VLIW is not usually used across cores, it tends to be used to exploit parallelism within a core. Assuming a 3 GHz clock, and 1 Tflop throughput - we are averaging 333 operations per cycle. That's a little over 4 operations per core, per clock. Guess where the VLIW is going to be?

      Given your other vague ballsup in understanding the tradeoff between a tightly coupled array like this, and a loosely coupled cluster - how is the second year of your degree?
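The back-of-the-envelope figures in that reply check out. A minimal worked version, assuming exactly the inputs stated there (3 GHz, 1 TFlop) and the 80 cores mentioned elsewhere in the thread; the constant names are invented for this sketch:

```python
CLOCK_HZ = 3e9            # assumed clock from the comment
THROUGHPUT_FLOPS = 1e12   # assumed 1 TFlop aggregate throughput
CORES = 80                # core count reported for the chip

# Operations the whole chip must retire every clock cycle.
ops_per_cycle = THROUGHPUT_FLOPS / CLOCK_HZ

# Share of that work each core must retire per cycle.
ops_per_core_per_cycle = ops_per_cycle / CORES

print(f"{ops_per_cycle:.0f} ops/cycle, {ops_per_core_per_cycle:.2f} per core")
```

A figure a little above 4 per core per clock is the kind of instruction-level parallelism a wide instruction word inside a single core would supply, which is the point the reply is making.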
      • by cfulmer ( 3166 )
        Yeah, so I screwed up a bit: Intel's new chip, which has a somewhat similar architecture (bunch of less capable units working in parallel), does not actually run off a VLIW. However, my core points still hold: (1) there is a class of applications which will run well on this sort of processor, but the majority won't, (2) effective use of the processor will either need re-coding of applications in a parallel-conscious language or a very smart compiler, and (3) for many applications, a cluster of general-p
        • Your points are generally accepted wisdom in the parallel community. Each generation of hardware gives us a chance to argue over them again. There must be a parallel-processing equivalent to the graphic "wheel of invention" that describes this phenomenon.

          The dig about being a 2nd-year student was just intended to get a rise, I guess that it worked. ;^)

          A lot of people on this discussion are seeing this as a branch away from multi-core x86. I don't think that would be Intel or AMD's strategy. When the fab tech
  • Now that's a summary I'm willing to read- Bravo editors!!!111
  • I have one too... It does 2 Tflops on the same amount of power. As long as all of the opcodes are "NOOP".
  • by petrus4 ( 213815 ) on Tuesday February 13, 2007 @04:50AM (#17994694) Homepage Journal
    a version of the Sims 2 rewritten so that the Sims have a much greater degree of genuine autonomy, and for said version to be run without human intervention (and recorded) for a period of months or years on a multiple TFlop system. If the environment was made a lot more detailed than it is in the retail version of the game, and if the Sims were given somewhat more capacity for learning than what they've currently got, something tells me the results of such an experiment might be extremely interesting, given enough time.
