MIT Startup Unveils New 64-Core CPU

single-threaded writes "Tilera, a startup out of MIT, has announced that it is shipping a 64-core CPU. Called the TILE64, the CPU is fabbed on a 90nm process and is clocked at anywhere from 600MHz to 900MHz. 'What will make or break Tilera is not how many peak theoretical operations per second it's capable of (Tilera claims 192 billion 32-bit ops/sec), nor how energy-efficient its mesh network is, but how easy it is for programmers to extract performance from the device. That's the critical piece of TILE64's launch story that's missing right now, and it's what I'll keep an eye out for as I watch this product make its way in the market. Though there are any number of questions about this product that remain to be answered, one thing is for certain: TILE64 has indeed brought us into the era of 64 general-purpose, mesh-networked processor cores on a single chip, and that's a major milestone.'"
  • Oblig... (Score:5, Funny)

    by Bentov ( 993323 ) on Monday August 20, 2007 @04:00PM (#20297161)
    No one will ever need more than 64 cores.
    • Re: (Score:2, Funny)

      Unless you need more than four pieces of toast.
    • No one will ever need more than 64 cores.

      Funny :-) The truth though is that probably no one will ever be able to use more than 64 cores ...

      Not with the current state of crappy software anyway.

      Rich.

      • That's the whole reason why the quote is funny. And the reason that everyone laughs at the original quote (even though it's not real, since nobody who knows anything about computers would ever state that anything is enough).

        But thank you for pointing out why it's funny.

        And let me get a "They'll use more than 64 to get Vista running properly!" comment in before I go.
      • by Tribbin ( 565963 )
        Indeed; for personal computers there is hardly ever need for more than, say, two cores.

        For the average slashdotter, though, if you have 512 cores, your fingers begin to tingle, you get excited, and you will find a way to use them all.
        • Indeed; for personal computers there is hardly ever need for more than, say, two cores.

          Right now. With the software available to you. Mostly because programmers wouldn't get very far trying to make you need things that aren't available to you.

          On the other hand, programmers tend to find a way to make use of whatever hardware is available to them. It's sort of funny watching some of them bitch and moan about how hard multi-core programming is, but once 8+ core systems are common you can be sure that your co

    • Not 64.. (Score:2, Funny)

      by Namlak ( 850746 )
      Actually, 42 cores is the answer.
  • Instruction Set (Score:5, Insightful)

    by Lally Singh ( 3427 ) on Monday August 20, 2007 @04:03PM (#20297199) Journal
    FTA: It's a "MIPS-like ISA with a few important and peculiar features"

    I'll be interested to see what they're going to do about making it easier to program. Wire delay's going to be exposed as hops on the on-chip network. IMHO, the toolchain side's far more interesting than shoving a bunch of cores together on an on-die network....

    Assuming they did anything interesting on the toolchain side.
    • Re:Instruction Set (Score:4, Informative)

      by evanbd ( 210358 ) on Monday August 20, 2007 @04:14PM (#20297333)

      Also FTA: "I'm due to talk to the head of Tilera's software team, which is actually larger than the company's hardware team."

      I'll be very curious what their development toolchain ends up looking like, but it seems clear they understand the issue.

      • by Mex ( 191941 ) on Monday August 20, 2007 @04:55PM (#20297707)
        ""I'm due to talk to the head of Tilera's software team, which is actually larger than the company's hardware team.""

        He must be a really fat guy!
      • by BerntB ( 584621 )

        Doesn't this architecture look suitable for the Connection Machine languages? They had something similar in the CM-5, I think? (Anyone even older than me around?)

      • I'll be very curious what their development toolchain ends up looking like, but it seems clear they understand the issue.

        You can see that on their website. There's a PDF showing the specs there, but it looks like it'll be useful straight out of the box.

        Tilera's Multicore Development Environment (MDE) is a complete, standards-based multicore programming solution that enables developers to take full advantage of the parallel processing potential of the Tile Processor architecture. Old multicore models required all operations to be done in a core-by-core fashion, making it impossible to efficiently program, debug or profile any more than a handful of cores. The great innovation of Tilera's MDE suite is that it enables developers to move to ever-larger and more complex multicore applications in an easy, predictable way.

    • The question is not if it will run Linux (it will), but if it will run Windows. CE does not count.
    • Re: (Score:2, Interesting)

      You're hoping they're doing something to make it easier to program, and I doubt they are. The choke point is rapidly becoming scheduling rather than number of cores.

      This is the same problem we've been working with on clusters forever... How do you tune and load balance the jobs to the point where you're getting the most out of your hardware, and nothing is sitting idle while other parts of the system are running at 100%? What do you do when the task is already reduced to the simplest level and there is no be
      • by deadline ( 14171 )
        Well, I have not solved the problem just yet either, but I thought about it quite a bit [clustermonkey.net]. I looked at it mostly from an HPC cluster standpoint, but the problem is still the same -- software!
      • Re:Instruction Set (Score:4, Interesting)

        by ultranova ( 717540 ) on Monday August 20, 2007 @05:02PM (#20297783)

        You're hoping they're doing something to make it easier to program, and I doubt they are. The choke point is rapidly becoming scheduling rather than number of cores.

        The solution, of course, is to move away from the imperative programming model to a dataflow [wikipedia.org] or functional [wikipedia.org] one. That way the compiler can automatically parallelize the task, instead of the programmer having to do so manually.
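        To make the contrast concrete, here's a minimal sketch in Python (the post names no particular language, so this is purely illustrative): state *what* to compute as a pure function over independent data, and let a runtime decide where each piece runs.

          # Sketch of the declarative data-parallel idea: a pure function mapped
          # over independent inputs, with the runtime (not the programmer) doing
          # the scheduling. Nothing here is specific to Tilera.
          from multiprocessing import Pool, cpu_count

          def transform(x):
              # No shared state: safe to run on any core, in any order.
              return x * x + 1

          if __name__ == "__main__":
              with Pool(cpu_count()) as pool:   # one worker per core; 64 on a TILE64
                  results = pool.map(transform, range(100_000))
              print(sum(results))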

      • by jd ( 1658 )
        If you have a mini-pipeline running to each processing element, and an instruction ID, then scheduling issues may be reduced somewhat, as you can then execute not only whole instructions out of sequence, but fragments of instructions out of sequence between instructions. The smaller the atomic unit you are working with, the less effort is required to achieve equal or better packing.

        I also prefer the idea of virtual cores over physical cores for the same reason. You can "steal" unused processing elements fro

        • Re:Instruction Set (Score:4, Interesting)

          by networkBoy ( 774728 ) on Monday August 20, 2007 @05:53PM (#20298221) Journal

          The best way, IMHO, is to build a two-layer chip - one layer being RAM, the other being the CPU cores
          Both of those require transistors. You cannot stack transistors with any current process technology; physics gets in the way.
          A chip is basically built as follows

          metal
          poly
          metal
          poly
          Si
          Where the poly is the insulator and the metal is the same as traces on a PCB. Just as you cannot place components in the middle of a PCB, you cannot place transistors on top of the metal; it would require a second silicon layer that you could dope transistors into.
          While there are some technologies (SOI, for example) that may allow this in theory, you start to run into other issues, like trying to punch through the insulator in specific areas with high precision (neither of which is easy), and heat dissipation (transistors are transistors, and switching produces heat; it doesn't matter if it's an ALU or an SRAM). And finally, before someone suggests using the other side of the wafer: how do you connect the two sides? A wafer is *very* thick on the scale we are discussing. It would be like mining a hole through the earth.
          More useful, perhaps, would be distributing L0 cache (register memory) a little more liberally in key areas of the processor, but then addressing gets in the way. In theory, an MCM (multi-chip module) laid out Cache - Processor - Cache, so there is ample L3 cache running at core/4 clock, may help, but costs get prohibitive.

          There is no really good solution to moving data around once you start getting to these kinds of densities. Eventually wire delay may be the limiting factor in CPU throughput.
          -nB
          • Re:Instruction Set (Score:4, Informative)

            by imgod2u ( 812837 ) on Monday August 20, 2007 @08:31PM (#20299363) Homepage
            This has been done. There was an article a while back about IBM being able to drill holes through their wafer to produce an interconnect to a second wafer on the bottom.

            Intel did this as well and redesigned the Pentium 4 on it.

            The old method of bonding two wafers also works. Smart sensors, for instance, bond a photodetector material (a semiconductor like InGaAs or InSb) onto the top of a CMOS chip. The bonding was very expensive, of course, but it is definitely possible to grow a semiconductor on top of existing metal/polysilicon.
    • by LWATCDR ( 28044 )
      What I find interesting is that they are using MIPS instead of ARM. It does seem to have on-die memory controllers, so that leads me to wonder if they are using HyperTransport as well?
      • It's "MIPS-derived", probably meaning they didn't pay any licensing fees. MIPS is the simplest ISA, and lots of networking equipment is based on MIPS. The question is not "why MIPS?" but "why not MIPS?"

        Tilera doesn't use HyperTransport; except for AMD most SoC vendors are using PCI Express for I/O.
        • by LWATCDR ( 28044 )
          I would say the case for ARM is that it is small and actively being developed. Why MIPS? Well, you named the reasons: it is simple, inexpensive, well known, and it is a good ISA.
          I thought that HyperTransport was faster than PCI Express and also freely available. I would think that with 64 cores, memory speed would be a bottleneck. It would also allow the Tilera to fit into an AMD socket. It is an interesting critter no matter how you look at it.
    • Re:Instruction Set (Score:5, Informative)

      by dfedfe ( 980539 ) on Monday August 20, 2007 @04:46PM (#20297573)
      FWIW:

      ""If you have an application written for any multi-core or single processor architecture that's written to work with Linux, you can take it, compile it and have it running on our chip in minutes," he said. "Now, if you want to ratchet up the performance, we provide libraries and interface mechanisms that customers can use to tune code."" from here [theregister.co.uk]
      • by bcmm ( 768152 )

        you can take it, compile it and have it running on our chip in minutes
        Minutes? I guess it must be really really fast to compile stuff on those things. Unless, for example, Mozilla Seamonkey is not "an application".
        • Unless, for example, Mozilla Seamonkey is not "an application".


          Seamonkey is, rather expressly, an integrated suite of applications, not "an application".
      • Re: (Score:3, Informative)

        by imgod2u ( 812837 )
        Considering these things are MIPS cores, having C code compile for them wouldn't be hard at all, I would say. It's utilizing the mesh network that's the problem.

        Until I see some results of dynamically-compiled C code that runs really fast on this thing, I don't see it offering better solutions than, say, an FPGA. The exception would be if this was much lower-powered.

        It's not theoretically impossible to do. Instead of treating it like a CPU, treat it like a network with micro-ops treated like packets. Run ea
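        A toy version of that packet model, sketched in Python with queues standing in for the mesh links (the thread doesn't describe Tilera's actual libraries, so every name here is illustrative):

          # Toy "network of cores": each core is a process, each mesh hop is a
          # queue, and work items travel between stages like packets.
          from multiprocessing import Process, Queue

          def inc(x):
              return x + 1

          def dbl(x):
              return x * 2

          def core(inbox, outbox, fn):
              for item in iter(inbox.get, None):   # None acts as the shutdown packet
                  outbox.put(fn(item))
              outbox.put(None)

          if __name__ == "__main__":
              q0, q1, q2 = Queue(), Queue(), Queue()
              Process(target=core, args=(q0, q1, inc)).start()   # first hop
              Process(target=core, args=(q1, q2, dbl)).start()   # second hop
              for i in range(5):
                  q0.put(i)
              q0.put(None)
              print(list(iter(q2.get, None)))      # [2, 4, 6, 8, 10]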
    • Re: (Score:3, Insightful)

      by timeOday ( 582209 )

      I'll be interested to see what they're going to do about making it easier to program... Assuming they did anything interesting on the toolchain side.

      Contrary to the summary and your remark, I'm not sure it's Tile64's problem to bring parallel programming to the masses. First, because many-core chips are already useful (and present no special difficulties) for servers that handle many simultaneous connections - in other words, reducing the space and electricity requirements of server farms. That's a sign

  • by dfedfe ( 980539 )
    Only $435 for 10,000 units. Are there 9,999 people on here who want to go in on that?
  • by Dachannien ( 617929 ) on Monday August 20, 2007 @04:06PM (#20297247)
    Fry: If only they'd built it with 6001 cores! When will they ever learn!

  • How do I overclock it?
    • Actually you have to over-core it by adding cores.
    • by Nahor ( 41537 )

      How do I overclock it?

      Easy:
      1. At the boot, type "F2" or "DEL" to go to the BIOS
      2. Go to "Advanced settings"
      3. Select CPU Core #1
      4. Change the clock speed for this CPU
      5. Press "F10" to save
      6. Reboot
      7. Run Prime95 to test that you didn't overclock too much.
      8. Reboot
      [... Repeat 63 times for each core ...]
      513. Voila!

      Of course, overclocking one core will affect the cores around it. So you may have to reduce the overclocking of some cores to increase the clock speed of the current core. Make sure to re-run Prime95

  • obligatory (Score:2, Redundant)

    Boy, I could really go for a Beowulf Cluster of those...
  • Key information missing from the article:

    1. Die size: How big is it?
    2. How many watts of power does it consume?
    3. What is the heat dissipation?
    4. What is the floating point performance?



    Without those bits of information, it's impossible to gauge exactly who might need this chip, and how successful it might be.

    • by niceone ( 992278 ) *
      What is the floating point performance?

      Judging from the applications they mention (networking / video stuff) I'm guessing it doesn't have much floating point performance.
    • by trolltalk.com ( 1108067 ) on Monday August 20, 2007 @04:23PM (#20297419) Homepage Journal

      The wattage isn't missing:

      TFA says it's between 175 and 300 milliwatts per core; do the math: 0.175 W x 64 is about 11 W, and 0.3 W x 64 is about 19 W. They're targeting the embedded market (and with those low power-consumption figures, I think a super laptop would be a no-brainer).

      • Can it shut off cores when they're not in use? When you're not doing much, you could throttle back to a single core, and at only 0.3W that's much lower power than most ARM chips. When the load goes up, turn on more as they're needed.
      • Re: (Score:3, Informative)

        by imgod2u ( 812837 )
        Those are *not* very impressive figures for the embedded market. I imagined the whole 64-core chip would run below 100mW. If we're talking 12 to 19 watts for the chip, it is a beast in embedded terms. For reference, an SoC with 4 ARM cores, all of the peripherals that that thing has plus dedicated DSP/FPU units would still be under 4W.

        There are FPGAs (particularly ones from Xilinx) that offer similar logic horsepower (assuming you had a digital designer to write your VHDL for you) for less than 500mW.

        The latest
    • Re: (Score:3, Informative)

      by WindBourne ( 631190 )
      here is a bit [tilera.com]
    • Re: (Score:3, Funny)

      by HerculesMO ( 693085 )
      If we are judging based off of current generation processors, I believe the size of the chip will be about 3 feet squared.

      Warning: Sarcasm above may cause irritation of skin and explosion of monitor.
    • 1. Die size: How big is it?
      2. How many watts of power does it consume?
      3. What is the heat dissipation?
      4. What is the floating point performance?

      1. Does it matter? It's useful for computer architects to know for comparison but doesn't matter for the end user. I'm curious, too, but that can wait. Doesn't matter for system designers even.

      2. They list 170-300 mW/core, but it's not clear what the base power is for the peripherals and routers. Is that (900 MHz) 300 mW * 64 (about 20 W) for the whole

  • I'm ready for it (Score:2, Interesting)

    by Anonymous Coward
    On my laptop right now:

    > ps aux | wc -l
    281

    Of course not all those processes are in runnable state. On the other hand, many of those processes have multiple threads. A typical Java Swing GUI app may have a dozen threads, for example. A web server process can easily have dozens of runnable threads. Software is going to take a little bit of catching up, but nothing huge.
    • by Bryan Ischo ( 893 ) * on Monday August 20, 2007 @04:39PM (#20297537) Homepage
      Just as your system has only a few processes that want to be scheduled simultaneously (and so your observation that "not all of those processes are in [a] runnable state" is correct), those Java Swing applications you are talking about very rarely have more than a thread or two wanting to do work at the same time. The web server is a better example of concurrent execution but those are most often I/O limited as much as CPU limited, and in the vast majority of cases the bottleneck is not the number of threads that can execute concurrently.

      It's very hard to take advantage of multiple cores because very often, there isn't more than one thing for a program to be doing at the same time, and for most desktop users, there are rarely more than 1 or 2 programs running actively at a time. Many code paths are not explicitly parallelizable, and many more are parallelizable but not easily so. Just as clock speed is not the holy grail of processor performance, core count isn't either.
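      The usual way to quantify that last point is Amdahl's law: if a fraction p of the work parallelizes and the rest is serial, n cores give at best a speedup of 1/((1-p)+p/n). A quick back-of-the-envelope in Python (the formula is standard; nothing here is Tilera-specific):

        # Amdahl's law: even a small serial fraction caps the speedup,
        # no matter how many cores you add.
        def speedup(p, n):
            """Best-case speedup for parallel fraction p on n cores."""
            return 1.0 / ((1.0 - p) + p / n)

        for p in (0.50, 0.90, 0.99):
            print(f"p={p:.2f}: 64 cores -> {speedup(p, 64):.1f}x")
        # p=0.50 -> 2.0x, p=0.90 -> 8.8x, p=0.99 -> 39.3x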
      • for most desktop users, there are rarely more than 1 or 2 programs running actively at a time.
        Yeah. I finally thought I had an application where the new dual-core AMD system I built could really exploit both processors: encoding CDs in MP3 format. I was ripping a whole CD to disk, then having one CPU encode the even-numbered tracks and the other all the odds. It worked great, until I realized that there was no reason to leave the CPUs idle until the whole CD had been read. I rewrote my script to start en
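        The overlapped version is simple enough to sketch in Python; rip_track and encode_track below are hypothetical stand-ins for whatever ripper and encoder the script actually called:

          # Overlap ripping and encoding: hand each track to an encoder worker
          # the moment it is ripped, instead of waiting for the whole CD.
          # "rip_track" and "encode_track" are placeholder command names.
          import subprocess
          from concurrent.futures import ThreadPoolExecutor

          def rip(track):
              subprocess.run(["rip_track", str(track)], check=True)

          def encode(track):
              subprocess.run(["encode_track", str(track)], check=True)

          with ThreadPoolExecutor(max_workers=2) as encoders:  # one per core
              for t in range(1, 13):   # rip serially (one drive), encode in parallel
                  rip(t)
                  encoders.submit(encode, t)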

        • Often "less and less efficient code" is cheaper code, in time and money, to write. If your processor can make up the difference, why not speed up development process by writig less efficient code? Do you really want to pay more for programs that don't run noticeably faster? Or have fewer choices of software to use because more developers were devoted to optimizing unnecessarily than to creating new products?

          Note that there is a difference between "sloppy" and "inefficient". You lump them together, but I
          • Often "less and less efficient code" is cheaper code, in time and money, to write.
            Hmm... I don't see the real-world evidence to support that statement. The first word processor I ever used, ca. 1980, was Electric Pencil, which was written by one person, and was very snappy on a 1.8 MHz Z-80 with an 8-bit CPU. Today the best option on my Linux box seems to be OpenOffice, which was written by a large team of programmers, and runs dog-slow on a 2200 MHz dual-core AMD x64. It seems to me that OpenOffice was o

              • But the fact is that you're using that 2200 MHz dual-core processor with a modern operating system and applications instead of the 1.8 MHz Z-80 with Electric Pencil. Why? If the circa 1980 computing experience was better, why not just fire up your old Z-80 and throw your AMD in the trash? Heck you could probably find a Z-80 emulator that would let you run all of your old Z-80 programs. Just run it full-screen and it would be just like you were back in 1980.

              The fact is that you are also getting orders of
                • But the fact is that you're using that 2200 MHz dual-core processor with a modern operating system and applications instead of the 1.8 MHz Z-80 with Electric Pencil. Why? If the circa 1980 computing experience was better, why not just fire up your old Z-80 and throw your AMD in the trash?
                Well, for one thing that machine died a long time ago. Also, the peripherals on those computers were the quality you'd expect from a crackerjack box :-)

                But anyway, I honestly think that software quality has gone down

    • Actually, a Swing app should have no more threads than a non-Swing app (except insofar as the app itself might be bigger).

      Your heavily threaded apps might even tend to be headless (think web servers).

      Swing itself is restricted to using exactly one thread; if you call Swing from more than one thread, you're doing it wrong.

      -1 offtopic -1 pedantic
  • Rumored... (Score:5, Funny)

    by SeanMon ( 929653 ) on Monday August 20, 2007 @04:20PM (#20297373) Homepage Journal
    It's rumored to be able to run 16 whole instances of Vista simultaneously!*

    *Required 32 GB of RAM not included.
    • I'm not sure I can click Allow on 16 Vista security popups as fast as they repop; that sounds like whack-a-mole with a mouse.
  • Instruction set? (Score:3, Insightful)

    by Eponymous Bastard ( 1143615 ) on Monday August 20, 2007 @04:21PM (#20297383)
    I can't believe startups haven't figured out that incompatible chips aren't what the market wants. They're either going to sell directly to "supercomputer" makers or just crash and burn.

    They'll probably market running Java as a strong point.

    (Then again, does it run Linux?)
    • No. Apparently this thing was solely designed for embedded processing in video equipment.

      FTFA (page 2) [arstechnica.com]:

      Each TILE64 processor is capable of encoding two simultaneous streams of H.264 video, and over ten streams of broadcast-quality high definition video.

      That would be a boon for anyone who wants to stream live directly from a DV camera or rackmount video gear.

    • Re: (Score:3, Insightful)

      What's the instruction set of your router? Your TV? Why does it matter?
  • wow. (Score:3, Funny)

    by paulbd ( 118132 ) on Monday August 20, 2007 @04:21PM (#20297391) Homepage
    it might even be as successful as the similarly revolutionary Kendall Square Research machine, just down the road from MIT.
    I wouldn't hold my breath.
  • Tequila128 (Score:4, Funny)

    by crea5e ( 590098 ) on Monday August 20, 2007 @04:21PM (#20297395)
    In related news, Boston College has also released a processor of their own.

    The Tequila128. Free copy of virtual beer pong included.

  • But does it... (Score:5, Informative)

    by niceone ( 992278 ) * on Monday August 20, 2007 @04:23PM (#20297415) Journal
    well, yes it does run Linux - full SMP 2.6 according to the blurb on their site.
    • Re:But does it... (Score:5, Informative)

      by Eponymous Bastard ( 1143615 ) on Monday August 20, 2007 @04:49PM (#20297629)
      One thing the blurb doesn't make clear is that this is not a workstation CPU. It's designed for embedded systems and system on a chip applications. They mention video compression as an example.

      If you look at their block diagram this looks more like an FPGA-on-drugs than a CPU.

      The individual blocks are probably programmed with GCC, since it should be trivial to port it to a MIPS-like architecture. I wonder if the interconnect uses a VHDL type language or if they rely on their weird cache to build efficient shared memory.

      Either way, it looks like you have to keep in mind the architecture while designing your software. I doubt they can build a compiler that can manage the division of labor.

      Unlike a typical multicore design you wouldn't use this to parallelize a multithreaded application or a multiprocess workload. The center processors will have a very different latency characteristic than the edge ones, and you want the parts that interact with the network to be on the points adjacent to the controllers, for example.

      So it should work great for a specially designed system, but not so great as a general-purpose CPU.
      • I was going to post the same sentiment but I wanted to read through some other replies first to avoid redundancy. This is *not* a general purpose CPU. It looks like it's targeted more towards high end switches/routers and things like advanced digital video applications (perhaps HD set tops or game consoles?).
  • Tilera MDE (Score:3, Informative)

    by MrMunkey ( 1039894 ) on Monday August 20, 2007 @04:50PM (#20297631) Homepage
    For those of you wondering about what their software will be like, here's some info on their Multicore Development Environment (MDE). http://www.tilera.com/products/software.php [tilera.com] It's not the most info in the world, but it's a start.
  • The T1 was already doing 32, and the new T2 is supporting 256 in a single chip. Just wondering why "TILE64 has indeed brought us into the era of 64 general-purpose, mesh-networked processor cores on a single chip, and that's a major milestone", when the mile marker is already at 256?
    • Re: (Score:3, Informative)

      by Slashcrap ( 869349 )
      The T1 was already doing 32, and the new T2 is supporting 256 in a single chip. Just wondering why "TILE64 has indeed brought us into the era of 64 general-purpose, mesh-networked processor cores on a single chip, and that's a major milestone", when the mile marker is already at 256?

      Because this has 64 cores as opposed to 8 cores on either the T1 or T2?

      Because the total number of threads supported by an 8 core T2 is 64 and not 256 as you wrote above?
    • The UltraSPARC T2 (Niagara 2) has 8 cores and 64 threads, so Tilera has more cores, more functional units, and an equal number of threads.
    • by rbanffy ( 584143 )
      I think neither T1 nor T2 are mesh networked. IIRC, all cores share a single internal bus. The shared bus design is easier to program as it resembles more traditional SMP architectures. This Tilera beast breaks away from this idea and implements a more supercomputer-like design.

      I wish them luck. As I said earlier, this x86-dominated desktop world is boring.
    • There's a notable difference between threads and cores. The T2 has 8 cores, each working 8 threads, totaling 64 (not 256). http://en.wikipedia.org/wiki/UltraSPARC_T2 [wikipedia.org] While this isn't a perfect comparison, consider a P4 with HT against a true Dual Core CPU.
    • The T2 is running 64 threads, not 256. It is running eight threads per core, on eight cores. Each set of eight threads on a T2 shares a set of execution units, each thread on the TILE64 has its own set.
  • by John Sokol ( 109591 ) on Monday August 20, 2007 @04:59PM (#20297751) Homepage Journal

    It was called Enumera: www.enumera.com

    I started to work with Chuck Moore, the author of the FORTH language, on a 7x7 array of very fast, small processors.

    From a talk I did, February 16, 2001
    From http://www.dnull.com/~sokol/amorp/emtalk.ppt [dnull.com]

    On this size chip a 7x7 array (49 CPUs) with RAM could be built. Co-processors could also be added.
    Each CPU would be operating at 2400 MIPS, x49 for a total of 117 billion operations per second.
    The power consumption would be 1 watt (1.8 volts at 500 mA).
    With this level of computing power new applications that were unthinkable before, now become possible.
    Also mentioned earlier on Slashdot:
    http://developers.slashdot.org/comments.pl?sid=138584&threshold=0&commentsort=0&mode=thread&cid=11600799 [slashdot.org]

    And earlier here:
    http://www.colorforth.com/ [colorforth.com] 25x Multicomputer Chip

    This eventually became IntellaSys after Enumera failed.

    IntellaSys CTO Chuck Moore to Present at In-Stat Spring Processor Forum; Scalable Embedded Array Platform for Implementing Asynchronous, Scalable Multicore Solutions Using Elegant VentureForth Programming to Be Discussed in Detail
    http://www.intellasys.net/products/24c18/SEAforth-24A-3.pdf [intellasys.net]
    http://www.findarticles.com/p/articles/mi_m0EIN/is_2005_Oct_24/ai_n15730157 [findarticles.com]
    http://www.findarticles.com/p/articles/mi_m0EIN/is_2006_May_1/ai_n16135032 [findarticles.com]

    Also for older info see:
    Specifically look at the P21 / I21/ F21 chips...

    http://www.enumera.com/chip/ [enumera.com]
    http://www.ultratechnology.com/ml0.htm [ultratechnology.com]
    http://www.ultratechnology.com/f21.html#f21 [ultratechnology.com]
    http://www.ultratechnology.com/store.htm#stamp [ultratechnology.com]
    http://www.ultratechnology.com/cowboys.html#cm [ultratechnology.com]

    • by perkr ( 626584 )

      Seriously, what was the point of your post?

      That there are earlier patents on related technology? That you want credit for whatever they are doing? It would help if you motivated a post like that.

      • Re: (Score:3, Interesting)

        by John Sokol ( 109591 )
        I am not sure really what the point is; I guess I am just venting out of frustration. Also adding some information for anyone interested in similar work I had done, showing this isn't a new idea.

        I put $100,000 cash and almost 2 years' worth of work into this and got nothing; no one was even interested.
        But then I see a bunch of MIT weenies do it and they get all kinds of attention as something new and revolutionary 6 1/2 years later.

        There is also a real chance they took the idea right off my web site or slashdot
        • by suv4x4 ( 956391 ) on Monday August 20, 2007 @06:30PM (#20298517)
          I am not sure really what the point is; I guess I am just venting out of frustration. Also adding some information for anyone interested in similar work I had done, showing this isn't a new idea.

          I put $100,000 cash and almost 2 years' worth of work into this and got nothing; no one was even interested.


          I'm not sure why the frustration. I'm sure multi-core wasn't originally just your idea. If you're in the industry you know that:

          1. IT is rich on ideas, poor on implementation.
          2. Marketing a product is just as (if not more) important than making a product.
          3. Most businesses fail in the first 5 years. And this one may be no exception. They haven't exactly enjoyed massive success just yet. They got a few crappy articles and landed on Slashdot. Kind of hard for a hardware company to cash in on that alone.

          Their design really looks like it was lifted straight off my paper. So I guess at least I am exposing some plagiarism.

          You don't expose plagiarism by venting frustration on Slashdot: where are your patents? How is there any guarantee you're the originator, and how is there any guarantee they *stole* your work rather than reinvented it independently, which happens often with technology that's in a boom (i.e., multi-core designs)? There's a reason the patent system exists; forget the crap you read here about patents on Slashdot.
    • Re: (Score:2, Insightful)

      by pmadden ( 209229 )
      Of course, this was also Thinking Machines' idea a bit earlier. http://en.wikipedia.org/wiki/Thinking_Machines [wikipedia.org]
      It's good to see that MIT has perfected the technology.
      • Build a machine with lots of processors.
      • Get investors to buy into the hare-brained scheme.
      • ??? (Mention that programming is a problem to be solved shortly.)
      • Skip town with the cash (Profit!).

      Hmmm. I think I'm missing something about a beowulf cluster, or maybe underpants.
      It's scary how little history people know. Programming for multi

      • by John Sokol ( 109591 ) on Monday August 20, 2007 @09:09PM (#20299679) Homepage Journal
        Parallel processors on a single die (chip) is very different from Thinking Machines & beowulf clusters.

        Up till now there have been only two types of parallel processing.

        1.) Loosely coupled. Thinking Machines and Beowulf clusters, for example, use this; the nodes are interconnected with Ethernet or some other network medium and send messages back and forth.

        2.) Tightly coupled. This is SMP, NUMA, snoopy caches: basically a shared-memory system where each processor shares the same global memory space.

        Each requires very different programming strategies, and each is limited to certain types of problems.

        There is also a third form that is lesser known: systolic arrays. An example of this is TimeLogic, and many DoD-type projects.
        This is usually done with a bunch of FPGAs, and the math computations are done as a series of hardware pipelines without any CPU.

        With a parallel-core processor it's possible to make it like an SMP (shared memory) type system, but you really get hammered by the memory bottleneck, so after about 4 CPUs you don't really gain much.

        What I had proposed was doing systolic-array-type processing, but with simple, fast CPUs on one chip.
        They would be connected by CPU registers that would pass data directly from one CPU to the next.
        This design would allow super-tight coupling between processors, so a program wouldn't need to process a buffer at a time, but could tackle problems that can't normally be broken up into parallel operations: for example, a bignum math operation like multiplying two numbers that are 1024 bits long, or large FFT, fast DVT, or matrix operations, where each CPU could process part of a single operation that must be done serially and cannot be done using traditional parallel processing.

        Specifically my interest was in video compression and image processing in real time. This is where DCT, motion vector searches, Huffman coding, and other operations that don't parallelize well would really get a boost from this type of processor.
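        A software cartoon of that systolic idea, in Python (assuming nothing about the actual chips; the three stages are made up): data marches through a fixed chain of tiny stages, one hop per clock, so a serial-looking computation keeps every stage busy once the pipeline fills.

          # Cartoon of a systolic pipeline: each "core" applies one fixed sub-step
          # and hands its result to its neighbor through a register.
          def systolic(stages, stream):
              regs = [None] * len(stages)                    # inter-core registers
              out = []
              for x in list(stream) + [None] * len(stages):  # extra ticks flush the pipe
                  for i in reversed(range(len(stages))):
                      src = regs[i - 1] if i else x
                      regs[i] = stages[i](src) if src is not None else None
                  if regs[-1] is not None:
                      out.append(regs[-1])
              return out

          # A 3-stage chain computing ((x + 1) * 2) - 3, one hop per "clock":
          print(systolic([lambda v: v + 1, lambda v: v * 2, lambda v: v - 3], range(5)))
          # [-1, 1, 3, 5, 7]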

  • It mentions 600-900MHz; is that per core or for the total CPU? While 64 900MHz cores sounds nice, 900MHz made up of 64 14MHz cores is kinda pointless. That would be like cascading a bunch of PIC chips together, so I'm guessing it's the former. Also, what kind of architecture is it? Are there spec manuals available so people can start porting gcc, libc, and eventually the Linux kernel to it?

    Seems interesting, would be nice if it comes out at an affordable price.

    • It mentions 600-900MHz; is that per core or for the total CPU? While 64 900MHz cores sounds nice, 900MHz made up of 64 14MHz cores is kinda pointless.

      I think you're confused. Never once in the history of computing has the frequency been a factor of the number of CPUs. A CPU's frequency is the measurement of the number of times the CPU's clock or timer asserts in a second. In an SMP environment, all CPUs operate on a synchronized clock, and therefore operate at the same clock speed. All 64 cores operate at

    • by dbIII ( 701233 )
      It is the clock speed. It means that 600 million times per second another instruction can be processed wherever something is waiting for a clock pulse.

      Video is a trivial thing to run in parallel - doing the same transform on the next 64 frames is an ideal application. Geophysics would be another big application where it's not hard to split one job into a lot of parallel jobs, or finite element analysis (engineering design) or a lot of other numerical applications.

  • by rbanffy ( 584143 ) on Monday August 20, 2007 @05:21PM (#20297963) Homepage Journal
    "'What will make or break Tilera is not how many peak theoretical operations per second it's capable of (Tilera claims 192 billion 32-bit ops/sec), nor how energy-efficient its mesh network is, but how easy it is for programmers to extract performance from the device. That's the critical piece of TILE64's launch story that's missing right now"

    Build a $1000 desktop workstation, port Debian Linux to run on it, and let the geeks out there adopt it.

    There is no better way to explore a device's capabilities than to let the market do it.

    I want one for myself. I am tired of the x86 architecture.
  • When I sit down to play World of Warcraft, what can I expect?
    • Re: (Score:2, Funny)

      by dm0527 ( 975468 )
      The same old tired, boring grind and stupid, inane and childish behavior by your fellow gamers?
  • Tilera will succeed because the packet pushers want to be able to do deep packet inspection. Pay close attention to the first three in the apps list from their website:

    Unified Threat Management
    Network Security Appliances
    In-line L4-7 deep packet inspection
    Network Monitoring
    Digital Video:
    Video Conferencing
    Video-on-Demand (VoD) Servers
    Video surveillance
    Media 'Head-End' services

    The engineers in charge of this company should be ashamed of themselves. They are creating exactly the type of product that will help
  • I for one (Score:4, Funny)

    by tttonyyy ( 726776 ) on Monday August 20, 2007 @06:36PM (#20298559) Homepage Journal
    I, for one, parallel welcome our new beowulf joke superseding overlords.
    I, for one, parallel welcome our new beowulf joke superseding overlords.
    I, for one, parallel welcome our new beowulf joke superseding overlords.
    I, for one, parallel welcome our new beowulf joke superseding overlords. ... ... ...
    I, for one, parallel welcome our new beowulf joke superseding overlords.
  • A few things jump out just skimming this:

    Is the compiler open-source? Is anyone looking at making GCC do this? What exactly have they done to Linux to make it run on these, and is it likely that the changes will make it into the mainline kernel? Also, they don't seem to mention if they have a C++ compiler.
  • It's not clear who came up with it [unsw.edu.au], but there's an old joke about supercomputers being devices to convert computation-bound problems into I/O-bound problems.

    This chip would almost certainly have the same issue in many applications - how do you get data on and off it fast enough to keep the cores full of data? Do they do anything unique to improve memory bandwidth?
