Hardware Technology

Clearspeed Makes Tall Claims for Future Chip 254

Posted by michael
from the pound-of-salt dept.
Josuah writes "ClearSpeed Technology announced today a new multithreaded array processor named the CS301. Their press release states the chip can achieve 25Gflops for only 3W of power. New Scientist and TechNewsWorld have articles on this chip, each with more information about the chip. I wondering if this is too good to be true." The key phrase is in the Wired story: "Soon to be in prototype, the chip...". "Soon to be in prototype" is synonymous with "does not exist".

  • by Ikeya (7401) <[ten.kcuk] [ta] [evad]> on Tuesday October 14, 2003 @03:22PM (#7212184) Homepage
    Today it was announced that Duke Nukem Forever would be optimized to run on the new CS301 processor developed by a new firm called ClearSpeed Technology. It is said that with this newfound processing speed, Duke Nukem Forever will be the most realistic game ever released.
    • In a similar article, Microsoft released a statement saying they are pushing back the release date of "Longhorn" until the CS301 is ready for home desktop use.
      • "In a similar article, Microsoft released a statement saying they are pushing back the release date of "Longhorn" until the CS301 is ready for home desktop use."

        And, in yet another article, the Mozilla team announced that version 2 of...
    • The game will surely be even better when you run it on the upcoming Bitboys Oy graphics cards!
  • Why would they release a story on something that isn't even in prototype? Seems silly to me. I have plans for a 200GHz chip, but I still have to make a prototype, film at 11!
    It would be interesting though.
  • "Soon to be in prototype" is synonymous with "does not exist"

    Oh, right on. It's about time someone started developing a mass-market Loch Ness monster.

  • We could put 32 or so of these in a computer and generate the same amount of heat as, say, a Pentium IV but with almost a Teraflop of performance? This strikes me as too good to be true...

    • We could put 32 or so of these in a computer and generate the same amount of heat as, say, a Pentium IV but with almost a Teraflop of performance? This strikes me as too good to be true...

      Some of the special-purpose GPUs can probably make this claim. "We can do 1TFlop...as long as it consists only of function X." Sun's MAJC was advertised as doing 6GFlops for two cores four years ago, so scaling something similar to 1TFlop today doesn't seem to be totally out of reach (40 CPUs would probably do it).
    • by Spamalamadingdong (323207) on Tuesday October 14, 2003 @03:43PM (#7212459) Homepage Journal
      ... parallel processing units may perform a lot more ops/sec/watt than one single unit. The speed of a processor depends on the time required to charge and discharge the stray capacitances of its connects, and the impedance of its transistors increases as the drive voltage decreases so the RC time constant goes up and the speed goes down. However, the energy required to charge the capacitance scales as voltage squared, so by accepting a hit on the speed (due to the voltage drop) you can do the same calculation with less energy. Clearspeed seems to be taking parallelism to the sub-processor level in order to reduce heat loads; their operations may take longer to complete, but they can do more operations in the same time as long as the code can use the processors in parallel. Thus the emphasis on "multi-threaded", because it wouldn't work otherwise.
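The voltage-squared argument above can be put into numbers. This is only a sketch: it assumes the usual CMOS switching-power relation P = C * V^2 * f, and the capacitance, voltages and clock rates are hypothetical values picked for illustration, not ClearSpeed figures.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Switching power of CMOS logic, P = C * V^2 * f (watts)."""
    return capacitance_f * voltage_v ** 2 * frequency_hz

# Hypothetical numbers, chosen only to illustrate the trade-off.
C = 1e-9  # switched capacitance per unit, in farads

# One fast unit: 1.5 V at 2 GHz.
fast = dynamic_power(C, 1.5, 2.0e9)

# Four slow units: half the voltage, and (pessimistically) a quarter of the clock each.
slow_each = dynamic_power(C, 0.75, 0.5e9)
slow_total = 4 * slow_each  # together they match the 2 G ops/s of the fast unit

print(f"one fast unit:   {fast:.3f} W")
print(f"four slow units: {slow_total:.3f} W")
print(f"power ratio:     {fast / slow_total:.1f}x")
```

With these made-up values the four slow units deliver the same operation rate for a quarter of the power, but only if the workload really can be split four ways.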
      • You'd need a compiler (and an OS, for that matter) that can optimize things into parallel actions to get any performance out of this thing, then. I wonder if they're planning on having one of those, and if they're going to release it for free or not - they'll probably have a hard time getting any sort of bite into the market without one.
        • So basically any server would benefit greatly as they have multiple processes in execution at one time, even with existing software. But most desktop applications would only be slower without a major redesign of the entire software to heavily utilize multiple threads all the time and for everything.
  • by psyconaut (228947) on Tuesday October 14, 2003 @03:23PM (#7212202)
    Chips are virtually fabricated and tested well before the first bit of silicon is etched....you can actually be pretty sure of both a chips performance and reliability just from simulations these days. Also, having to etch development chips constantly is both expensive and time consuming....so the longer you can leave a design in virtual space, the better.

    -psy
    • Thing is (Score:3, Insightful)

      by Sycraft-fu (314770)
      You can make theoretical things on a VHDL simulator that you'll never be able to make into actual silicon. The real magic of companies like Intel, IBM, AMD, etc. isn't designing an uber powerful chip, it's designing an uber powerful chip that can actually be realized in silicon, and at a cost that makes it worth selling.

      There has been more than one firm that has suffered from simulator disease. They get all caught up in making an awesome, ass-kicking theoretical design that will eclipse everything and ever
      • Yup, valid points....

        I'm lucky if I can get my VHDL for FPGA and CPLD designs working to start with, let alone create silicon I can't actually pseudo-fab ;-)

        -psy
    • technewsworld had this as their last paragraph. If anything indicates the complete bullshit smell of this announcement, attaching it to a similarly wildly overhyped fad tech would be it.

      I bet it might hit 25 gigaflops with an "optimized demonstration algorithm" with no cache misses, no branch misses, and heck, all the data is in the registers at all times, so it doesn't even wait for the cache.
      • I'm also more than a little doubtful that this wonder-processor is going to hit anything like 25Gflops when separated from its memory by the PCI bus, let alone when sharing that bus with five of its kind (as suggested in the article).

        Let's see - a 64-bit 66MHz PCI bus (the maximum the PCI standard allows and, well, not exactly common on PC motherboards, shall we say?) will supply a maximum of 133 megawords per second. So a 25Gflop processor will receive one word of data from the system memory every four *h
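The bus arithmetic in this comment can be checked with a few lines. A rough sketch, assuming a "word" means a 32-bit value and ignoring all protocol overhead and bus sharing, so the real figure would be worse:

```python
BUS_WIDTH_BITS = 64
BUS_CLOCK_HZ = 66e6   # nominal 66 MHz PCI clock
WORD_BITS = 32        # assumption: one "word" is a 32-bit float
PEAK_FLOPS = 25e9     # the claimed 25 Gflops

# Theoretical peak transfer rates with zero overhead.
bytes_per_sec = BUS_WIDTH_BITS / 8 * BUS_CLOCK_HZ
words_per_sec = BUS_WIDTH_BITS / WORD_BITS * BUS_CLOCK_HZ

# How many floating-point operations the chip could issue per word delivered.
flops_per_word = PEAK_FLOPS / words_per_sec

print(f"peak bus bandwidth: {bytes_per_sec / 1e6:.0f} MB/s")
print(f"peak word rate:     {words_per_sec / 1e6:.0f} Mwords/s")
print(f"flops per word:     {flops_per_word:.0f}")
```

Under these assumptions the chip would have to do on the order of a couple of hundred operations on every word it receives to stay busy, which is the point the comment is making.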
    • This is not exclusively true. When writing stuff in Verilog for use with Synopsys or Cadence, for example, you don't get to specify quantum effects (though CPU chip designers wouldn't be using something so high-level for most of the design).

      Quantum effects are what you get with such small transistors, and the interactions are a lot harder to predict than is, for instance, the adherence of a transistor to its response curve.

      You can do an okay job, yes, but that doesn't mean that the thing is going to pos
    • Microsoft, the master of pre-announcing vaporware. Sometimes it eventually does work!
    • Hey psy,

      This is true iff the chip is using standard/existing fabrication tools, processes and development/layout tools. Looking at the articles, it seems like the chip is designed using traditional methods, so except for the "ClearConnect Bus", there doesn't seem to be any ground breaking technology. I would be interested in seeing how a packet based network linking 64, 32 bit processors would be implemented on a standard piece of silicon.

      From this chip's perspective I would like to understand how data
      • Myke...

        Are you questioning the PCI bus's ability to shovel enough data to these chips? ;-)

        (For those who didn't read the WiReD article, the company mentions "PCI supercomputer" cards).

        -psy

        P.S: You still in Toronto?

        • Yes, I would love to see how they expect to do something meaningful with 25GFLOPS while shovelling it through a single PCI bus. I guess you could calculate pi or Napier's constant to whatever decimal place and have enough bandwidth, but I just went through a finite element analysis of a board heating up and I can't believe that you would see a significant jump in performance for an application like this even if you are relying on PCI-X.

          Yessir, still in Toronto, survived the provincial election, prepared t
    • I agree. I would wonder, however, what percent yield they expect to get on these chips.
  • The key phrase is in the Wired story...

    No, the key phrase is this is too good to be true
  • Co processor (Score:1, Insightful)

    by key134 (673907)
    When it comes to market, the chip will likely be sold to consumers as a co-processor -- an add-on PCI card that works in parallel with a PC's main processor

    It's not replacing our current processors. It is just helping them with intensive floating-point calculations. Is that really going to be helpful to the average user? Keith
    • Re:Co processor (Score:3, Insightful)

      by Arker (91948)

      Everything old is new again... eventually.

      From reading the articles, it seems it is indeed designed to be a math coprocessor. Since the Pentium came out, those have been out of style. The Pentium effectively included an 80487 on the same die, and on other architectures that was done even earlier. But now it comes back - only now the idea is a far more powerful coprocessor for scientific functions.

      No, it's not going to be very helpful to the average users. But for those of us that spend a lot of time usin

      • From reading the articles, it seems it is indeed designed to be a math coprocessor.

        ClearSpeed is saying that it can run as a coprocessor, but also standalone. From their press release:

        "The CS301 can serve either as a co-processor alongside an Intel or AMD CPU within a high performance workstation, blade server or cluster configuration, or as a standalone processor for embedded DSP applications like radar pulse compression or image processing."
  • I wondering if this is too good to be true.

    I thinking it is!

  • Skeptical (Score:5, Funny)

    by cybermace5 (446439) <g.ryan@macetech.com> on Tuesday October 14, 2003 @03:25PM (#7212238) Homepage Journal
    As well as the fact that I've seen this press release trolled by AC's on Slashdot.

    25Gflops on 3W? That must be some unorthodox technology at work there. Anyone hear anything about some research corporation finding an amazing processor in a robot from the future?
    • well..

      if you have only one possible flop(floating point op? i dunno, i've never bothered to check on these usually fairy tale figures beyond bogomips).

      for example, you can only add 0.001 to 0.001, but you can do that 25 000 000 000 times a second..
    • That must be some unorthodox technology at work there. Anyone hear anything about some research corporation finding an amazing processor in a robot from the future?

      Don't be silly, that's just a movie. It was found at Roswell.

      (or the MiB need more budget - oops I'm guilty too)
    • There is no magic here. This processor will not run general-purpose code very well, but it will scream on regular, repetitive code with very predictable memory access patterns. DSP kernels, some types of encryption algorithms, blah blah blah...

      It's a very power efficient way to run these kinds of applications.

      Do a google search on the Stanford Imagine project for some academic background.
    • Sitting somewhere in infinite isolation, Marvin the Robot sits and sighs in abject misery. He ponders the loss of his right arm; parts from it used to spit out nothing but 1's and 0's in a small beige box. 1's and 0's, 0's and 1's. Marvin lets out a small mechanical sigh of solitude and begins counting backwards from infinity to 0, in binary.
  • Only $16,000! I'll take two!
    But where's the desktop bus bandwidth supposed to come from? I think it'll choke on my PC133 RAM. Whatever desktop machine they're targeting is what I want for Christmas.
    • Here's my favorite line from the article:

      Putting around 20 ClearSpeed chips into a few personal computers could potentially provide the sort of power normally only found in a supercomputer built from hundreds of parallel processors or specialised hardware.

      Yeah, that's right. A $16,000 commodity processor.
  • by mblase (200735) on Tuesday October 14, 2003 @03:27PM (#7212260)
    I'm reminded of all the promises we heard for the Transmeta chip, only a fraction of which are being realized. And they have an actual product to demonstrate, mind you.

    Yeah, it sounds like wishful thinking. I have little faith in processors from unknown companies that claim to do what Intel, AMD and IBM combined haven't yet been able to achieve.
    • I'm reminded of all the promises we heard for the Transmeta chip, only a fraction of which are being realized. And they have an actual product to demonstrate, mind you.

      Really? From what I've read they delivered on everything they said they would/could do. What didn't they deliver on and where was it said they did something that didn't materialize?

      I'm not trolling here, I'm just curious.
    • As for unknown companies doing stuff that IBM, Intel, and AMD aren't...Xilinx anybody? Check out Star Bridge Systems and their computer at NASA's Langley Research Facility.

      I've yet to see a Transmeta box in action, but I know their chips were in some of the first blade systems a year or two ahead of HP's or any other major manufacturers. You definitely can't deny that they've been successful.

      Maybe I'm a tech optimist, but I'd be willing to put money on Clearspeed's technology. It sounds cool to bo
  • Way back when, when I was reading that classic cryptographic book whose name I can't remember by that guy whose name I also can't remember, he was saying that a 256 bit symmetric key would be practically unbreakable since you'd need the total energy output of the Sun for a year to make that many phase changes in the computer.

    So, in that kind of light, can anybody here with thermodynamic knowledge compare the total number of phase changes required for this speed versus the energy which has been claimed it n

  • And it's still an article?

    Slow news day I guess...
  • ....The announcement might be describing vaporware but 3W / 25 Gflops isn't too amazing to definitely indicate vaporware. ARM VFP9-S [convergenc...otions.com] co-processor is about 0.4 Gflops for about 0.8 watt (about 1.5 Gflops for 3 watts). Keep in mind that it was introduced in 2001. 4 years and 15 fold improvement seems possible.....
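The efficiency comparison in this comment works out as follows. The sketch uses only the figures quoted in the thread (0.4 Gflops at 0.8 W for the ARM part, 25 Gflops at 3 W claimed for the CS301); none of them are independently verified here.

```python
# Figures as quoted in the discussion, not measured values.
arm_gflops, arm_watts = 0.4, 0.8
cs_gflops, cs_watts = 25.0, 3.0

arm_eff = arm_gflops / arm_watts   # Gflops per watt for the ARM coprocessor
cs_eff = cs_gflops / cs_watts      # Gflops per watt claimed for the CS301
ratio = cs_eff / arm_eff           # claimed improvement factor

print(f"ARM VFP9-S: {arm_eff:.2f} Gflops/W ({arm_eff * 3:.1f} Gflops at 3 W)")
print(f"CS301:      {cs_eff:.2f} Gflops/W")
print(f"ratio:      {ratio:.1f}x")
```

The ratio comes out a little above the "15 fold" mentioned, so the comment's estimate is in the right range.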
  • ... best case, and 128 K of cache.

    Unless this thing is working on highly specialized data sets, it doesn't matter how much data the core can mow through if it can't get the data fast enough. Why do you think AMD and Intel are so obsessed with their memory interfaces? There's little difference between the Athlon and the Athlon 64 besides large data width and fancy memory / SMP interfaces.
  • I don't think it's completely synonymous.

    Knight Rider, a shadowy flight into the dangerous world of a man who will soon be in prototype.

  • Maspar (Score:5, Interesting)

    by hobit (253905) on Tuesday October 14, 2003 @03:40PM (#7212413)
    For the last 10 years or so I've been thinking about how to do just this. What I'm 99% sure they are doing is SIMD on a massive scale. The Maspar (and especially the Maspar-2) were computers along this line.

    The basic idea is to have lots of "processing elements" that are basically ALUs with a bit of additional smarts (for branches mainly). Each PE has its own memory. The main processor (probably not the PC CPU) tells each PE what to do. Thus the Single Instruction Multiple Data. Things are a bit more complex than this (branches, pointers, and a few other things cause some problems) but not too much worse. PE to PE communication is also interesting (the Maspar was a toroid as I recall).

    The two basic problems with this type of a design are:

    • You either need a special programming language (and someone who understands the language and understands the problem really well) or a very very good compiler to get anything out of it.
    • The application range is quite limited. Not as limited as supercomputer people seem to think (I mean I've written genetic algorithm code for the Maspar that scales wonderfully.) but still quite limited.

    There are also a huge number of other problems. Caches don't generally do a darn thing for massive SIMD computers (if one processing element misses, they all do.) The memory usually has two types of pointers (one to the PE memory and one to global memory). I may contact the company to see if they want to hire a short-term consultant. hummm.... Have PhD will travel?
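A toy model of the SIMD scheme described in this comment, for readers who have not seen one. It is a sketch only: `simd_step` is a hypothetical helper, each list slot stands in for one PE's local memory, and branches are handled by masking PEs off, which is one common approach and not necessarily what the Maspar or the CS301 does.

```python
def simd_step(pe_memory, op, mask=None):
    """Broadcast one instruction to every PE; PEs that are masked off sit idle."""
    if mask is None:
        mask = [True] * len(pe_memory)
    return [op(x) if active else x for x, active in zip(pe_memory, mask)]

# One value per processing element.
data = [3, -1, 4, -1, 5, -9, 2, 6]

# Per-element branch: "if x < 0: x = -x, else: x = x * 2".
# Every PE sees both instructions; the mask decides which PEs act on each.
negative = [x < 0 for x in data]
after_then = simd_step(data, lambda x: -x, negative)
after_else = simd_step(after_then, lambda x: x * 2, [not n for n in negative])

print(after_else)
```

Both sides of the branch cost a full step for the whole array, which is one reason branch-heavy code fits this kind of machine poorly.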

  • unfortunately for them, the proof is too big for them to fit in this margin...
  • Aren't these used in the Phantom Game console?
  • by onyxruby (118189) <onyxruby@@@comcast...net> on Tuesday October 14, 2003 @03:48PM (#7212531)
    Onyxruby's law:

    The amount of hype per inch produced by marketing doubles every 18 months.

    With apologies to Moore ;) /me reminded of when Apple tried claiming the iMac as a supercomputer.
    • Apple didn't try claiming the iMac to be a supercomputer that I remember. They did claim that for the PowerMac G4 at 500MHz; their claim was that a supercomputer was > 1 GFLOP. That performance rating was not too unrealistic, as Alpha chips two years before that clocked around 0.95 GFLOPS without SIMD; it isn't hard to exceed that with SIMD.
  • The chip will have 64 threads of execution, which means that each thread only needs to deliver about 400 MFLOPS. Since a standard floating point instruction has a latency (from issue to retire) of about 5 or 6 cycles, this is easily achievable in current technology (2-2.5 GHz system clock) without even using pipelining. If the thread units are pipelined, you can expect the clock to be in 400-800 MHz range.

    When they have a device that delivers 200 GFLOPS with 64 threads, then I'll be interested.

    • According to this presentation [clearspeed.com], it runs at 200MHz. It's refreshing to see someone taking this approach, rather than insane clock frequency/power dissipation. I'll be impressed, though, if real application software can use it efficiently.
  • Currently, the main computational bottleneck is memory speed & bandwidth. Processors - it's still relatively easy to stay on Moore's curve. But memory speed improves by only a couple of percent per year. Yes, you can throw caches at them (it's not uncommon these days to have 6M or even 8M on a server chip), but those caches are very unlikely to consume 3W ...

    You can certainly throw a bunch of ALUs on a grid (it's not so difficult) and claim GIPS, GFLOPS or whatever ... but you won't get similar spee

  • Getting high performance out of a chip really isn't that difficult (I know I'm understating a lot of the real knowledge underneath); however, the trick is doing it reliably. An Intel or AMD processor must be able to change from a wide variety of states (fixed to floating to OS commands) and be able to recover from any invalid state, so a lot of the chip is tied up in ensuring consistent operations. As I gather, they're just basically making an optimized floating point coprocessor (can you say 387? I knew yo
  • It should read: "Clearspeed Makes Tall Claims for Fictional Chip"
  • From ClearSpeed's website [clearspeed.com]

    HPEC 2003

    Lexington, MA
    September, 2003

    Lockheed-Martin and Worldscape Defense presented the results of their work using ClearSpeed's processing solutions.

    They benchmarked FFT and pulse compression algorithms and found between 20 and 30 times improvement in performance per watt against competitive solutions.

    That page also has a PDF of their presentation at the 2003 Microprocessor Forum. Whether this technology will pan out is a matter for the markets, but ClearSpeed isn't loo

  • My dad is the smartest person in the world.
  • by supabeast! (84658)
    Does anyone know if this company employs the same marketing/PR firm that handled the BitBoys?
  • It's certainly feasible to build a machine with 64 FPUs that can operate in parallel. Finding an application for it is tougher.

    Getting data in and out fast enough to feed the thing will be a problem. It will probably only achieve its rated speed when it's working intensively on small data sets. That's a typical DSP application. This might be a useful part for a software radio. They mention radar applications, which are basically software radios.

    That ratio of MFLOPS/watt would help for graphics proce

  • I'm sure that cyberdyne chip is working out well for them... But what are they going to do with the arm? Juggling? Labyrinth-esque sphere stuff? I kinda shudder to think...
  • Plausible (Score:3, Interesting)

    by saha (615847) on Tuesday October 14, 2003 @04:14PM (#7212778)
    Clearspeed formerly known as Pixelfusion was a promising graphics chip company that developed these scalable SIMD processors a few years ago. They put 24Mbits of RAM directly on to the chip, to have the enormous memory bandwidth that was and still is unheard of in the industry. After the industry attention shifted towards Nvidia, ATI, 3DLabs the board of directors reorganized the company to focus on high speed network switchers and routers.

    Some of the hardware design came from engineers in Bristol, UK, from companies like Division and INMOS (anyone remember the T800 and T9000 transputer and a Microway board for parallel computing on a PC board more than a decade ago?). The other half of the design team came from the UNC computer graphics lab in Chapel Hill, from the well known PixelFlow and PixelPlane machines. That, along with a Taiwanese fab plant that would produce these SIMD processors with extra PEs (SIMD Processor Engines) to compensate for manufacturing errors. E.g., let's say the chip would have 100 PEs, so they would manufacture it with 120 PEs. Those that didn't work they'd switch off, and they wouldn't have to throw away the entire chip.

    The story of PixelFusion was unfortunate. They could have rocked the computer graphics world with their scalable tile based rendering technology and efficient manufacturing methods. The programmable PEs would be able to handle both Direct X and Open GL. I suppose now they are trying to focus their investment and IP into more generic applications. I find their claims to be plausible because they have demonstrated innovative chips in the past.

    My 2 cents
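The spare-PE trick described in this comment (build 120, need 100) can be sketched as a simple binomial yield estimate. The assumptions are mine, not the poster's: each PE fails independently, and the 95% per-PE success rate is a hypothetical number, as is the helper name `chip_yield`.

```python
from math import comb

def chip_yield(total_pes, needed_pes, p_good):
    """Probability that at least `needed_pes` of `total_pes` PEs work,
    assuming each PE is good independently with probability `p_good`."""
    return sum(
        comb(total_pes, k) * p_good ** k * (1 - p_good) ** (total_pes - k)
        for k in range(needed_pes, total_pes + 1)
    )

p = 0.95  # hypothetical chance that any single PE is defect-free

no_spares = chip_yield(100, 100, p)    # every one of 100 PEs must work
with_spares = chip_yield(120, 100, p)  # any 100 of 120 will do

print(f"yield without spares: {no_spares:.4%}")
print(f"yield with 20 spares: {with_spares:.4%}")
```

With these made-up inputs, a chip that needs all 100 PEs to work almost never passes, while the version with 20 spares almost always does, which is why the approach avoids throwing away whole chips.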

  • The chip will have 64 parallel FPU's. If it can complete one floating point operation per cycle, it will only need to run at about 350 to 400MHz to reach 25GFLOPS (latency and pipeline issues aside, of course). Even if it requires 2 clock cycles, or the first 32 FPU's feed the second, we're talking about 700 to 800MHz.

    I'm not certain, but I thought I ran across similar number crunching capabilities in Integer OPS. It seems to me to have been in regards to fibre fabric and switching.

    Or I could be on cra
  • Since you can get 6 GFLOP in a conventional x86 compatible CPU, why go to incompatible technology for a 4X speed improvement?
  • I wonder how many people work for Slashdot and own shares in Transmeta, which is coming out with the TM8000 right now, and is announcing earnings tomorrow. Full disclosure: I own Transmeta shares too.

    Now, usually Slashdot greets these RSN products with glee and neglects to mention that they are vapor. Not this time, nosiree. Why? Because if it were true it would compete with Transmeta.

    Not accusing anybody of anything wrong here... just... well... I've drawn my conclusions. You draw yours.

  • In large-scale supercomputing applications, if the $16,500 holds, you're still better off with VT's G5 supercomputer as it's less than half the price of a comparably speedy machine based on these chips.

    I don't have a question on how these chips will plug in. Most likely their card will contain 2-8 of these chips, plus a controller and specialized RAM, all interconnected by their proprietary bus (mentioned in the press release). It will do a large chunk of the processing in isolation from the CPU and oth

  • This chip sounds like a big parallel DSP. All those transistors on a Pentium 4 that go into the virtual memory system or the branch prediction or out of order pipeline juggling, in the DSP are dedicated to number crunching. I don't know how much the crunch power of this chip exceeds that of a current high end graphics chip (NVidia, etc.), but it's probably not that big a ratio. The graphics chip also beats the heck out of a Pentium 4 in raw parallel arithmetic speed. The graphic chip is of course very s
  • Sounds to me like a Blitter chip! I better check and see if this'll work with the TOS 2.5 ROM upgrade for my Atari 1040ST! Bonus points if it'll plug into the cartridge slot! I can't wait for the ST to beat the G5 using Cubase Audio now... :)

  • Premature... (Score:3, Interesting)

    by gweihir (88907) on Tuesday October 14, 2003 @05:38PM (#7213350)
    Without a working prototype they have nothing.

    With a working prototype they still have not much.

    With a working, and cost-efficient manufacturing process, they have something.

    When there are compilers that actually can use this kind of thing, it starts to be something that is real.

    My guess is they are about a decade from a reliable, usable and cheap product. Suddenly these numbers do not sound impressive at all...
  • by illumin8 (148082) on Tuesday October 14, 2003 @05:46PM (#7213414) Journal
    If I understand the article correctly, it looks like they're implementing a much more powerful version of Apple's Altivec SIMD technology. My question is, if computing power increases 500x using this technology, doesn't memory bandwidth and system bus speed have to increase exponentially as well just to realize any gains?

    It seems like putting one of these cards in a PC with today's technology would be like sticking a mainframe behind a 300 baud connection: sure it can handle millions of transactions a second, but you'll never actually see that kind of throughput because memory is so slow.
  • by TexVex (669445) on Tuesday October 14, 2003 @06:42PM (#7213824)
    So I'll summarize some interesting key points:

    1. The chip is fully programmable and an SDK including C compiler is available now.
    2. The chip will be marketed as a coprocessor.
    3. They expect to start selling them for around $16,000 in a few months.
  • Actually, this isn't terribly surprising if you look at the specs. It's a vector processor with 64 processing elements. Each PE has an FPU. The 25 gigaflop theoretical rating probably comes from FPUs * Clock_Speed, so the thing probably runs about 400 MHz. You have to understand that this isn't a general purpose processor --- you just send it some numbers to crunch, and it sends numbers back to you.
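The "FPUs * Clock_Speed" estimate can be written out directly. A sketch with hypothetical helper names; the 200 MHz case uses the clock figure another poster cites from the ClearSpeed presentation, and the two-operations-per-cycle case is purely an assumption (for example a multiply-add counted as two flops) to show what would be needed at that clock.

```python
def peak_gflops(n_units, clock_hz, flops_per_cycle=1):
    """Theoretical peak: units * clock * operations each unit retires per cycle."""
    return n_units * clock_hz * flops_per_cycle / 1e9

def clock_needed_mhz(target_gflops, n_units, flops_per_cycle=1):
    """Clock rate (MHz) required to hit a target peak rating."""
    return target_gflops * 1e9 / (n_units * flops_per_cycle) / 1e6

# 64 PEs, one flop per cycle each: what clock gives 25 Gflops?
needed = clock_needed_mhz(25, 64)

# At the 200 MHz quoted elsewhere in the thread:
at_200_one_op = peak_gflops(64, 200e6)       # one flop per PE per cycle
at_200_two_ops = peak_gflops(64, 200e6, 2)   # assumption: two flops per PE per cycle

print(f"clock for 25 Gflops at 1 flop/cycle: {needed:.1f} MHz")
print(f"peak at 200 MHz, 1 flop/cycle:  {at_200_one_op:.1f} Gflops")
print(f"peak at 200 MHz, 2 flops/cycle: {at_200_two_ops:.1f} Gflops")
```

Either way this is a peak rating; it says nothing about whether data can be delivered fast enough to sustain it.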
  • For whatever reason I was having problems downloading the slide show from my home computer, though I had no problem from work. I've mirrored [umich.edu] it if anyone wants to look at it.

    It is a SIMD machine. It looks like they've put some real thought into the software, which is the hard part in something like this. The debugger certainly looks pretty.

    The ported C code on slide 13 is a bit scary. The intermediate language appears to rely on the compiler to distribute the workload to the PEs (otherwise why is t

The only difference between a car salesman and a computer salesman is that the car salesman knows he's lying.
