Stretch Announces Chip That Rewires Itself On The Fly - Slashdot


Posted by simoniker on Monday April 26, 2004 @03:08PM from the self-reconfiguring-articles-next dept.

tigre writes "CNET News reports on a chip startup called Stretch which produces the S5000, a RISC processor with electronically programmable hardware so that it can add to its instruction set as it deems necessary. Thus it can re-configure itself to behave like a DSP, or a (digital) ASIC, and perform the equivalent of hundreds of instructions in one cycle. Great way to bridge the gap between general-purpose computing and ASICs."

This discussion has been archived. No new comments can be posted.


more info (Score:5, Informative)

by morcheeba ( 260908 ) * writes: on Monday April 26, 2004 @03:12PM (#8975142) Journal

NetworkZone has a product review [analogzone.com] with some more insight. A good quote:
...the [300 MHz] Stretch even beats the Intrinsity FastMath processor running at 2 GHz

Of course, there is no such thing as a universal solution and the Stretch processor does have its limits. One significant area is in "low touch" operations such as network processors. While it can certainly do the relatively simple packet inspection and transformation that switch fabrics and network processors normally handle, it is really much better suited to the heavy-duty calculation- and manipulation-intensive tasks found in "high touch" applications such as video compression. For example, H.263/264 motion estimation is capable of producing very high-quality video from a relatively small bit stream, but requires lots (and lots) of raw processing horsepower. Happily, the Stretch processor is only too happy to oblige, churning out a SAD (sum-absolute difference) operation on a tile-full of pixels for H.263 video in 43 ns (H.264 takes 83 ns).
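
For readers who have not met the SAD operation mentioned in the quote, here is a minimal software sketch in Python. The tile size and pixel values are made up for illustration, and this says nothing about how the Stretch hardware actually computes it; it only shows the arithmetic that a motion-estimation loop repeats for every candidate block.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel tiles."""
    if len(block_a) != len(block_b):
        raise ValueError("tiles must have the same number of rows")
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
    return total


# Hypothetical 2x2 tiles; real encoders use larger tiles such as 16x16.
current = [[10, 12], [200, 50]]
reference = [[11, 12], [190, 60]]
print(sad(current, reference))  # 1 + 0 + 10 + 10 = 21
```

Every pixel pair is independent of the others, which is why this kind of loop maps well onto wide parallel hardware.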

Not really new technology (Score:5, Informative)

by stephenry ( 648792 ) writes: on Monday April 26, 2004 @03:15PM (#8975173)

It's called DISC, Dynamic Instruction Set Computer. It's existed for a few years now. If I remember correctly, there is a group at Berkeley working in the area that has released a few nice papers on it.

PLD's have been around for years. (Score:2, Informative)

by dispater124 ( 725880 ) writes: on Monday April 26, 2004 @03:22PM (#8975250)

The concept of a programmable hardware device isn't all that new. And the encoding and encryption they talk about speeding up is a typical application of PLDs. High end routers use similar devices to optimize their tables etc. Kuro5hin has a nice article for beginners. http://www.kuro5hin.org/story/2004/2/27/213254/152

Re:How is it possible? (Score:3, Informative)

by Professr3 ( 670356 ) writes: on Monday April 26, 2004 @03:22PM (#8975255)

Say you had to compute a 10000-entry sin/cos table (simple example). The processor would reconfigure itself to perform sin/cos operations in a single cycle (parallel ALUs etc.) and, if there were enough configurable circuits, perhaps multiple sin/cos table entries at once. That's where the speed advantage is - large blocks of repetitious calculations. With a sophisticated enough reprogramming AI, computationally intensive apps like video games could get a huge performance boost.

Re:How is it possible? (Score:3, Informative)

by Chirs ( 87576 ) writes: on Monday April 26, 2004 @03:23PM (#8975259)

You hit upon the answer in the latter portion of your post. Most cpus are generalists--they're fast at most things, but aren't optimized for anything. This kind of tech allows you to optimize your cpu for a particular task.

If you have something that needs to do a simple operation on each member of a large data set, the chip could be configured as many tiny simple cores that are just smart enough to do that operation.

Or if you needed to do a complicated math function, you could optimize the cpu for that function.

Of course, it takes a certain amount of time to do the reconfiguration, so it may only pay off for many repetitions or very complex calculations.
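
The trade-off in that last point can be put into numbers. The sketch below is a back-of-the-envelope model, not anything taken from Stretch's documentation: `break_even_calls` is a hypothetical helper, and all cycle counts are invented for illustration.

```python
def break_even_calls(reconfig_cycles, sw_cycles_per_call, hw_cycles_per_call):
    """Smallest number of calls for which reconfiguring beats plain software.

    Returns None when the custom instruction is not faster per call,
    because then the reconfiguration cost is never recovered.
    """
    saved_per_call = sw_cycles_per_call - hw_cycles_per_call
    if saved_per_call <= 0:
        return None
    # We need n * saved_per_call > reconfig_cycles (strictly greater).
    return reconfig_cycles // saved_per_call + 1


# Assumed numbers: 100,000 cycles to reconfigure, 200 cycles per call
# in software, 10 cycles per call with the custom instruction.
print(break_even_calls(100_000, 200, 10))  # 527
```

Below the break-even count the reconfiguration is a net loss, which matches the remark that it only pays off for many repetitions or very complex calculations.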

Re:Can someone explain? (Score:5, Informative)

by DaHat ( 247651 ) writes: on Monday April 26, 2004 @03:24PM (#8975274)

For the most part, with an FPGA you build its code from scratch: you give it its identity, how it works, what it does and so on.

This chip sounds like a hybrid between an FPGA and a run of the mill general purpose RISC processor. Being based on a RISC instruction set, you code for it as you would a normal processor, however if the compiler sees code which could take advantage of having more CPU support, it could add instructions to the FPGA-like portion of the chip to enable better throughput.

The short summary is: FPGA, programmed from scratch. Standard RISC processor: Already has instruction set which you program against.

This could be quite handy for some of the embedded programming I do.

Not too different from what's already available... (Score:5, Informative)

by stienman ( 51024 ) writes: <adavis&ubasics,com> on Monday April 26, 2004 @03:24PM (#8975278) Homepage Journal

This is evolutionary, not revolutionary. Many chipmakers have offered microcontrollers and microprocessors with FPGA on chip. Often it is an extension of the I/O built into the processor, so it's not much different than an external FPGA on the processor bus. Please note that this is NOT like processors that run on the FPGA itself - these are separate from the FPGA portion of the chip.

Stretch is different in a few ways:
It pulls the FPGA closer to the core, so that it can be utilized almost as part of the pipeline. I say almost because of the following statement in the article:
Inside the chip, the ISEF is coupled to the rest of the circuit by 128-bit buses and has 32 128-bit registers. It runs in parallel with other areas of the processor, effectively becoming a fully reconfigurable co-processor, and can be reprogrammed for new instructions at any time during operation.

So it's still fairly separate from the processor core.

But the core itself is high performance (fast clock, a little faster than the average FPGA) and it has a very fast memory bus (again faster than the average FPGA)

The downsides are likely to be:
1) Power cost and dissipation. Since it's a slow clock, the dissipation probably won't be bad, but it's not going into a small portable machine.
2) Time to reconfigure. This isn't meant to be a general processor with task switching. Context and task switching is going to be expensive and if you plan on running two concurrent tasks which both require special instructions the entire processor will likely perform, on average, much worse than it would without the reconfigurable portion. Unless, of course, the processes were created to use the same set of special instructions so the context switch isn't more expensive than it is for today's processors.

So they are targeting it correctly, it seems. Specialized areas with, in general, only one task/program running at a time. Multimedia players, for example, would be great here. A digital recorder/player would work well if both the encoding and decoding portions of the code were compiled so the special instructions created wouldn't have to be changed for either application to allow playback while recording.

-Adam

Re:so does that mean... (Score:5, Informative)

by tuffy ( 10202 ) writes: on Monday April 26, 2004 @03:24PM (#8975281) Homepage Journal

we can have only one standard assembly language?

That's already here. It's called "C".

Re:How is it possible? (Score:3, Informative)

by radish ( 98371 ) writes: on Monday April 26, 2004 @03:26PM (#8975296) Homepage

I studied "Custom Computing" as it was called at my university a few years ago. That was based around using FPGAs as the processor, but with the same idea of doing on-the-fly redesign of your hardware to suit the current problem.

The basic idea is to move problems from the time space (i.e. do X then Y then Z taking T time to do it) to the physical space (i.e. do X next to Y next to Z taking S transistors to do so, but only one cycle). So your simple add operation in a regular microprocessor, which fetches the data and runs them through a generic arithmetic unit before putting the result back somewhere would instead have the load, add and store circuitry "hard coded" in actual transistors.

It takes some serious mental acrobatics for a programmer like me, which probably led to my not-so-stellar performance in that class ;) But it sure is interesting.

Re:How is it possible? (Score:3, Informative)

by the morgawr ( 670303 ) writes: on Monday April 26, 2004 @03:27PM (#8975304) Homepage Journal

It's a DSP/RISC processor (basically the same thing) with an on-chip FPGA. If you have some particular algorithm, you can put it on the FPGA to get a solution instead of having to use code. (this is a lot harder to explain than I thought it would be....)

The first processor that can? (Score:5, Informative)

by mrplado ( 736237 ) * writes: on Monday April 26, 2004 @03:27PM (#8975305) Homepage

The first processor that can add to its instruction set while operating? I think there were a few microprogrammed processors in the 70s/80s with writable control store that could do exactly that. Anybody remember PERQ workstations? Now this new gadget appears to be able to extend itself by means of an embedded FPGA, instead of plain old microcode, so it's a bit like the Xilinx Virtex II PRO series (PowerPC core with big FPGA on one chip). The really innovative thing is that you don't have to program the FPGA in VHDL or Verilog, but the C++ compiler takes care of that.

Well... (Score:4, Informative)

by Ayanami Rei ( 621112 ) * writes: <rayanami&gmail,com> on Monday April 26, 2004 @03:27PM (#8975309) Journal

This is basically an FPGA married to a RISC processor. So if you have a bit of RISC code that can be simulated using the FPGA portion, and you have enough spare cells to add it, and it takes 10 clock cycles for the FPGA "user instruction" to dispatch, but it takes 200 to do it outright in the original RISC instructions, then you're experiencing a 20 to 1 speed increase for that bit. You speed up the function without overclocking. Actually what you've done is "trade off".
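
To make the arithmetic explicit: the 20-to-1 figure applies only to the replaced code. How much the whole program speeds up depends on what fraction of its run time that code accounted for, which is Amdahl's law. The 50% fraction below is an assumption chosen purely for illustration.

```python
def local_speedup(sw_cycles, hw_cycles):
    """Speedup of the replaced code sequence on its own."""
    return sw_cycles / hw_cycles


def overall_speedup(fraction_accelerated, speedup):
    """Amdahl's law: whole-program speedup when only a fraction gets faster."""
    remaining = 1.0 - fraction_accelerated
    return 1.0 / (remaining + fraction_accelerated / speedup)


s = local_speedup(200, 10)
print(s)  # 20.0
# Assume the accelerated code was half of the original run time.
print(overall_speedup(0.5, s))
```

Even with a 20x custom instruction, the program as a whole gets less than twice as fast if the other half of the run time is untouched.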

He could have posted clearer, if he wasn't trying for first post.

Better article on EETimes (Score:5, Informative)

by apirkle ( 40268 ) writes: on Monday April 26, 2004 @03:28PM (#8975320)

There is a much, much better article with lots more detail on EETimes.com [eetimes.com].

even more info (Score:2, Informative)

by Anonymous Coward writes: on Monday April 26, 2004 @03:32PM (#8975363)

EE Times has an article here [eetimes.com]. Apparently this chip has a competitor. There are also more details about the chip itself.

(Anonymous because logging in at work)

Re:Can someone explain? (Score:3, Informative)

by falzer ( 224563 ) writes: on Monday April 26, 2004 @03:32PM (#8975370)

FPGA in this context means Field Programmable Gate Array.

Re:Can someone explain? (Score:2, Informative)

by pelgv ( 714539 ) writes: on Monday April 26, 2004 @03:35PM (#8975396)

FPGA stands for Field Programmable Gate Array... and it is a chip that can be programmed, and re-programmed... The programming is low-level... even lower than for micros... you design the electrical connections between gates...

dunno where u got that definition...

Re:hahahahaha ... Worst Math Ever (Score:3, Informative)

by claar ( 126368 ) writes: on Monday April 26, 2004 @03:39PM (#8975435)

Well, even if his math was wrong, his point is still valid.. going from 5 trillion years to 5 billion years isn't much different (of course, even 128 bit encryption is currently thought to take much longer [avolio.com] than a measly 5 trillion years to brute force).

Most cryptology systems are purposefully designed to take an absolutely absurd amount of time to crack -- exactly to account for many of these instant 1000 fold improvements.
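
A rough calculation shows why a 1000-fold speedup barely dents a brute-force search. The search rate below is an arbitrary assumption and is not a measurement of any real hardware; the point is only the ratio and the number of key bits it corresponds to.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(key_bits, keys_per_second):
    """Years needed to try every key of the given length at a fixed rate."""
    return 2 ** key_bits / keys_per_second / SECONDS_PER_YEAR


def bits_gained(speedup):
    """How many key bits a given speedup effectively removes."""
    return math.log2(speedup)


assumed_rate = 1e12  # hypothetical keys tried per second
slow = years_to_exhaust(128, assumed_rate)
fast = years_to_exhaust(128, assumed_rate * 1000)
print(f"{slow:.3g} years vs {fast:.3g} years")
print(f"1000x faster is worth about {bits_gained(1000):.2f} bits of key length")
```

A thousandfold improvement is worth roughly ten bits, so adding ten bits to the key length cancels it out.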

This != New (Score:2, Informative)

by sam_van ( 602963 ) writes: on Monday April 26, 2004 @03:40PM (#8975455) Homepage

I've noticed some folks comparing this to Transmeta. While similar, there are a few more comparable architectures out there.
Perhaps the most notable (in its conception, at least) was Seymour Cray's attempt at a Pentium Pro core + reprogrammable extensions (via FPGA or the like) at his post-Cray Research company. More recently, IBM licensed PowerPC cores for use by Xilinx. Up to four of those cores get thrown on the die with a Virtex-II FPGA (?); each of the cores has the ability to add opcodes in FPGA land.
Even more recently was my last company's valiant effort at something similar (and even cooler). RIP, SiliconMobius.

FPGAs and the rest of the acronym zoo. (Score:5, Informative)

by Christopher Thomas ( 11717 ) writes: on Monday April 26, 2004 @03:43PM (#8975484)
How is this different from FPGA's?

Short answer: FPGAs let you build using basic gates and (very small) lookup tables. This lets you build anything you please, and fully optimize the number of functional units of each type that you have, but has a speed and size penalty.

This chip is basically a RISC processor with an FPGA-type fabric bolted on as a co-processor, as far as I can tell from the detail-poor press release. By implementing most of the instruction pipeline as fixed, optimized hardware, it runs without any of the penalties of a purely FPGA-based implementation. When you have a number-crunching task that would benefit from a custom logic implementation enough to offset the performance penalty of implementing it in programmable logic blocks, the compiler configures the programmable logic into a suitable coprocessor which is stuck in as an extra branch of the instruction pipeline.

How much benefit you get from this depends on what you're doing. Modern general-purpose microprocessors have enough vector instructions to handle most DSP-ish tasks without an abysmal speed penalty (just a large size and power penalty over a purely DSP-based implementation). Most computing tasks aren't limited by processing horsepower at all - they're either waiting for memory accesses to complete (even cache accesses are very slow compared to register accesses), or they're waiting for the target address of a branch to be decided (speculation and BTBs don't address this perfectly by a long shot). A reconfigurable processor would suffer from much the same type of problem. While using the programmable logic path for slice processing could remove some of the branching penalties (by following all paths and selecting the desired result), this would be at an even greater area and power cost.

For specialized applications, it would be quite useful, of course.

A quick glossary of terms being thrown around, for anyone confused:
- FPGA - Field Programmable Gate Array.
  This is a combination of lookup tables, sum-of-products combinational logic blocks, and scratch-pad SRAM that you can hook up in nearly arbitrary ways to produce custom circuits at a gate level. Bulky and slow, but good at implementing algorithms efficiently. Configuration information is loaded from a serial PROM chip at startup, letting you change it relatively easily.
- CPLD - Complex Programmable Logic Device.
  Like an FPGA, but stores configuration information internally, so you need to take out the CPLD and burn it to change configuration instead of re-burning the configuration PROM.
- PLA/PLD - Programmable Logic Array/Device.
  Little cousin to CPLD. This is what you played with in second or third year. Typically these are just a sum-of-products combinational logic block with a register stuck on the end to latch the output. Useful as glue logic.
- ASIC - Application-Specific Integrated Circuit.
  This is an integrated circuit that's half-made. A number of gates and registers and so forth have been fabricated on the chip, and the lowest few metal layers have been used for internal routing for these, but you get to define the upper metal layers to form arbitrary connections among these (either as the last fabrication step, or by laser-cutting a pre-fabricated wiring mesh to leave the geometry you want). Works much like a CPLD, but the design is decided at fabrication time and cannot be changed. Faster and less bulky than a CPLD implementation.
- Standard cell design.
  This is a custom-fabricated integrated circuit that uses cells from a standard library of components, usually automatically placed and routed from a VHDL or Verilog description of what you want the chip to do. Faster than an ASIC if you have good place and route software, but more expensive in small quantities because you're making what amounts to a full custom chip. Design time is much less than a fully custom design would be, though (but verifying that the design description is correct is a royal pain).
I hope this clears things up for anyone who was confused.
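
As an illustration of the lookup tables in the FPGA entry above, here is a toy Python model of one. The `LookupTable` class and `configure` helper are hypothetical names invented for this sketch; a real FPGA also has routing, registers, and vendor tools that this ignores. The point is that "reprogramming" a LUT just means loading a different truth table into the same hardware.

```python
from itertools import product


class LookupTable:
    """A k-input LUT: 2**k configuration bits, one per input combination."""

    def __init__(self, num_inputs, config_bits):
        if len(config_bits) != 2 ** num_inputs:
            raise ValueError("need exactly 2**num_inputs configuration bits")
        self.num_inputs = num_inputs
        self.config_bits = list(config_bits)

    def evaluate(self, *inputs):
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        index = 0
        for bit in inputs:  # first input is the most significant index bit
            index = (index << 1) | (1 if bit else 0)
        return self.config_bits[index]


def configure(num_inputs, func):
    """Build the configuration bits for any boolean function of the inputs."""
    bits = [int(func(*combo)) for combo in product((0, 1), repeat=num_inputs)]
    return LookupTable(num_inputs, bits)


xor_gate = configure(2, lambda a, b: a ^ b)
and_gate = configure(2, lambda a, b: a & b)
print(xor_gate.config_bits)  # [0, 1, 1, 0]
print(and_gate.config_bits)  # [0, 0, 0, 1]
```

The two gates differ only in their four stored bits, not in the circuit that reads them.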
Re:errmm... (Score:2, Informative)

by fitten ( 521191 ) writes: on Monday April 26, 2004 @03:44PM (#8975492)

When a function is defined in code, you have to use multiple processor cycles to complete the function. However, when the function is "on the chip", that entire function can be completed in just one assembly-level call to the processor.

But you cannot say that one "assembly level call" to the processor will take (even) fewer "processor cycles" to complete. Hint: very few instructions in even today's CPUs take a single clock cycle to execute; most take several. It's just that with pipelining, many instructions have a retirement rate of one (or more) per clock.
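
The difference between latency and retirement rate can be shown with an idealized model. This sketch assumes a simple in-order pipeline with no stalls, hazards, or branch mispredictions, which no real CPU achieves; the five-cycle latency is an arbitrary example.

```python
def cycles_unpipelined(num_instructions, latency):
    """Each instruction must finish before the next one starts."""
    return num_instructions * latency


def cycles_pipelined(num_instructions, latency):
    """One instruction enters per cycle; the first result appears after `latency`."""
    if num_instructions == 0:
        return 0
    return latency + (num_instructions - 1)


print(cycles_unpipelined(1000, 5))  # 5000
print(cycles_pipelined(1000, 5))    # 1004
```

Each instruction still takes five cycles from start to finish, yet in the ideal case the pipelined machine retires close to one per clock.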

This isn't a silver bullet. In fact, the big deal about this thing is that it combines an FPGA and the processor onto a single chip. Before this, you'd write it all and implement it on a single FPGA, where it would be generally slow/simple for the general purpose part or you'd use an FPGA as a co-processor and feed it with a host CPU.

Re:Woooo (Score:3, Informative)

by narcc ( 412956 ) writes: on Monday April 26, 2004 @03:54PM (#8975598) Journal

Some [wikipedia.org] more [solarbotics.net] information [lmsm.info]

Re:That reminds me of... (Score:4, Informative)

by jcorgan ( 30025 ) writes: on Monday April 26, 2004 @03:59PM (#8975667)

This was Adrian Thompson's [susx.ac.uk] doctoral thesis in 1996.
He used a Xilinx FPGA and a genetic algorithm (implemented separately) to evolve a circuit which could distinguish (IIRC) two different frequency tones on the input as a logic level output. The "program" was allowed to interconnect the FPGA configurable logic blocks in any old sort of way internally and between CLBs. This would include ways which would cause logic designers to shudder in horror :), and did not include a clock input to the circuit at all.
The result was a successful circuit that used a relatively small portion of the FPGA. But trying to work out how it accomplished the tone discrimination was impossible. There were sub-circuits that were isolated from the rest of the circuit but when removed would cause the circuit to fail. Thompson hypothesized that the circuits were taking advantage of "out of band" communication via electromagnetic or thermal influences on adjacent CLBs.
Furthermore, the circuits turned out to be very specific to the ambient temperature during training and usage, as well as being specific to a particular FPGA used (a working circuit on one would fail on another.)
In any case it was a fascinating small-scale exploration of what reconfigurable hardware and genetic algorithms could accomplish, when not constrained by the "clock driven sequential logic" paradigm nearly all human engineered circuits use.
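
For anyone curious what the genetic-algorithm half of such an experiment looks like, here is a toy sketch in Python. It is not Thompson's setup: his fitness came from measuring a physical FPGA, whereas this stand-in fitness simply counts how many bits of a 16-bit configuration match the truth table of a 4-input parity function. The population size, mutation rate, and seed are arbitrary assumptions.

```python
import random


def evolve(fitness, genome_length, population_size=20, generations=200,
           mutation_rate=0.05, seed=0):
    """Evolve bit-string genomes; returns (best genome, best fitness per generation)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    best_history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_history.append(fitness(population[0]))
        # Elitist selection: the fitter half survives unchanged.
        survivors = population[:population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, genome_length)  # single-point crossover
            child = mother[:cut] + father[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = survivors + children
    population.sort(key=fitness, reverse=True)
    return population[0], best_history


# Stand-in target: the truth table of a 4-input parity function.
TARGET = [bin(i).count("1") % 2 for i in range(16)]


def matches_target(genome):
    """Number of configuration bits that agree with the target truth table."""
    return sum(g == t for g, t in zip(genome, TARGET))


best, history = evolve(matches_target, genome_length=16)
print(matches_target(best), "of", len(TARGET), "bits correct")
```

Because survivors are carried over unchanged, the best score never gets worse from one generation to the next. Nothing in the loop knows or cares why a configuration scores well, which is how the real experiment could end up exploiting physical effects no designer would use.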

Re:so does that mean... (Score:3, Informative)

by sketerpot ( 454020 ) writes: <sketerpot&gmail,com> on Monday April 26, 2004 @05:03PM (#8976464)

There's a cool library called GNU Lightning [gnu.org] which will generate machine code at runtime, which is good for JITs and such. It isn't exactly what you're looking for, but it illustrates that having a standard assembly language (or, much more likely, several standard assembly languages!) isn't all that far off.

Re:How is it possible? (Score:3, Informative)

by Zordak ( 123132 ) writes: on Monday April 26, 2004 @05:08PM (#8976523) Homepage Journal

There is the analysis required to even determine that the incoming instructions require sin/cos. Then there has to be a lookup into a rule table for how to rewrite the gates to optimize for this. Then that rule needs to be applied. You have to be able to show me that this can all be done faster and cheaper than an x86 at 4 GHz just ramming it through. Maybe it can, but I am skeptical.

You are making the assumption that all of this is done on the fly. It's not. The compiler would, at compile time, locate candidates for hardware optimization, or the programmer would specify them explicitly. Also, it wouldn't use a "lookup table." It would basically be Verilog or VHDL, which would compile into netlists, which are placed and routed, all as part of the build process. So, the compiled program includes instructions to reconfigure the dynamic portion of the processor.

Sure, each reconfiguration has some overhead attached to it, but remember that computers excel at repetitive tasks. You configure, for example, a Laplace transform circuit once, and use it multiple times throughout your program. Since the configurable portion has enough space to handle a number of special instructions, you put your heaviest, most-used instructions in hardware, and you are now doing complex transforms in a handful of cycles instead of hundreds (or more). Remember that executing an instruction in hardware is orders of magnitude faster than doing it in software. So, for sufficiently complex operations, you could realize huge, huge performance gains, even if you had to reconfigure the dynamic instruction every single time.

I attended school at a place where some grad students were doing research into this very technology, and although I was a freshman at the time, I knew enough to understand how they could claim significant speed gains.

Two companies announced similar products today (Score:3, Informative)

by gupg ( 58086 ) writes: on Monday April 26, 2004 @05:28PM (#8976771) Homepage

It seems Stretch is not the only company that announced such a product today: EE Times article [eetimes.com].
Also, keep in mind, customizable ISAs have been around for a while -- in Tensilica and ARC processors. These guys do it dynamically.

Re:Sounds good on paper, but... (Score:3, Informative)

by exp(pi*sqrt(163)) ( 613870 ) writes: on Monday April 26, 2004 @06:18PM (#8977359) Journal

You have OS support. New instructions are a resource that the OS manages. Too many processes want to add their own instructions? Then when a context switch takes place the OS overwrites instructions for the outgoing context with instructions for the new one. Same as managing small amounts of RAM by swapping.
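
The swapping analogy can be sketched as a small resource manager. Everything here is hypothetical: the `InstructionSlots` class, the slot count, and the least-recently-used eviction policy are assumptions made for illustration, not a description of any real OS or of the Stretch chip.

```python
from collections import OrderedDict


class InstructionSlots:
    """Tracks which custom instructions currently occupy the reconfigurable fabric."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.loaded = OrderedDict()  # name -> configuration, least recently used first
        self.reconfigurations = 0

    def use(self, name, config=None):
        """Ensure `name` is loaded; return the evicted instruction's name, or None."""
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return None
        evicted = None
        if len(self.loaded) >= self.capacity:
            evicted, _ = self.loaded.popitem(last=False)  # drop the LRU entry
        self.loaded[name] = config
        self.reconfigurations += 1  # each load costs one (slow) reconfiguration
        return evicted


slots = InstructionSlots(capacity=2)
slots.use("fft")
slots.use("sad")
slots.use("fft")         # already loaded, no reconfiguration needed
print(slots.use("aes"))  # sad
print(slots.reconfigurations)  # 3
```

As with swapping RAM, this works well until processes keep evicting each other's instructions, at which point the reconfiguration count, and the time lost to it, climbs quickly.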

processor + logic (Score:2, Informative)

by period3 ( 94751 ) writes: on Monday April 26, 2004 @07:55PM (#8978228)

Though not the same as this, the Xilinx Virtex-II Pro [xilinx.com] combines an FPGA and PowerPC RISC core on the same chip.

The Altera Excalibur [altera.com] does something similar with an ARM processor core and programmable logic.

Both of these have been around for a while...

