Stretch Announces Chip That Rewires Itself On The Fly
tigre writes "CNET News reports on a chip startup called Stretch which produces the S5000, a RISC processor with electronically programmable hardware so that it can add to its instruction set as it deems necessary. Thus it can re-configure itself to behave like a DSP, or a (digital) ASIC, and perform the equivalent of hundreds of instructions in one cycle. Great way to bridge the gap between general-purpose computing and ASICs."
virus hitting the hardware (Score:5, Insightful)
Re:virus hitting the hardware (Score:2, Funny)
Uh, no.
Re:virus hitting the hardware (Score:5, Interesting)
Forgive my ignorance, but why would this be any different than the virus you can write with the general purpose CPUs we have today? You could make the machine unreliable, but that wouldn't make for an effective virus distributing machine.
Re:virus hitting the hardware (Score:3, Funny)
10,000,000 Windows machines can't be wrong!
Insightful?! (Score:3, Funny)
Re:Insightful?! (Score:2)
Re:Insightful?! (Score:5, Interesting)
I highly doubt anyone is planning on making PCs with these. They are designed for being a processor in something like a data logging / control system, surveillance video compression, etc. Your system will probably have no need for virus detection any more specific than other more general regression and test suites it will need during operation.
Re:Insightful?! (Score:3, Insightful)
stop the madness. (Score:3, Insightful)
The same way you detect a virus on any machine that has been compromised: with another machine and/or a thorough understanding of normal operation and running processes. Nothing new here. Evaluate the harm done by a potential compromise and take steps accordingly.
There is no practical difference between a hardware and a software compromise and the remedy is the same. Indeed, for critical purposes, there's little difference b
Re:virus hitting the hardware (Score:2)
Re:virus hitting the hardware (Score:3, Insightful)
A minor change in the instruction set would likely render the OS dysfunctional - and while that would certainly get attention - it would not propagate very well.
There is a math about viruses which requires them not to kill their hosts, and to do as little damage really as they can bear. Damaging viruses get high priority on fix lists and would get shut down more quickly than less harmful viruses.
I think a CPU change virus would be a rather self-defeating proposition.
Re:virus hitting the hardware (Score:4, Insightful)
People developing along similar lines must have means of controlling the new circuitry so that hot spots don't form on the die. Especially if they provide analog capability. It could be too easy to set up a feedback that could really trash that part of the die.
Which brings up another thought: Do they have an on-board controller that tracks what parts of the die are usable and what aren't? If they do, they can have seriously high production yields.
In fact, I wouldn't be surprised if such a self-diagnostic utility made its way into modular dies with specialized circuitry. So a processor could run on two AMUs instead of three, and so forth.
New application-speed records to be set... (Score:5, Insightful)
Effective application speed was never based on a cycle count alone, because different processors can have better instruction sets for the given application. The main breakthrough here is that this chip leaves "user-definable" space in its instruction set so they can re-optimize the instruction set on the fly. Whatever you're running, its most commonly used functions can almost slide from being code to being "on the chip" and that's sure to speed up the experienced speed.
Yeah, I know its a
errmm... (Score:2, Funny)
being code to being "on the chip" and that's sure to speed up the experienced speed.
first, where exactly is code run, if it isn't 'on a chip', and second, what? speed up the experienced speed?
you mean, as opposed to something like 'pretended speed', which is what i imagine you were using to measure your rapid desire to let your undoubtedly 'speedy' fingers get through your slashdot post without thinking
'experienced speed' indeed...
Well... (Score:4, Informative)
He could have posted more clearly if he wasn't trying for first post.
Re:errmm... (Score:2)
When a function is defined in code, you have to use multiple processor cycles to complete the function. However, when the function is "on the chip", that entire function can be completed in just one assembly-level call to the processor.
"Experienced speed" is of course a pseudo-benchmark because it can't be standardized, and its components highly specialized. It's how fast you can complete a set of
Re:errmm... (Score:2, Informative)
But you cannot say that one "assembly level call" to the processor will take (even) fewer "processor cycles" to complete. Hint: very few instructions in even today's CPUs take a single clock cycle to execute, most take several, it's just with pipelining, many instructions h
Re:New application-speed records to be set... (Score:4, Interesting)
Beware! (Score:5, Funny)
Sure it will.... (Score:5, Funny)
so does that mean... (Score:3, Insightful)
Re:so does that mean... (Score:5, Informative)
That's already here. It's called "C".
Re:so does that mean... (Score:2)
Where I see a real possibility is in taking the JVM/CLR/Parrot/etc. and putting part of THAT functionality on-chip. Imagine your bytecode or interpreted programs running as fast on this platform as a compiled program runs on your run-of-the-mill Intel or AMD processor!
Re:so does that mean... (Score:2)
Given the choice between writing all of my programs in assembly, or being thrown face-first down a flight of stairs, I'd have to think about it.
Re:so does that mean... (Score:3, Informative)
Whoa.. (Score:5, Funny)
Let's not do this one.
One word . . . (Score:3, Funny)
yawn ... (Score:4, Insightful)
[okay, okay, so it'll be -hell- fun to design codecs and other protocols that can switch their chipset dynamically, yeah, but i'd need 1000's of them deployed to have a real reason to do it...]
So, do they have Chippy? (Score:5, Funny)
Comment removed (Score:3, Interesting)
Re:Can someone explain? (Score:5, Informative)
This chip sounds like a hybrid between an FPGA and a run of the mill general purpose RISC processor. Being based on a RISC instruction set, you code for it as you would a normal processor, however if the compiler sees code which could take advantage of having more CPU support, it could add instructions to the FPGA like portion of the chip to enable better throughput.
The short summary is: FPGA, programmed from scratch. Standard RISC processor: Already has instruction set which you program against.
This could be quite handy for some of the embedded programming I do.
Re:Can someone explain? (Score:2)
Re:Can someone explain? (Score:2)
Re:Can someone explain? (Score:2)
If I read the article correctly, the difference is in the compiler.
When you write code for this processor, the compiler would figure out which operations would fit best in reprogrammable logic, then configures the logic and compiles to this custom instruction set all on its own. At runtime, the custom logic is loaded and the program executes.
A traditional FPGA, while reconfigurable, is normally developed in Verilog or VHDL. Where reconfigurable logic is used in a micropr
Re:Can someone explain? (Score:2)
All in all it seems like these have a developer environment which helps the user port C/C++ programs to this platform. There have been quite a few of those chips / systems before though. It will be interesting to see if this one can get off the ground where the others have failed.
FPGAs and the rest of the acronym zoo. (Score:5, Informative)
Short answer: FPGAs let you build using basic gates and (very small) lookup tables. This lets you build anything you please, and fully optimize the number of functional units of each type that you have, but has a speed and size penalty.
This chip is basically a RISC processor with an FPGA-type fabric bolted on as a co-processor, as far as I can tell from the detail-poor press release. By implementing most of the instruction pipeline as fixed, optimized hardware, it runs without any of the penalties of a purely FPGA-based implementation. When you have a number-crunching task that would benefit from a custom logic implementation enough to offset the performance penalty of implementing it in programmable logic blocks, the compiler configures the programmable logic into a suitable coprocessor which is stuck in as an extra branch of the instruction pipeline.
How much benefit you get from this depends on what you're doing. Modern general-purpose microprocessors have enough vector instructions to handle most DSP-ish tasks without an abysmal speed penalty (just a large size and power penalty over a purely DSP-based implementation). Most computing tasks aren't limited by processing horsepower at all - they're either waiting for memory accesses to complete (even cache accesses are very slow compared to register accesses), or they're waiting for the target address of a branch to be decided (speculation and BTBs don't address this perfectly by a long shot). A reconfigurable processor would suffer from much the same type of problem. While using the programmable logic path for slice processing could remove some of the branching penalties (by following all paths and selecting the desired result), this would be at an even greater area and power cost.
For specialized applications, it would be quite useful, of course.
A quick glossary of terms being thrown around, for anyone confused:
This is a combination of lookup tables, sum-of-products combinational logic blocks, and scratch-pad SRAM that you can hook up in nearly arbitrary ways to produce custom circuits at a gate level. Bulky and slow, but good at implementing algorithms efficiently. Configuration information is loaded from a serial PROM chip at startup, letting you change it relatively easily.
Like an FPGA, but stores configuration information internally, so you need to take out the CPLD and burn it to change configuration instead of re-burning the configuration PROM.
Little cousin to CPLD. This is what you played with in second or third year. Typically these are just a sum-of-products combinational logic block with a register stuck on the end to latch the output. Useful as glue logic.
This is an integrated circuit that's half-made. A number of gates and registers and so forth have been fabricated on the chip, and the lowest few metal layers have been used for internal routing for these, but you get to define the upper metal layers to form arbitrary connections among these (either as the last fabrication step, or by laser-cutting a pre-fabricated wiring mesh to leave the geometry you want). Works much like a CPLD, but the design is decided at fabrication time and cannot be changed. Faster and less bulky than a CPLD implementation.
This is a custom-fabricated integrated circuit that uses cells from a standard library of components, usually automatically placed and routed from a VHDL or Verilog description of what you want the chip to do. Faster than an ASIC if you have good place and route software, but more expensive in small quantities because you're making what amounts to a full custom chip. Design time is much less than a fully custom design would be, though (but verifying that the design description is correct is a royal pain).
I hope this clears things up for anyone who was confused.
Re:Can someone explain? (Score:3, Informative)
Re:Can someone explain? (Score:2, Informative)
dunno where u got that definition...
more info (Score:5, Informative)
Of course, there is no such thing as a universal solution and the Stretch processor does have its limits. One significant area is in "low touch" operations such as network processors. While it can certainly do the relatively simple packet inspection and transformation that switch fabrics and network processors normally handle, it is really much better suited to the heavy-duty calculation- and manipulation-intensive tasks found in "high touch" applications such as video compression. For example, H.263/264 motion estimation is capable of producing very high-quality video from a relatively small bit stream, but requires lots (and lots) of raw processing horsepower. Happily, the Stretch processor is only too happy to oblige, churning out a SAD (sum-absolute difference) operation on a tile-full of pixels for H.263 video in 43 ns (H.264 takes 83 ns).
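For anyone wondering what the SAD operation quoted above actually computes, here is a minimal sketch in plain Python. The names `sad` and `best_match` and the nested-list tile layout are my own assumptions for illustration, not anything from the article or from Stretch's tools. A motion estimator runs this comparison against many candidate tiles per block, which is why doing it in one wide hardware operation pays off.

```python
def sad(tile_a, tile_b):
    """Sum of absolute differences between two equally sized pixel tiles.

    Tiles are lists of rows of integer pixel values (a hypothetical layout).
    """
    if len(tile_a) != len(tile_b):
        raise ValueError("tiles must have the same number of rows")
    total = 0
    for row_a, row_b in zip(tile_a, tile_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
    return total


def best_match(target, candidates):
    """Index of the candidate tile with the lowest SAD against the target."""
    return min(range(len(candidates)), key=lambda i: sad(target, candidates[i]))
```

In software this is one subtract, one absolute value and one add per pixel, executed in sequence; the claim in the quoted text is that the reconfigurable fabric handles a whole tile at once.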
even more info (Score:2, Informative)
EE Times has an article here [eetimes.com]. Apparently this chip has a competitor. There are also more details about the chip itself.
(Anonymous because logging in at work)
This is a setback for crypto-land... (Score:5, Insightful)
In short, the time-to-crack using consumer technologies for almost any form of crypto is about to take a step backwards. It won't "break" anything, but the brute force combinations will be able to be examined in a faster time, meaning higher standards will be needed for the same level of protection you have today.
Not surprising, these breakthroughs will always keep coming...
Re:This is a setback for crypto-land... (Score:2, Insightful)
Re:This is a setback for crypto-land... (Score:2)
Taking more time to encrypt/decrypt isn't a problem (does anyone here notice the difference between 2.5ms and 5ms?) but reducing the crack time by the same proportions means that codes that were built to last years might only last months, or even mere weeks, which is a real problem.
Re:This is a setback for crypto-land... (Score:4, Insightful)
If you insist on putting words in their mouth, then yeah, you might consider it a setback. But that's your misunderstanding, not theirs. All reputable encryptors have accounted for Moore's Law in their cost/benefits tradeoffs. Since it doesn't take much encryption power before it requires computers larger than the Universe to crack it via brute force (and since "cracks" on good encryption are really typically just ways of collapsing the search space, not procedures that give immediate answers, often adding more bits will require Universe sized machines, too), this isn't that big a deal for encryption. Push your key size up and be done with it. Even conventional machines can handle that today, it just takes longer.
Re:This is a setback for crypto-land... (Score:3, Interesting)
As computers speed up, both encryption and decryption get faster. However, while adding another 128 bits to 128-bit symmetric cipher may be "free" with newer computers (and ev
Re:hahahahaha ... Worst Math Ever (Score:3, Informative)
Most cryptology systems are purposefully designed to take an absolutely absurd amount of time to crack -- exactly to account for many of these instant 1000 fold improvements.
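To put rough numbers on that, here is a sketch of the arithmetic, assuming the attacker has to try every key (pure brute force) and gets a flat speedup factor from new hardware. The function name `extra_key_bits` is made up for this example. Each added key bit doubles the search space, so the bits needed to cancel a speedup are just the base-2 logarithm of the factor, rounded up.

```python
import math


def extra_key_bits(speedup):
    """Key bits to add so brute force takes at least as long as before.

    Assumes exhaustive key search and a constant speedup factor >= 1.
    """
    if speedup < 1:
        raise ValueError("speedup must be at least 1")
    return math.ceil(math.log2(speedup))


# A 1000-fold faster cracker is cancelled by 10 extra bits,
# because 2**10 = 1024 >= 1000.
bits_for_1000x = extra_key_bits(1000)
```

Under those assumptions a thousandfold improvement costs the defender ten bits, which is small next to the margins built into common key sizes.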
Anything more? (Score:5, Funny)
How is it possible? (Score:5, Insightful)
Re:How is it possible? (Score:3, Informative)
Re:How is it possible? (Score:3, Informative)
You are making the assumption that all of this is done on the fly. It's not. The compiler would, at compile time, locate can
Re:How is it possible? (Score:3, Informative)
If you have something that needs to do a simple operation on each member of a large data set, the chip could be configured as many tiny simple cores that are just smart enough to do that operation.
Or if you needed to do a complicated math function, you could optimize the cpu for that
Re:How is it possible? (Score:3, Informative)
The basic idea is to move problems from the time space (i.e. do X then Y then Z taking T time to do it) to the physical space (i.e. do X next to Y next to Z taking S transistors to do so, but only one cycle). So your simple add operation in a regular microprocessor, which fetches t
Re:How is it possible? (Score:2)
You can do lots of addition/subtraction instructions to get the result of a single multiplication instruction.
Maybe they meant to say thousands of clock cycles can be reduced to one clock cycle, since you can have larger single instructions (e.g. square root over pi or something) programmed into the chip that only take one cycle?
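As an illustration of the many-simple-steps-versus-one-instruction trade described above, here is a sketch of multiplying with nothing but shifts and adds. The name `shift_add_multiply` and the step counter are assumptions made for this example; the count is only a rough tally of simple operations, not a cycle-accurate figure for any real processor.

```python
def shift_add_multiply(a, b):
    """Multiply two non-negative integers using only shifts and adds.

    Returns (product, steps), where steps tallies the simple operations used.
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch only handles non-negative integers")
    product = 0
    steps = 0
    while b:
        if b & 1:          # lowest bit of b set: add the shifted copy of a
            product += a
            steps += 1
        a <<= 1            # next power-of-two multiple of a
        b >>= 1            # move on to the next bit of b
        steps += 2
    return product, steps
```

A dedicated multiplier, or a custom instruction wired into programmable logic, produces the same product as a single instruction from the program's point of view, which is the kind of saving the summary is talking about.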
Re:How is it possible? (Score:3, Informative)
Re:How is it possible? (Score:2)
Sure, you have to "call your shot" and define your new function before you can use it, but storing the function inside the chip rather than as code makes it a whole lot faster to use...
Finally (Score:2, Funny)
Reduced Benefits for Virtual Machines? (Score:4, Insightful)
Re:Reduced Benefits for Virtual Machines? (Score:2)
So "virtual machines" is a situation this chip hasn't had to encounter yet. I'm guessing that a PC user would have to throw the switch manually to change which "processor image" is running at any given time...
Not really new technology (Score:5, Informative)
Re:Not really new technology (Score:2, Insightful)
That reminds me of... (Score:5, Interesting)
Re:That reminds me of... (Score:5, Interesting)
The interesting result was that the circuit designed by the GA didn't use conventional structures, but instead, according to traditional circuit design theory, should not have functioned at all -- dead loops, etc. The behavior and result was tied to the physical FPGA being used to test and give feedback to the GA -- the minute nuances, as you referred to them -- and was not portable to even another instance of the exact same FPGA.
Re:That reminds me of... (Score:5, Interesting)
Crazy stuff.
Re:That reminds me of... (Score:4, Informative)
He used a Xilinx FPGA and a genetic algorithm (implemented separately) to evolve a circuit which could distinguish (IIRC) two different frequency tones on the input as a logic level output. The "program" was allowed to interconnect the FPGA configurable logic blocks in any old sort of way internally and between CLBs. This would include ways which would cause logic designers to shudder in horror :), and did not include a clock input to the circuit at all.
The result was a successful circuit that used a relatively small portion of the FPGA. But trying to work out how it accomplished the tone discrimination was impossible. There were sub-circuits that were isolated from the rest of the circuit but when removed would cause the circuit to fail. Thompson hypothesized that the circuits were taking advantage of "out of band" communication via electromagnetic or thermal influences on adjacent CLBs.
Furthermore, the circuits turned out to be very specific to the ambient temperature during training and usage, as well as being specific to a particular FPGA used (a working circuit on one would fail on another.)
In any case it was a fascinating small-scale exploration of what reconfigurable hardware and genetic algorithms could accomplish, when not constrained by the "clock driven sequential logic" paradigm nearly all human engineered circuits use.
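For readers who haven't met genetic algorithms, here is a toy sketch of the general loop: keep the fitter half of a population of bitstrings, breed replacements by crossover, and flip a few bits at random. This is not Thompson's setup. In his experiment the bitstring was an FPGA configuration and fitness was measured on the physical chip, whereas here `evolve`, its parameters and the caller-supplied fitness function are all assumptions chosen to keep the example self-contained.

```python
import random


def evolve(fitness, genome_len, pop_size=30, generations=200,
           mutation_rate=0.02, seed=0):
    """Toy genetic algorithm over bitstrings; returns the fittest genome found."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fitter half survives unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            cut = rng.randrange(1, genome_len)       # single-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)
```

Calling `evolve(sum, 16)` searches for a 16-bit string with as many ones as possible. Nothing in the loop knows how a good genome works, only how well it scores, which is why an evolved circuit can end up depending on quirks of one particular chip.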
damn!! (Score:2, Funny)
Sounds good on paper, but... (Score:5, Insightful)
Yes sure, rewirable chips would be cool for certain applications, but how does one go about making it deal with multiple applications with multiple needs? You'd overload the CPU with a truckload of specialized instructions - which would probably slow it down. Granted, I see uses in things like mobile phones, but for multitasking machines, a 'Jack of all trades' chip is the way to go.
Re:Sounds good on paper, but... (Score:2)
As someone who designed such products, I think the chip has a very good shot at succeeding if it does what it says. In fact it is EXACTLY what I need for several projects.
Assuming it performs comparably to a TI DSP and costs only slightly more, I can make a cheaper product because I have fewer chips on board (just the
Re:Sounds good on paper, but... (Score:3, Informative)
not quite accurate summary (Score:3, Interesting)
This would compare with FPGA's I believe in that most FPGA applications are fixed once loaded, although I know that there was talk about stargate systems on slashdot (http://slashdot.org/article.pl?sid=03/02/15/1629
using FPGA's for general processing before.
Re:not quite accurate summary (Score:2)
Search around for reconfigurable FPGA and you'll find that there are several projects which do this. I know of three such projects off the top of my head (Stargate, RAW, Mitrion) so I wouldn't exactly call the idea new.
possibly useful (Score:2)
PLD's have been around for years. (Score:2, Informative)
FPGA (Score:2, Interesting)
Hardware manufacturers that need special hardware operations (e.g. MPEG-2 decoding) use dedicated, custom hardware for large-volume production. Dynamically configurable hardware is expensive for large-scale production, and small-scale production will likely use FPGAs for similar effect. I may be sceptical, but I doubt it'll catch on.
FPGAs with embedded PowerPC processors (Score:2)
I do wonder how they deal with heat dissipation. :-)
Not too different from what's already available... (Score:5, Informative)
Stretch is different in a few ways:
It pulls the FPGA closer to the core, so that it can be utilized almost as part of the pipeline. I say almost because of the following statement in the article:
Inside the chip, the ISEF is coupled to the rest of the circuit by 128-bit buses and has 32 128-bit registers. It runs in parallel with other areas of the processor, effectively becoming a fully reconfigurable co-processor, and can be reprogrammed for new instructions at any time during operation.
So it's still fairly separate from the processor core.
But the core itself is high performance (fast clock, a little faster than the average FPGA) and it has a very fast memory bus (again faster than the average FPGA)
The downsides are likely to be:
1) Power cost and dissipation. Since it's a slow clock, the dissipation probably won't be bad, but it's not going into a small portable machine.
2) Time to reconfigure. This isn't meant to be a general processor with task switching. Context and task switching is going to be expensive and if you plan on running two concurrent tasks which both require special instructions the entire processor will likely perform, on average, much worse than it would without the reconfigurable portion. Unless, of course, the processes were created to use the same set of special instructions so the context switch isn't more expensive than it is for today's processors.
So they are targeting it correctly, it seems. Specialized areas with, in general, only one task/program running at a time. Multimedia players, for example, would be great here. A digital recorder/player would work well if both the encoding and decoding portions of the code were compiled so the special instructions created wouldn't have to be changed for either application to allow playback while recording.
-Adam
How will this affect cross-platform development? (Score:4, Interesting)
This sounds vaguely like the dream solution for developers. The article says:
Does that mean it can handle booting multiple OSes simultaneously? If so, how long before someone writes an app that bridges multiple OSes, allowing the equivalent of emulation, without the emulation? I don't know about the rest of you, but the potential of this chip sounds like a dream come true. And at $35-$100 per chip... it's cheaper than the processor for most systems anyway.
The first processor that can? (Score:5, Informative)
Gaming? (Score:4, Interesting)
Imagine the optimizations that you could do for the next release of the Doom engine. They could own the market for GPUs that optimize themselves for specific games. Could be amazing.
Better article on EETimes (Score:5, Informative)
Woooo (Score:3, Interesting)
Re:Woooo (Score:3, Informative)
One piece missing for genetic processing... (Score:3, Interesting)
oh yeah, we have those... PEOPLE! Now, can I get those neural processor connects and graft this thing to my head already?
Sounds like a FPGA to me (Score:2)
The ability to dynamically reprogram on the fly in-circuit sounds cool though.
Everything old is new again (Score:2)
Project STRETCH
http://en.wikipedia.org/wiki/IBM_7030
Any More Information? (Score:2)
From the article, I presume that the processor's microinstruction memory can be updated with special information embedded in the executable file. This is not as unique as you might think: virtually all Intel and AMD processors have the ability to have their microinstruction memory updated during the boot process - this is used to upload microinstruction updates/corrections wit
new concept, but not new hardware (Score:4, Insightful)
This will be useful in places that they mentioned. Places where you do a lot of processing that takes many generic instructions but can be translated into a single string of discrete instructions.
The more I think about it, this is the direction processors are going. We keep moving processors towards RISC based cores. We keep adding specialized paths for things such as multimedia. Eventually we WILL have half the processor being a purely RISC core and half being programmable hardware for specialized computationally intensive instructions. I retract my initial view.
I do wonder though, what the life is on the hardware side. How many times can you reprogram the hardware before it starts to die. What is the error rate in reprogramming it? What happens when a few programmable transistors die?
This != New (Score:2, Informative)
Perhaps the most notable (in its conception, at least) was Seymour Cray's attempt at a Pentium Pro core + reprogrammable extensions (via FPGA or the like) at his post-Cray Research company. More recently, IBM licensed PowerPC cores for use by Xilinx. Up to four of those cores get thrown on the die with a Virtex-II FPGA (?); each of the cores has the ability to add opcodes in FPGA lan
Real World Performance (Score:2, Insightful)
Natural questions come to mind like how quickly does the chip configure itself to optimize for the application, does the configuration only occur at start of the application, how many chip-configuring applications can it run concurrently, will it optimize for interpreted languages, can some configurations be made "permanent" to accom
Star Bridge Systems already does this. (Score:2)
Star Bridge Systems [starbridgesystems.com] has been selling computers that reconfigure their own logic (with the help of compilers) for about 5 years now. True, their solution isn't a single chip, but the idea of reconfigurable computing is not at all new, and Star Bridge's implementation appears to be even more flexible.
Compelling market proposition? (Score:3, Insightful)
Perfect for emulation (Score:3, Interesting)
Field Programmable Gate Arrays (FPGAs) (Score:3, Interesting)
Been there, done that (Score:3, Interesting)
Of course, that was only a little over 20 years ago.
FYI: Since somebody is going to ask... The original Z80000 design was killed when Zilog stalled out as a general purpose processor maker and moved into embedded processors after the bugs in the initial run of Z8001 chips and IBM's selection of the Intel 8088.
Two companies announced similar products today (Score:3, Informative)
Also, keep in mind, customizable ISAs have been around for a while -- in Tensilica and ARC processors. These guys do it dynamically.
Altera's Nios Processor (Score:3, Interesting)
Altera produce an FPGA with one or more built in ARM processors. This sounds very similar to the Stretch system, but the ARM processors are limited in connection into the fabric of the FPGA by the not particularly fast bus used with the processor. Stretch appear to have made the data transfer rate between the two parts of utmost importance, which is essential in high throughput applications like this.
Altera have also developed a softcore processor, that is one implemented entirely on an FPGA. It is highly configurable - instructions can be added, cache and memory behavior altered, buses adapted, etc. Coupled with things such as the DSP blocks (trees of multiply accumulates), a 50 MHz processor can process data in a specific task at the same rate as a general purpose processor running at 10 times the speed.
The work I'm doing is investigating the use of many of these processors on one FPGA. Levels of optimisation that cannot be done with conventional multiprocessor systems will be possible. Changing the memory system to deal with specific algorithms, or bus widths between certain processors will allow much better performance.
Stretch also seems to be making a difference by claiming to have easy to use and working development tools, which is one thing that Altera cannot really claim to have done.
Re:Cue Skynet jokes (Score:5, Funny)
Sooooo this T800 model Terminator walks into a bar with a poodle under one arm and a basketball under the other...
Help for a n00b. (Score:3)
Re:Hmmmm... (Score:5, Funny)
Ahh - that's easy. You should have routed the ion core voltages through a phase discriminator; would have cleared that right up.
I think they must have shunted the positrons through the floating point pathways
No, that would have caused a cascade failure in the deflector array.
Re:Ummm... (Score:2, Insightful)
See the script [sfy.iv.ru]
Re:someone remind me... (Score:2)
Don't ask.
RISC: Reduced Instruction Set Computer
DSP: Digital Signal Processing
ASIC: Application-specific integrated circuit