Clockless Computing 342
ender81b writes "Scientific American is carrying a nice article on asynchronous chips. In general, the article advocates that eventually all computer systems will have to move to an asynchronous design. The article focuses on Sun's efforts but gives a nice overview of the general concept of asynchronous chip design." We had another story about this last year.
Think of a water clock (Score:2, Informative)
Re:Explanation, sorta (Score:5, Informative)
For me, this is kind of amusing: asynchronous logic is where you start out - it's more basic (shove a signal in, get a signal out). You then move to synchronous logic to eliminate glitches and possible race conditions (clocked flip-flops, etc.). Apparently now you move BACK to asynchronous logic to gain performance. I can't disagree: working with synchronous systems, I've always been annoyed that certain combinations couldn't be used because they were too close to a clock edge, and the latch could be missed. If you can eliminate the glitches and race conditions, asynchronous logic would be faster. Of course, that's like saying "if software bugs didn't occur, you'd never need to look over code." Yes, true, valid, but not gonna happen.
Of course, they're not talking about a true full asynchronous design: just getting rid of external clock slaving. The external clock would still BE there, for the external architecture - it's just that the internal architecture wouldn't all move to the beat of a clock.
For instance, I'm pretty sure that no one is suggesting getting rid of synchronous memory design: it's just easier that way.
Re:Oh, God, NO! (Score:1, Informative)
And if a power outage occurs, then yes, the computer may have a bit of trouble determining the time. It also may have a bit of trouble running at all.
As per the second comment, I don't know where you got that from: you wouldn't measure the frequency (which... would only need a frequency-to-voltage converter anyway: any basic EE text will have that) - you just need to use a bit of electronics to clean up the sine wave, turn it into a square wave, and then clock some logic with it. It's just as easy as using a crystal.
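The mains-derived clock described above is easy to sketch in software: threshold the sine wave into a square wave and count rising edges. A toy model (the sample rate and function name are illustrative, not any real circuit):

```python
import math

def square_wave_edges(freq_hz, sample_rate, duration_s):
    """Threshold a sine wave into a square wave and count rising edges.

    A comparator doing this in hardware is all you need to clock logic
    straight off the 50/60 Hz mains line.
    """
    n = int(sample_rate * duration_s)
    samples = [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]
    square = [1 if s >= 0 else 0 for s in samples]
    # Count 0 -> 1 transitions: one per mains cycle.
    return sum(1 for a, b in zip(square, square[1:]) if a == 0 and b == 1)

print(square_wave_edges(60, 10_000, 1.0))  # close to 60 rising edges in one second
```

The recovered edge count tracks the line frequency, which is exactly why mains-clocked logic works.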
Heard about this stuff in class (Score:2, Informative)
It seems to me that clockless chips like these would work very well with MIPS-style processors - where you have lots of little instructions. However, you can't take advantage of the extreme pipelining features that chips like the Pentium 4 use when you don't have a clocked design. It would take a lot of research and a lot of re-education to get the design engineers to start thinking asynchronously instead of in clocked terms, but my professor seems to think that eventually there will be no other way to speed things up.
It's also like trading one problem for a host of others. I remember doing tests on 1 GHz clock chips, and those things had to be absolutely PERFECT to work correctly on the motherboard. They ate up a lot of power too. However, an asynchronous design would have its own traps. You can design a state machine for it and then minimize the states, but glitches will do a lot more harm on a chip that is running asynchronously. Plus you have to take into account that chips run at different speeds at different temperatures. I think we have a long way to go in the quality of individual electronic components before we can actually implement a modern processor that is asynchronous.
By the way, that Professor's name is John McDonald, and he's here at Rensselaer Polytechnic Institute.
-Montag
Re:xxxxx Thisxxxx isxxxxxx horrible (Score:5, Informative)
Because rephrasing your question as above is what synchronous looks like; every word has to be padded to the longest word's length. Asynchronous is like normal written language; words end when they end, not when some 5-char clock says so. Another crude analogy is sync vs. async serial comms, except using Huffman-encoded chars: async can use variable-length chars, but sync has to pad the short ones out to the length of the longest.
I tried underline instead of x, but the stupid lameness filter objected.
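The padding analogy can be made concrete. A rough sketch, with made-up helper names, comparing fixed-width "synchronous" framing against delimiter-terminated "asynchronous" framing:

```python
def sync_framing(words):
    """Synchronous analogy: every word padded out to the longest word's length."""
    width = max(len(w) for w in words)
    return "".join(w.ljust(width, "_") for w in words)

def async_framing(words):
    """Asynchronous analogy: each word ends when it ends, marked by a
    delimiter (the 'acknowledge')."""
    return "|".join(words)

words = ["a", "clockless", "chip", "waits", "for", "no", "one"]
print(sync_framing(words))                              # every word padded to 9 chars
print(async_framing(words))                             # variable-length words
print(len(sync_framing(words)), len(async_framing(words)))  # 63 vs 33
```

The fixed-width version always pays for the worst case; the delimited one pays only for the data actually sent, which is the whole pitch for async datapaths.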
Re:Small scale, and then larger (Score:2, Informative)
I also haven't seen or heard of any large-scale software tools for doing this sort of analysis (as opposed to classic synchronous design, where one can pick from at least half a dozen static timing analyzers on the market today). This is probably at least as big a gate as anything else.
Re:Return of the 68000? (Score:3, Informative)
Well, kind of. A bus cycle completed when someone signaled "data transfer acknowledge" (DTACK) - then the CPU would read the data off of the bus. Most systems understood where in the address space the memory request was going, how fast that device was, and had logic to count system clocks to trigger DTACK when the data "should be" ready. (In fact, most memory devices have no way of signaling when a read has completed - they just guarantee it in a certain amount of time.)
On the other hand, if you didn't hit DTACK in time, a bus error was generated and an exception routine triggered. Ahhh, the good old days.
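A toy model of that DTACK dance, with illustrative names and cycle counts (not cycle-accurate 68000 timing): glue logic counts clocks before asserting DTACK, and a watchdog raises a bus error if the count runs out.

```python
def bus_cycle(device_latency_cycles, wait_states, timeout_cycles=16):
    """Model a 68000-style bus cycle (toy model, illustrative only).

    Glue logic counts `wait_states` clocks and then asserts DTACK. If the
    device is actually slower than the count assumed, the sampled data is
    bogus; if DTACK never arrives within `timeout_cycles`, a bus error fires.
    """
    dtack_at = wait_states
    if dtack_at > timeout_cycles:
        return "bus error"          # watchdog tripped, exception routine runs
    if device_latency_cycles > dtack_at:
        return "garbage data"       # device wasn't ready when we sampled
    return "data valid"

print(bus_cycle(device_latency_cycles=3, wait_states=4))   # data valid
print(bus_cycle(device_latency_cycles=6, wait_states=4))   # garbage data
print(bus_cycle(device_latency_cycles=5, wait_states=20))  # bus error
```

The middle case is the design hazard the comment alludes to: DTACK is an open-loop promise, not a real acknowledgment from the device.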
Re:1 Million reward (Score:5, Informative)
That's one of the key benefits of clockless computing: an instruction runs through the processor as quickly as the electrons can propagate through the silicon. In other words, the processor is ready to accept the next instruction at the exact instant it's available. You just can't pump it any faster...
HOWEVER,
Electricity propagates through silicon faster when the temperature drops. Thus, the COOLER an asynchronous chip runs, the FASTER it gets! This opens up a lot of exciting doors... and will certainly ignite hordes of development in the CPU cooling industry if async chips ever get off the ground. For an async chip, overclocking = overcooling.
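As a rough sketch of the effect (a deliberately crude model that assumes gate delay scales linearly with absolute temperature, which real silicon only loosely follows):

```python
def instruction_rate(temp_kelvin, base_delay_ns=1.0, ref_temp=300.0):
    """Toy model: gate delay scales with absolute temperature, so a
    self-timed pipeline's throughput rises as the chip cools.
    (Illustrative only; real carrier-mobility physics is messier.)
    """
    delay = base_delay_ns * (temp_kelvin / ref_temp)
    return 1.0 / delay  # instructions per ns, in this toy model

room = instruction_rate(300.0)   # room temperature
ln2 = instruction_rate(77.0)     # liquid nitrogen
print(ln2 / room)                # throughput climbs as the chip cools
```

In a clocked design the same physics happens, but the chip is stuck at whatever frequency was validated for the worst-case temperature.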
Re:Small scale, and then larger (Score:4, Informative)
You've hit the nail right on the head. Async circuits aren't harder to design; they're harder to verify and debug. Historically the tools just haven't been up to it and, despite some recent breakthroughs, I'm not sure they are now. Check out the work at CalTech [caltech.edu], Manchester [man.ac.uk], and Theseus Logic [theseus.com] for the current state of the art.
Re:Small scale, and then larger (Score:2, Informative)
The logic inside of standard clocked logic is asynchronous, but the clock is used to make sure you look at the result only when it is known to be valid. The clock rate is limited by how long it takes to assure the logic state to be stable.
The timing and scaling issues exist even with clocked logic, which is why it took so long to make high clock rate motherboards. The data transfers on a modern motherboard happen well above the frequency of the FM radio band (which tops out at around 108 MHz), which makes the physical design of the board very interesting. You need to make sure that signals travel the same distance if they are supposed to be evaluated together, like the address or data bus.
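The "clock rate limited by settling time" point reduces to a one-line formula: the minimum clock period is the flip-flop clock-to-Q delay, plus the worst-case combinational delay, plus the setup time (plus a skew margin). A sketch with illustrative numbers, not any real process:

```python
def max_clock_mhz(clk_to_q_ns, logic_delay_ns, setup_ns, skew_ns=0.0):
    """Minimum clock period = clock-to-Q + worst-case combinational delay
    + flip-flop setup time + clock skew margin; f_max is its reciprocal."""
    period_ns = clk_to_q_ns + logic_delay_ns + setup_ns + skew_ns
    return 1000.0 / period_ns  # ns period -> MHz

# Illustrative numbers: 0.5 ns clock-to-Q, 8 ns critical path, 0.5 ns setup.
print(max_clock_mhz(clk_to_q_ns=0.5, logic_delay_ns=8.0, setup_ns=0.5))
```

Note that the whole chip runs at the speed of its single slowest path; a self-timed design only pays that worst-case cost when that path is actually exercised.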
The change to asynchronous logic means you have to change the way you design your logic, change all of your CAD software you use to design the chips, and change all of your automated test equipment you use to certify which chips are good. This is a massive conversion for the chip industry, taking a great deal of time, and a great deal of money.
Clockless primer (Score:3, Informative)
http://www.wired.com/news/topstories/0,1287,6179,
Re:1 Million reward (Score:1, Informative)
The tradeoffs between synchronous and asynchronous design are that clock circuits take up a large amount of real estate on a chip. However, the control logic for asynchronous designs can take up as much space. Clock skew issues are nasty at high speeds when you try to ship it around to various parts of the chip too, where in asynchronous designs, there is no skew to worry about really. Asynchronous designs can use much less power than synchronous designs.
Depending on how the thing is designed, the data that is being processed in an asynchronous design may govern how fast the thing performs. For example, it could be that as long as we are adding many numbers that have less than 5 '1' digits in them we are very fast, but if you have over 10 '1' digits in them, it is a bit slower. Certain bit patterns may require longer times to settle as well.
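That data-dependent timing shows up clearly in a ripple-carry adder, where a self-timed design finishes as soon as the longest carry chain settles. A sketch (the adder model is illustrative):

```python
def longest_carry_chain(a, b):
    """Length of the longest run of rippling carries when adding a + b in a
    ripple-carry adder; a self-timed adder's latency tracks this chain,
    not the worst case."""
    carry, chain, longest = 0, 0, 0
    width = max(a.bit_length(), b.bit_length()) + 1
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        # Standard full-adder carry-out: generate (x AND y) or propagate.
        new_carry = (x & y) | ((x ^ y) & carry)
        chain = chain + 1 if new_carry else 0
        longest = max(longest, chain)
        carry = new_carry
    return longest

print(longest_carry_chain(0b0101, 0b1010))  # no carries at all -> 0
print(longest_carry_chain(0b1111, 0b0001))  # carry ripples the whole width -> 4
```

Operands with few colliding 1 bits complete almost immediately, while patterns like all-ones plus one ripple end to end - exactly the fast/slow split the comment describes.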
Cooling actually does speed up asynch CPUs (Score:5, Informative)
The guys in the lab used to demo this by hooking up an oscilloscope to show the instruction rate. They would then get out a can of liquid nitrogen, and pour it on the CPU. The instruction rate would climb right up... This led to many jokes about temporary cooling during heavy loads. "Hey, get the ice cubes... He's starting gcc!"
I believe our group used a different basic latch design than Sutherland describes. We handled all bits asynchronously using three wires, one that went high for 0, one that went high for 1, and a feedback wire for "got it". His design looks like it could latch a bus of wires simultaneously. Forgive me if I'm wrong... it's been almost a decade.
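A toy software rendering of that three-wire-per-bit scheme, as best it can be reconstructed from the description above (all names are illustrative): one wire pulses for 0, one for 1, and the receiver raises an acknowledge before the next bit may be sent.

```python
class DualRailBit:
    """One bit carried on three wires: `wire0` pulses for a 0, `wire1`
    pulses for a 1, and the receiver raises `ack` once it has latched the
    value. The sender must see ack before it may send the next bit."""

    def __init__(self):
        self.wire0 = self.wire1 = self.ack = 0
        self.latched = None

    def send(self, bit):
        assert self.ack == 0, "previous bit not acknowledged yet"
        if bit:
            self.wire1 = 1
        else:
            self.wire0 = 1

    def receive(self):
        if self.wire1:
            self.latched = 1
        elif self.wire0:
            self.latched = 0
        else:
            return None            # nothing on the wires yet
        self.wire0 = self.wire1 = 0
        self.ack = 1               # tell the sender we've got it
        return self.latched

    def sender_sees_ack(self):
        got = self.ack == 1
        self.ack = 0               # return-to-zero before the next bit
        return got

ch = DualRailBit()
bits_out = []
for b in [1, 0, 1, 1]:
    ch.send(b)
    bits_out.append(ch.receive())
    assert ch.sender_sees_ack()
print(bits_out)  # [1, 0, 1, 1]
```

Because data and timing ride the same wires, the transfer completes exactly when both sides are ready - no clock involved.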
One of the nice features of these chips is that they are tolerant of manufacturing errors. Often impurities in the silicon will change the resistance or capacitance of a long wire. In asynchronous designs, this just means operations that need that wire will be a little slower. In the synchronous world, either the whole chip fails or you have to underclock it.
A group of ex-Caltech graduate students started a company to sell these asynchronous processors. Details at Fulcrum Microsystems [fulcrummicro.com].
(For those at Caltech: Yes, that's me on the asynch VLSI people page. And yes, I wrote prlint. What an awful piece of software that was.)
Re:Arbiters... Something doesn't make sense. (Score:2, Informative)
This is the meta-stable line that the article refers to. So set up another state (a "close collision" state) that detects the meta-stable case and alternate letting the signals pass. But by doing this you create another meta-stable line between the "go left/right" state and the new "close collision" state. Say the time difference falls in the new meta-stable line. How do you decide? This is where it gets tricky, because you always have a boundary between states, and the more logic you throw into an arbiter, the longer it takes to process the common "only the right/left signal is here, so I'll let it pass" state and all other states.
Also, the penalty for landing on the smaller meta-state will be proportional to how unlikely you've made it to land there.
So instead of a best case of 200 ps and a 10% chance of 300 ps, your suggestion might be a best case of 220 ps and a 5% chance of 600 ps. Remember, when talking picoseconds, nothing is free.
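Plugging those same figures into an expected-value check makes the tradeoff explicit:

```python
def expected_arbiter_delay(fast_ps, slow_ps, p_slow):
    """Expected arbitration latency: usually the fast path, occasionally
    the slow metastability-resolution path."""
    return (1 - p_slow) * fast_ps + p_slow * slow_ps

base = expected_arbiter_delay(200, 300, 0.10)     # simple arbiter
layered = expected_arbiter_delay(220, 600, 0.05)  # extra "close collision" state
print(base, layered)  # 210.0 vs 239.0 ps on average
```

The extra state makes the bad case rarer but lengthens both paths, so on average the "improved" arbiter is actually slower with these numbers.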