Clockless Computing 342
ender81b writes "Scientific American is carrying a nice article on asynchronous chips. In general, the article advocates that eventually all computer systems will have to move to an asynchronous design. The article focuses on Sun's efforts but gives a nice overview of the general concept of asynchronous chip design." We had another story about this last year.
HOW???? (Score:2, Funny)
1 Million reward (Score:5, Funny)
Re:1 Million reward (Score:5, Funny)
Re:1 Million reward (Score:3, Insightful)
Re:1 Million reward (Score:5, Informative)
That's one of the key benefits of clockless computing: an instruction runs through the processor as quickly as the electrons can propagate through the silicon. In other words, the processor is ready to accept the next instruction at the exact instant it's available. You just can't pump it any faster...
HOWEVER,
Electricity propagates through silicon faster when the temperature drops. Thus, the COOLER an asynchronous chip runs, the FASTER it gets! This opens up a lot of exciting doors... and would certainly ignite a wave of development in the CPU cooling industry if async chips ever get off the ground. For an async chip, overclocking = overcooling.
Re:1 Million reward (Score:2)
Does this mean that the speed varies based on temperature?
Initially this idea bugs me a bit because it means that a computer would have 'moods' based on the temperature. I could see that being a little problematic. The nice thing about a clock is that you can reasonably expect things to be done within a certain number of ticks. With variable-speed processors, some synchronization issues will definitely arise that'll need solving.
On the other hand, lots of work has already been done that way. Look at Quake 3 played over the internet. Lotsa people connect to a server with variable speed connections and response times, but the game manages to remain playable.
Maybe I'm worried about nothing. For the uninitiated, it'll take a bit to wrap their minds around.
I do have a feeling, though, that there'll be a market for both types of processors for the foreseeable future.
Re:1 Million reward (Score:2)
But it can't make it worse than it was. Yes, there is a parasitic resistance to increases in performance, but that's like a graduated tax scale: you can't lose money by making more; you only get taxed at a greater percentage on each additional dollar (not the preceding dollars).
Re:1 Million reward (Score:2, Insightful)
Re:1 Million reward (Score:2)
Re:1 Million reward (Score:2)
What, so you can tell me what the electron's position and velocity are?
Actually this is a good illustration of why absolute zero is unattainable. But electrons are a bad example. It seems someone thinks electrons only move because of thermal motion.
Electrons are still moving quite fast at absolute zero. In fact, electron speed is not hugely affected by earthly temperatures. And as things heat up and the crystal gets hotter, thermal scattering starts interfering with electron mobility. They travel a shorter mean free path before bouncing off at some weird angle, so they don't get as far as they do at colder temperatures.
Nuclear motion does die down near 0 Kelvin, but actually reaching absolute zero would require a violation of the uncertainty principle as you suggest.
Re:1 Million reward (Score:5, Funny)
Re:1 Million reward (Score:3, Insightful)
One huge advantage of asynchronous circuits is that you can turn the power down and the chip simply slows down (up to a point, but you see the idea). Turn the power up (increase Vcc) and the chip runs faster. The same principles apply when overclocking your desktop chip, except here you don't need to crank voltage AND clock.
Of course doing this could ruin your chip.
Cooling actually does speed up asynch CPUs (Score:5, Informative)
The guys in the lab used to demo this by hooking up an oscilloscope to show the instruction rate. They would then get out a can of liquid nitrogen and pour it on the CPU. The instruction rate would climb right up... This led to many jokes about temporary cooling during heavy loads. "Hey, get the ice cubes... He's starting gcc!"
I believe our group used a different basic latch design than Sutherland describes. We handled all bits asynchronously using three wires, one that went high for 0, one that went high for 1, and a feedback wire for "got it". His design looks like it could latch a bus of wires simultaneously. Forgive me if I'm wrong... it's been almost a decade.
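A rough sketch of that three-wire ("dual-rail plus acknowledge") scheme: one wire asserts for a 0, another for a 1, and the receiver raises an acknowledge once it has latched the value. All names and structure here are invented for illustration, not taken from any real async latch design:

```python
def send_bit(bit):
    """Encode one bit on a (wire0, wire1) pair; exactly one goes high."""
    return (1, 0) if bit == 0 else (0, 1)

def receive_bit(wire0, wire1):
    """Return (value, ack). Ack only rises once a valid code arrives."""
    if wire0 == wire1:            # both low = spacer, both high = illegal
        return (None, 0)          # not ready: no acknowledge yet
    return ((0 if wire0 else 1), 1)

def transfer(bits):
    """Push bits through the channel one at a time, waiting for each ack."""
    received = []
    for b in bits:
        w0, w1 = send_bit(b)
        value, ack = receive_bit(w0, w1)
        assert ack == 1           # sender may not proceed until acked
        received.append(value)
        # sender returns both wires low (the "spacer") before the next bit
        value, ack = receive_bit(0, 0)
        assert ack == 0
    return received

print(transfer([1, 0, 1, 1]))     # [1, 0, 1, 1]
```

The point of the spacer phase is that the receiver never has to guess when a new value starts; validity is encoded in the data wires themselves, so no clock is needed.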
One of the nice features of these chips is that they are tolerant of manufacturing errors. Often impurities in the silicon will change the resistance or capacitance of a long wire. In asynchronous designs, this just means operations that need that wire will be a little slower. In the synchronous world, either the whole chip fails or you have to underclock it.
A group of ex-Caltech graduate students started a company to sell these asynchronous processors. Details at Fulcrum Microsystems [fulcrummicro.com].
(For those at Caltech: Yes, that's me on the asynch VLSI people page. And yes, I wrote prlint. What an awful piece of software that was.)
Re:1 Million reward (Score:2)
Re:1 Million reward (Score:2)
Or you can attach a faster hamster. [man.ac.uk]
Re:1 Million reward (Score:2)
Look at Sun's newest high-end graphics card, the XVR-1000. The heart of this card is the MAJC, which is Sun's new async processor.
Yeah and... (Score:2, Funny)
Re:Yeah and... (Score:2)
One problem with asynchronous logic (Score:2, Interesting)
Re:One problem with asynchronous logic (Score:3, Insightful)
Consider that the Pentium 4 added entire pipeline stages for the sole purpose of getting data from one side of the chip to the other in step with the clock.
Consider that the x25, a largely asynchronous chip, has about as many gates as a 386 yet contains 25 parallel processors.
The main problem isn't impossibility or complexity; the problem is that asynchronous design isn't yet understood. We have a LOT of research to do. Once we've done it, engineers will consider asynchrony to be a simple, solved problem.
-Billy
Comment removed (Score:4, Funny)
Re:clockless computing? (Score:4, Funny)
Why is it that a $5 watch can keep perfect time but a $10,000 computer cannot?
The Amiga Zorro Bus was Asyncronous (Score:4, Interesting)
Re:The Amiga Zorro Bus was Asyncronous (Score:5, Funny)
Re:The Amiga Zorro Bus was Asyncronous (Score:2, Insightful)
Re:The Amiga Zorro Bus was Asyncronous (Score:2)
The hapless Amiga, a machine a decade ahead of its time (there's a lesson in there somewhere)
Re:The Amiga Zorro Bus was Asyncronous (Score:2)
Re:The Amiga Zorro Bus was Asyncronous (Score:2)
Re:The Amiga Zorro Bus was Asyncronous (Score:2)
Re:The Amiga Zorro Bus was Asyncronous (Score:2)
I suppose that's true since RAMs internally are just big combinational circuits, but RAMs before SDRAMs weren't really used in an asynchronous way 'back in the day'. The microcode in the CPU was simply programmed to wait for a certain period of time after putting an address on the memory bus before reading the data lines back into the CPU. In this sense, the memory usage was synchronous and only RAMs with a certain response time could be used with a CPU.
The 68000, OTOH, had actual handshaking control lines so that the RAM (or some controller?) could tell the CPU when the data was valid. In principle, RAMs with widely different valid-data times and devices with widely different speeds could be used on the same bus, and the CPU would wait for them. I'm not sure how widespread the asynchronous-bus idea ever became.
Explanation, sorta (Score:3, Interesting)
This is really cool. I was learning a little about asynchronous systems in my Logic Design and Computer Organization class last fall...they seemed pretty cool on a small scale, however they could get really difficult to work with when you're dealing with something as complex as a processor.
Re:Explanation, sorta (Score:5, Informative)
For me, this is kind of amusing: asynchronous logic is where you start out - it's more basic (shove signal in, get signal out). You then move to synchronous logic to eliminate glitches and possible race conditions (clocked flip-flops, etc.). Apparently now you move BACK to asynchronous logic to gain performance. I can't disagree: working with synchronous systems, I've always been annoyed that certain combinations couldn't be used because they were too close to a clock edge, and it could miss the latch. If you can eliminate the glitches and race conditions, asynchronous logic would be faster. Of course, that's like saying "if software bugs didn't occur, you'd never need to look over code." Yes, true, valid, but not gonna happen.
Of course, they're not talking about a true full asynchronous design: just getting rid of external clock slaving. The external clock would still BE there, for the external architecture - it's just that the internal architecture wouldn't all move to the beat of a clock.
For instance, I'm pretty sure that no one is suggesting getting rid of synchronous memory design: it's just easier that way.
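The glitches mentioned above can be seen in a toy gate-level simulation. This is a minimal sketch of a classic static-1 hazard: f = (A and B) or ((not A) and C) should hold at 1 while B = C = 1, but when A flips, a lagging inverter lets f dip to 0 for a moment. Delay values are arbitrary, purely for illustration:

```python
def simulate(a_values, inverter_delay=2):
    """Evaluate f each tick; the NOT gate's output lags A by a few ticks."""
    b = c = 1
    glitches, history = 0, []
    for t, a in enumerate(a_values):
        # the inverter still shows the OLD value of A for inverter_delay ticks
        not_a = 1 - a_values[max(0, t - inverter_delay)]
        f = (a and b) or (not_a and c)
        history.append(f)
        if f == 0:
            glitches += 1
    return history, glitches

# A held at 1, then dropped to 0; f "should" stay 1 the whole time
history, glitches = simulate([1, 1, 1, 1, 0, 0, 0, 0])
print(history, glitches)   # [1, 1, 1, 1, 0, 0, 1, 1] 2
```

In a clocked design the latch simply wouldn't sample f until after the inverter catches up; an async design has to engineer the hazard away instead.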
Re:Explanation, sorta (Score:2)
Isn't that exactly what Sun supposedly did in order to get faster RAM without using RAMBUS? Here's a link. [theregister.co.uk] Maybe it was just in the memory interface? I don't really follow it.
Huh? (Score:2)
Re:Explanation, sorta [--OT?] (Score:2, Interesting)
But I think an asynchronous computer would still use a RTC to keep track of calendar time. It has to keep time even when it's turned off.
Re:Explanation, sorta [--OT?] (Score:2)
Think of a water clock (Score:2, Informative)
Return of the 68000? (Score:2, Interesting)
Re:Return of the 68000? (Score:2)
It was the original CPU chosen for the Amiga 1000 and several of the Atari machines. It was the first consumer 32-bit device (16-bit data bus, 24-bit address bus, 32 bits for everything else).
It had a clock speed of 7.14 MHz, or 0.000714 GHz.
Re:Return of the 68000? (Score:2)
Re:Return of the 68000? (Score:2)
Took longer than I thought it would actually.
I forgot about Palm. I have a IIIc so I should know better.
Re:Return of the 68000? (Score:2)
6800 (Aug '74) [sympatico.ca] was an 8-bit chip.
68000 (Sep '79) [sympatico.ca] was a 16/24/32-bit chip (16-bit data bus, 24-bit address bus, 32 bits for everything else). It was the first consumer-available chip capable of 32-bit math.
Re:Return of the 68000? (Score:3, Interesting)
Re:Return of the 68000? (Score:3, Informative)
Well, kind of. A bus cycle completed when someone signaled "data transfer acknowledge" (DTACK) - then the CPU would read the data off of the bus. Most systems understood where in the address space the memory request was going, how fast that device was, and had logic to count system clocks to trigger DTACK when the data "should be" ready. (In fact, most memory devices have no way of signaling when a read has completed - they just guarantee it within a certain amount of time.)
On the other hand, if you didn't hit DTACK in time, a bus error was generated and an exception routine triggered. Ahhh, the good old days
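That bus cycle can be sketched like so: the CPU inserts wait states until someone asserts DTACK, and a watchdog raises a bus error if nobody ever answers. The cycle counts and timeout are illustrative, not real 68000 timing:

```python
BUS_ERROR_TIMEOUT = 16  # clocks before the watchdog gives up (made up)

def bus_cycle(dtack_after):
    """dtack_after: clocks until the addressed device asserts DTACK,
    or None if nothing responds. Returns total clocks or raises."""
    for clock in range(1, BUS_ERROR_TIMEOUT + 1):
        if dtack_after is not None and clock >= dtack_after:
            return clock          # DTACK seen: latch data, end the cycle
    raise RuntimeError("bus error: no DTACK")  # exception routine triggered

print(bus_cycle(4))    # fast RAM: 4
print(bus_cycle(12))   # slow peripheral: 12 (the CPU just waits longer)
try:
    bus_cycle(None)    # unmapped address
except RuntimeError as e:
    print(e)           # bus error: no DTACK
```

The nice property is visible here: fast and slow devices share one bus, and the CPU adapts per access rather than being fixed to the slowest device's timing.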
Re:Return of the 68000? (Score:4, Interesting)
Re:Return of the 68000? (Score:2)
Re:Return of the 68000? (Score:2)
Re:Return of the 68000? (Score:2)
Re:Return of the 68000? (Score:4, Funny)
No, it was so slow it just seemed that way.
combine clocked/-less sections on same chip? (Score:3, Insightful)
Thisisahorribleidea... (Score:5, Funny)
(kidding)
Re:xxxxx Thisxxxx isxxxxxx horrible (Score:5, Informative)
Because rephrasing your question as above is what synchronous looks like; every word has to be padded to the longest word length. Asynchronous is like normal written language: words end when they end, not when some 5-char clock says so. Another crude analogy is sync vs. async serial comms, except using Huffman-encoded chars, so async can use variable-length chars, but sync has to pad the short ones out to the length of the longest.
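That padding analogy made concrete: a "synchronous" channel pads every word to the longest word's length, while an "asynchronous" one sends each word as-is plus an end marker. The word list and the one-character marker cost are invented for illustration:

```python
words = ["clockless", "is", "a", "neat", "idea"]

sync_cost  = len(words) * max(len(w) for w in words)  # every word padded
async_cost = sum(len(w) + 1 for w in words)           # word + end marker

print(sync_cost, async_cost)   # 45 25
```

The gap grows with skew: the more the word lengths vary, the more the padded scheme wastes, which is exactly the "clock runs at the pace of the slowest stage" argument.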
I tried underline instead of x, but the stupid lameness filter objected.
Re:Thisisahorribleidea... (Score:2)
By the spaces inserted randomly by Slashcode.
(Sorry, but it's true!)
Small scale, and then larger (Score:3, Interesting)
I foresee lots of bugs, but if they can pull this off, more power to them.
Re:Small scale, and then larger (Score:2, Informative)
I also haven't seen or heard of any large-scale software tools for doing this sort of analysis (as opposed to classic synchronous design, where one can pick from at least half a dozen static timing analyzers on the market today). This is probably at least as big a gating factor as anything else.
Re:Small scale, and then larger (Score:4, Informative)
You've hit the nail right on the head. Async circuits aren't harder to design; they're harder to verify and debug. Historically the tools just haven't been up to it and, despite some recent breakthroughs, I'm not sure they are now. Check out the work at Caltech [caltech.edu], Manchester [man.ac.uk], and Theseus Logic [theseus.com] for the current state of the art.
Re:Small scale, and then larger (Score:2)
But how complex was the 8600? I can't find a specific number, but some sources say the later 9000 was still shy of a million gates. Researchers have already built async versions of MIPS and ARM cores more complex than that, and still seem leery of applying the same tools to anything on the order of a modern (superscalar, speculating, branch-predicting) CPU or 3D graphics chip.
Also, why is it that people who write about the history of asynchronous logic mention lots of other projects but not the 8600? Was it truly async, as the term would be understood today, or were there still synchronous aspects of the design? Perhaps you could elucidate by describing your equivalents of rendezvous or micro-pipeline structures.
Re:Small scale, and then larger (Score:2, Informative)
The logic inside standard clocked designs is asynchronous; the clock is just used to make sure you look at the result only when it is known to be valid. The clock rate is limited by how long it takes for the logic state to settle and become stable.
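That limit can be made concrete with a toy calculation: the clock period must cover the slowest combinational path plus the register setup time, so the whole chip runs at the pace of its worst stage. All delay numbers below are invented:

```python
stage_delays_ns = [0.9, 1.4, 2.6, 1.1]   # combinational delay per stage
setup_ns = 0.2                           # flip-flop setup time

# the clock can tick no faster than the slowest stage allows
min_period_ns = max(stage_delays_ns) + setup_ns
fmax_mhz = 1000.0 / min_period_ns

print(round(min_period_ns, 2), round(fmax_mhz, 1))  # 2.8 357.1
```

Note the 0.9 ns stage spends two-thirds of every cycle idle; that slack is precisely what an asynchronous design tries to reclaim.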
The timing and scaling issues exist even with clocked logic, which is why it took so long to make high-clock-rate motherboards. The data transfers on a modern motherboard happen well above the FM radio band (which tops out at around 108 MHz), which makes the physical design of the board very interesting. You need to make sure that signals travel the same distance if they are supposed to be evaluated together, like the address or data bus.
The change to asynchronous logic means you have to change the way you design your logic, change all of your CAD software you use to design the chips, and change all of your automated test equipment you use to certify which chips are good. This is a massive conversion for the chip industry, taking a great deal of time, and a great deal of money.
But... (Score:3, Insightful)
Bring on the solid state storage.
Tools (Score:3, Insightful)
Why? (Score:2)
Right now a network is a bunch of arbitrary speed systems just passing messages, can't we scale that down to the computer level?
It wouldn't even involve anything overly revolutionary.
No mhz!?!? The world will end. (Score:2)
I guess I'll just go back lifting weights, over compensating cars and the ruler. *sigh* I hate analog.
Heard about this stuff in class (Score:2, Informative)
It seems to me that clockless chips like these would seem to work very well with MIPS style processors - where you have lots of little instructions. However, you can't take advantage of the extreme pipelining features that chips like the Pentium 4 use when you don't have a clocked design. It would take a lot of research and a lot of re-education to get the design engineers to start thinking asynchronously instead of clocked, but my professor seems to think that eventually there will be no other way to speed things up.
It's also like trading one problem for a host of others. I remember doing tests on 1 GHz clock chips, and those things had to be absolutely PERFECT in order to work correctly on the motherboard. They ate up a lot of power too. However, an asynchronous design would have its own traps. You can design a state machine for it and then minimize the states, but glitches will do a lot more harm on a chip that is running asynchronously. Plus you have to take into account that chips run at different speeds at different temperatures. I think we have a long way to go in the quality of individual electronic components before we can actually implement a modern processor that is asynchronous.
By the way, that Professor's name is John McDonald, and he's here at Rensselaer Polytechnic Institute.
-Montag
Re:Heard about this stuff in class (Score:5, Insightful)
First, it allows different instructions to complete in different amounts of time. An asynchronous chip wouldn't have that disadvantage.
Second, it allows 'idle' portions of the chip to be used by other instructions whose time hasn't come. Asynchronous chips are vulnerable to that as well, but they can be much less vulnerable than even the most pipelined architecture, because dataflow can completely guide the chip: you can hammer in more data as soon as the previous data's been slurped in.
So far from not taking advantage of pipelining, asynchronous chips naturally have one of the advantages of pipelining, and can be built to have the other as well.
-Billy
Re:Heard about this stuff in class (Score:2, Funny)
You should consider researching software-defined asynchronous radio logic.
Armada (Score:2, Funny)
Re:Armada (Score:2)
Intel, AMD, etc and marketing (Score:5, Insightful)
if we have clockless computers for the desktop, HOW will Intel and AMD market them?
After all, the quick-and-dirty rating they have used for decades is clock speed. Throw that away and what do you have?
I can see the panic in their faces now...
Re:Intel, AMD, etc and marketing (Score:2, Interesting)
This of course has problems, because many factors go into the speed of a computer. For instance, motherboard chipsets will become increasingly important.
Re:Intel, AMD, etc and marketing (Score:2)
Would like to see some real-world results (Score:2)
What's this look like to the programmer? (Score:2)
Are there any "features" related to the asynchronicity of the chip that could be added to the assembly language of an async chip? Because individual sections of the chip can function independently and don't have to synchronize, can you get a multiprocessing-within-a-single-chip effect? I.e., can you create a single async chip split up into separate sections, each of which functions as if it were an autonomous processor? Can you have one chip concurrently execute multiple threads?
If the answer to this last question is "yes", do you have to do this by organizing the chip such that the different sections are basically separate chips on the same die, or can the exact borders of the chip area working on a certain thread at a certain moment be reconfigured dynamically? Would it be possible someday to create a microchip whose internal execution model is somewhat like that of Cilk [mit.edu]?
How does asynchronous design fit in with atomic-execution technologies like VLIW and EPIC?
Low-Power Async Procs (Score:2, Interesting)
I think this was from Seiko-Epson. I might have the states screwed up but that's the idea.
How do you know "how fast" a clockless system is? (Score:2, Insightful)
The Pentium IV is supposed to be partially clockless, but to the outside world, all the I/O is clocked, making it easy to benchmark. If the I/O, logic, memory, etc., were ALL clockless, how fast is the machine?
Government contracts of big systems are really picky about things like this.
I think marketing will be the most likely problem for this technology. (Interfacing to clocked equipment won't be.)
"Bucket brigade" analogy unconvincing... (Score:3, Insightful)
Is it just me, or does that picture seem to imply that you get a lower "buckets per unit time" throughput from asynchronous processing?
I know that this is not the claim of the article... but it's still my gut reaction to the graphic.
"Gandy Dancers" (railroad manual track laying and repair teams) were so-called because the first part of their name was the Chicago tool maker that made track laying tools, and the second part of their name came from the fact that they worked to a rhythm.
A better analogy would be a work-content based multipath route, where the amount of time is based on the type of work to be performed.
This would have implied (correctly) that, in an asynchronous system, you should be able to "make up for" slow elements by doubling them up: i.e., when you are faced with a slow section of pipe, rather than bottlenecking, make it wider instead.
Or to use their analogy, if you have a slow guy, then get another slow guy to stand next to him so he doesn't bottleneck the brigade.
A more apt analogy would be nice: it's hard to show throughput increases, except by the number of buckets in the hands of the people.
-- Terry
Re:"Bucket brigade" analogy unconvincing... (Score:4, Insightful)
bogus benefit claims (Score:2)
As it stands now, it is more difficult to keep things happening in the right order in an asynch circuit than to route a clock.
The idea that the clock has to operate only as fast as the slowest component -- well, this is true, but it doesn't matter. The last design I did had numerous clocks in it, the fastest being in the tens of MHz (I know, not that fast), the slowest being less than 1 kHz. The portions of my design that were able to run at high speed did. The portions that needed a slower clock got one.
And has anyone heard of a multi-cycle path? Just because a circuit can't complete its objective in one clock cycle doesn't mean you have to slow down the whole boat. If it needs more than one, give it two.
There are a lot of other aspects to be concerned about too... design validation (on paper, before building it), static timing analysis, fault coverage...
Clockless primer (Score:3, Informative)
http://www.wired.com/news/topstories/0,1287,6179,
Those who forget the past (Score:2, Interesting)
The famous PDP-6 was asynch logic. It made a very fast machine out of very few transistors, but was a nightmare to maintain. The follow-on PDP-10 was synchronous logic.
There must be some history out there somewhere of the problems DEC had with the asynchronous logic. Any old MIT research notes?
Clockless issues (Score:2, Interesting)
Manufacturing Hype^H^H^H^HConsent (Score:3, Insightful)
That would be fine if they acknowledged this in the text, but more often than not they take an extremely bullish approach and echo the wildest promises by the researchers as if they were to happen tomorrow.
Very smart people have been working for many years on asynchronous circuits, yet the likeliest scenario is hybrid designs mixing synch and asynch circuits (the asynch circuitry stops the clock from propagating).
Why do SciAm and other such publications do this? According to Chomsky, because they are told to by the Trilateral Commission. Personally, I think they do it because it sells magazines.
The problem with overcooling async logic (Score:2, Interesting)
Here's why:
There are two main aspects to consider in an asynchronous chip, gate delay (the time for a gate to open/close) and propagation delay (the time it takes for a signal to go from one gate to the next).
Asynchronous logic works by carefully arranging the length and geometry of the wiretraces between gates, so that the signals coming from those traces all hit their target gate (nearly) simultaneously.
The problem is that gate delays are affected by temperature differently than propagation delays. They both get faster with cooling, and slower with heating, but they do so nonlinearly, and at *different rates*. And asynchronous logic requires those rates to be carefully matched. Change the rates too much, and the chip breaks.
Synchronous logic doesn't have this problem (as much), because the whole point of latching everything between clock cycles is to give the slower signals time to catch up to the faster ones, and to force them all to wait up until everybody is ready (at which point the clock releases the latch, and the next cycle starts). But this has the downside of the extra wiring, circuitry, and power required to run all the clock lines and latches.
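A toy model of the mismatch described above: gate delay and wire delay both shrink as the chip cools, but at different rates, so a delay match tuned at one temperature drifts at another and can eat the timing margin. The scaling coefficients and budget here are entirely made up:

```python
def gate_delay(t_c):   # hypothetical scaling, picoseconds
    return 50.0 * (1 + 0.004 * (t_c - 25))

def wire_delay(t_c):   # wires respond to temperature more weakly here
    return 50.0 * (1 + 0.001 * (t_c - 25))

def margin_ok(t_c, budget_ps=5.0):
    """The design was delay-matched at 25 C; check the skew stays in budget."""
    return abs(gate_delay(t_c) - wire_delay(t_c)) <= budget_ps

print(margin_ok(25))    # True  - matched at the design temperature
print(margin_ok(-40))   # False - overcooled: the delay match has drifted
```

This is the sense in which "overcooling = overclocking" has a catch: cooling speeds everything up, but not uniformly, and a matched-delay async design cares about the ratios, not just the absolute speeds.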
Real-Time (Score:3, Interesting)
Arbiters... Something doesn't make sense. (Score:2)
I don't understand this at all - is it just Scientific American oversimplification? Why can't an Arbiter simply decide that if two pieces of data need to pass through the same component, it will let the left one through first this time, and next time there is a conflict it will let the right one through first (in order to avoid systematic "discrimination" against one part of the chip). This decision making process will always take the same amount of time.
Can anybody explain to me what I have misunderstood - I'm sure there must be something I'm not getting, otherwise Sun wouldn't be researching this one piece so deeply.
Re:Arbiters... Something doesn't make sense. (Score:2, Informative)
This is the metastable line that the article refers to. So set up another state (a "close collision" state) that detects the metastable case and alternates letting the signals pass. But by doing this you create another metastable line, between the "go left/right" state and the new "close collision" state. Say the time difference falls on that new metastable line - how do you decide? This is where it gets tricky, because you always have a boundary between states, and the more logic you throw into an arbiter, the longer it takes to process the common "only the right/left signal is here, so I'll let it pass" case and all the other states.
Also, the penalty for landing in the smaller metastable window will be proportional to how much less likely you made it to land there.
So instead of a best case of 200 ps and a 10% chance of 300 ps, your suggestion might give a best case of 220 ps and a 5% chance of 600 ps. Remember, when talking picoseconds, nothing is free.
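A Monte Carlo sketch of that trade-off: extra arbitration logic shrinks the chance of landing in the metastable window but lengthens the common-case delay. All numbers (arrival skew range, window widths, penalties) are invented, chosen to roughly mirror the 200 ps / 300 ps vs. 220 ps / 600 ps figures above:

```python
import random

def arbitrate(dt_ps, base_ps, window_ps, penalty_ps):
    """Grant latency for two requests arriving dt_ps apart."""
    if abs(dt_ps) < window_ps:          # too close to call: metastable
        return base_ps + penalty_ps     # extra time to resolve
    return base_ps

def mean_latency(base_ps, window_ps, penalty_ps, trials=100_000):
    rng = random.Random(1)              # seeded for repeatability
    total = 0.0
    for _ in range(trials):
        dt = rng.uniform(-500, 500)     # request arrival skew, ps
        total += arbitrate(dt, base_ps, window_ps, penalty_ps)
    return total / trials

simple = mean_latency(base_ps=200, window_ps=50, penalty_ps=100)  # ~210 ps
fancy  = mean_latency(base_ps=220, window_ps=25, penalty_ps=380)  # ~239 ps
print(round(simple), round(fancy))      # the "fairer" arbiter isn't free
```

Under these made-up numbers the simpler arbiter wins on average, which is the parent's point: the rare metastable penalty can be cheaper than taxing every request.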
No mention of Theseus Logic? (Score:4, Interesting)
Unless I missed it, there was no mention of Theseus Logic's [theseus.com] Null Convention Logic [theseus.com] at all which is a real disappointment. Theseus has one of the few approaches that doesn't require a PhD-level of education to understand and design in.
Re:No mention of Theseus Logic? (Score:2)
Re:No mention of Theseus Logic? (Score:2)
Huh? I'm not talking the packetized boolean stuff of Amulet that Furber came up with. I'm talking Karl Fant's NCL approach which he developed while at Honeywell in the late 60s through the 80s and took commercial when he created Theseus in the 90s.
Re:No mention of Theseus Logic? (Score:2)
Their only contribution is trying to use threshold "gates" (with hysteresis), which, while they might be very nice, aren't useful for anything except adders.
Their implementations use DIMS gates (again from 1958), and when they don't, they have a non-DI (delay-insensitive) implementation with orphan hazards.
I am doing research on similar systems and have found their business strategy is to file patents well after the ideas were invented and then make everyone believe they invented them. They do have some very nice mind-melting presentations (I'm guessing you went to one).
traffic waves and slotted aloha (Score:2)
Have you wondered why traffic on a freeway slows down when it gets heavy? This is sort of like an asynchronous processor, where every instruction is trying to get processed as quickly as it can (every driver is independent), but they need micro-synchronization to prevent collisions (brake lights, gas pedals). When the freeway is mostly empty, micro-synchronization works fine. However, as you approach the capacity limit, sometimes a global clock helps, sometimes it doesn't...
If you have a pipeline, you get back pressure waves as you approach the capacity which can make things slower than a synchronous system. If the processing topology is more complicated, it becomes even more difficult to analyze...
This effect is well known and affects things like processors and networks. Look up articles on slotted ALOHA (a packet radio protocol) if you are interested in some of the math behind this...
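For reference, the ALOHA math being hinted at: with offered load G (frames per slot), slotted ALOHA's throughput is S = G * e^(-G), which peaks at 1/e (about 0.368) when G = 1 and then collapses as load keeps growing -- the same capacity cliff as the freeway analogy:

```python
import math

def slotted_aloha_throughput(g):
    """S = G * e^{-G}: successful frames per slot at offered load g."""
    return g * math.exp(-g)

for g in (0.5, 1.0, 2.0):
    print(g, round(slotted_aloha_throughput(g), 3))
# 0.5 0.303
# 1.0 0.368   <- the peak, 1/e
# 2.0 0.271   <- past capacity, throughput falls as load rises
```

(Pure, unslotted ALOHA is worse still: S = G * e^(-2G), peaking at 1/(2e).)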
Killer technology (Score:2)
"Cockless Computing"?
There go my sales to lonely nuns.
Pipelining (Score:2)
Re:Pipelining (Score:2)
Easy approach to asynchronous computing (Score:2)
Async Links (Score:2)
If you are interested in async, then here is a list of cool websites:
Async home [man.ac.uk] is the main website, with resources, events, and background.
Amulet group [man.ac.uk] have a selection of resources and news.
And if you want a laugh then check out rat powered cpus [man.ac.uk]
No conforming C compilers though! (Score:2)
Re:No conforming C compilers though! (Score:2)
Re:Intel will be pissed (Score:5, Funny)
Re:Oh yeah, how do you upgrade one of these system (Score:2)
They should.
One mechanism I came up with in my first-year hardware course was to use "READY" lines for each component, which would toggle state when the device was actually ready for input. It was the responsibility of the individual components to respect this protocol and not try to do things faster than any device they communicate with. The result was that, without parallel processing, the overall computing device would run no faster than its slowest component. It worked for what I was doing in my class... but I'm not entirely sure how well it would scale to something many hundreds of thousands of times more complex. But if I could come up with that, I have no doubt someone's come up with something that works.
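That READY-line scheme can be sketched like so: each stage refuses input until it's ready, and the producer only pushes when the whole chain is ready, so throughput settles to the pace of the slowest member. Stage names, delays, and the lockstep feeding policy are all invented for illustration:

```python
class Stage:
    def __init__(self, name, delay):
        self.name, self.delay = name, delay
        self.busy_until = 0

    def ready(self, now):
        """The READY line: high once this stage has finished its work."""
        return now >= self.busy_until

    def accept(self, now):
        assert self.ready(now)          # protocol: never push a busy stage
        self.busy_until = now + self.delay

def run_chain(stages, items):
    """Feed items to the chain only when every stage's READY line is high."""
    now, done = 0, 0
    while done < items:
        if all(s.ready(now) for s in stages):
            for s in stages:
                s.accept(now)
            done += 1
        now += 1
    return now

chain = [Stage("alu", 1), Stage("mem", 3), Stage("io", 2)]
print(run_chain(chain, 5))   # 13: one item per 3 ticks; the slowest stage wins
```

With this naive "wait for everyone" policy you get exactly the behavior described: no stage ever outruns its neighbors, and the delay-3 stage sets the whole machine's pace.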
Re:Clockless chips and turing machines? (Score:2)
A Turing machine deals in discrete steps, but has no requirement for a constant clock.