Hardware

Clockless Computing? 225

Posted by timothy
from the proc.-detected:-P9-"around-eightish" dept.
ContinuousPark writes: "Ivan Sutherland, father of computer graphics, has spent the last ten years designing chips that don't use a clock. He has been proposing a method called asynchronous logic, in which no clock signal is distributed to regulate every part of the chip. The article doesn't give many technical details (which are greatly needed), but Sutherland, now doing research for Sun, says that significant breakthroughs have recently been made toward making this technology viable for mass production. It is estimated that 15% of a chip's circuitry is dedicated to distributing the clock signal and as much as 20% of the power is consumed by the clock. This is indeed intriguing; what unit will replace the familiar megahertz?"
This discussion has been archived. No new comments can be posted.

Clockless Computing?

Comments Filter:
  • The hyper-marketing-droids at mega-chip-corporation won't believe that consumers can handle not comparing chips by MHz/GHz ratings, and you will see what they did to CD-ROM drives. 12.5GHz Max!
  • These techniques have been known for quite a while, and they are quite viable. A few points of note:
    • The major commercial excitement about asynchronous hardware derives from the facts that it has much lower electromagnetic emissions (no great big synchronous clock pulse) and it can stop and start on a dime (HALT instructions are cheap and really do save power.) These make asynch. hardware very interesting for people working in embedded and portable environments.
    • Amulet3i [man.ac.uk] is a fully asynchronous processor core which provides a standard programming interface by being machine-code compatible with the ARM - you don't notice that it is asynchronous except when you do crazy things like varying the power from 1V to 5V, or by popping a load of dry ice on it, and even then the chip "takes a licking and keeps on ticking!" The speed of the ticks just changes a bit.
    • Experience with Amulet at Manchester indicates that asynchronous logic is competitive with synchronous logic of the same generation in terms of speed, though it can be designed by a much smaller team of engineers.
    • The hardest part of it all is definitely tools, since there are a lot of very good tools out there for synchronous development and far fewer for asynchronous. I'm working on this, but it doesn't happen overnight!
    • There are also a lot of different design methodologies around, so when people say asynchronous, they are really referring to a whole bunch of different technologies, as opposed to the single technology that is synchronous design. Some of these technologies scale much better than others. For example, Petri nets are very good at very low-level design, but they don't scale up well and are a pig to actually use. OTOH, micropipelines (which is what Sutherland is working with, though he's looking more at the low-level end of them) are much less close to optimal at the transistor level (i.e. it takes more effort to squeeze out the last drop of efficiency) but they scale up to whole CPUs and even entire computers much better. I'm speaking as someone who's designed toy processors that way. (OK, so I didn't have the patience to go through and implement a whole pre-defined instruction set - I only spent one week on it as a proof-of-concept. :^)
    • It is far easier to compose micropipelines with each other since you have no setup-and-hold timing nonsense. Those restrictions which do exist tend to be pretty easy to spot, and we've developed tools for automatically checking for the main async. design bloopers.
    • Did you know that during the design phase for the Amulet3i, benchmarking was done by running real operating systems (RiscOS IIRC; nowadays we'd use ArmLinux instead) and applications (ghostscript, rendering the tiger picture) on it in simulation? :^)
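The micropipeline idea the comments above describe can be sketched in a few lines. This is a toy Python model with invented stage names; real micropipelines use request/acknowledge wires and Muller C-elements, which this only approximates with a "full/empty" flag per stage. The point it illustrates: each stage hands data forward only when its neighbour is empty, so there is no global clock, just local handshakes.

```python
# Toy micropipeline: each stage holds at most one item and passes it on
# only when the next stage is empty. No global clock -- a stand-in for
# the request/acknowledge handshake between adjacent stages.

class Stage:
    def __init__(self, name):
        self.name = name
        self.data = None            # None = empty, i.e. acknowledge raised

def step(stages, outputs):
    """One settling pass: move items forward wherever the handshake allows."""
    moved = False
    for i in reversed(range(len(stages))):   # scan from the output end
        if stages[i].data is None:
            continue
        if i + 1 == len(stages):             # last stage: emit the result
            outputs.append(stages[i].data)
            stages[i].data = None
            moved = True
        elif stages[i + 1].data is None:     # downstream stage is empty
            stages[i + 1].data = stages[i].data
            stages[i].data = None
            moved = True
    return moved

pipe = [Stage(n) for n in ("fetch", "decode", "execute")]
outputs = []
for item in ("i1", "i2", "i3"):
    while pipe[0].data is not None:          # wait until the first stage frees up
        step(pipe, outputs)
    pipe[0].data = item
while step(pipe, outputs):
    pass
print(outputs)                               # ['i1', 'i2', 'i3']
```

Note that items come out strictly in order, and throughput is set by however fast the stages happen to settle, not by a worst-case period.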
  • Logic synthesis tools are very important to modern day IC design and optimization.

    That is why Theseus Logic, Inc. [theseus.com] (mentioned about 2/3rds down in the NYTimes article) has a Strategic Alliance with Synopsys [theseus.com]. Our patented NCL (Null Convention Logic) [theseus.com] technology, unlike many other asynchronous technologies, is designed for maximum interoperability with existing tools, maximum design reuse and near-complete elimination of common CBL (clocked boolean logic) timing closure issues.

    For those that mentioned Amulet, its project leader, as well as original ARM designer, Steve Furber is on Theseus' Advisory Board [theseus.com].

    Please visit our web site [theseus.com] for more information.

    Disclaimer: I am an employee of Theseus Logic, Inc., who is NOT speaking on behalf of Theseus Logic in this post, nor has its content been approved by any Theseus Logic official.

    -- Bryan "TheBS" Smith

  • Even without a set clock cycle, any CPU must have some sort of regulatory system which coordinates the execution of instructions (this is, of course, the primary function of the system clock). Without such a system, all parts of the CPU could execute instructions at random, making performance-improving techniques such as pipelining useless. So where would the regulatory circuitry be on such a chip? Surely adding it to the CPU itself would counteract the supposed gains from ditching the clock.
  • Imagine your local server room. Don't you think they would like a 20% decrease in their power bill?

    We can afford the power bill easily enough. What we can't deal with is the extra cooling.

    Does anyone know roughly:

    • How much extra power does each watt dissipated require to cool it?
    • What is the typical excess cost per watt of dealing with the extra heat generated in machine rooms?
  • cooling the chip will automagically speed it up.

    asynch chips go as fast as the hardware can when the software needs it
  • Technically you don't know how long it takes on a regular microprocessor either, because of out-of-order execution and multiple issues per cycle. And on an async processor, I'd imagine that each instruction has an average latency. That way, you'd know about what should happen.
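The "average latency" point above has a concrete cause: in an asynchronous design, even one instruction's delay depends on its data. A minimal hypothetical model, using a ripple-carry adder whose settling time tracks the longest carry chain:

```python
# Hypothetical gate-delay model of data-dependent latency: in an async
# ripple-carry adder, completion time tracks the longest carry chain, so
# the same ADD finishes sooner or later depending on its operands.

def add_delay(a, b, width=8):
    """Length of the longest carry chain when adding a and b -- a stand-in
    for the adder's settling time in gate delays."""
    longest = run = carry = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, p = ai & bi, ai ^ bi     # generate / propagate at this bit
        if g:                       # this bit starts a fresh carry chain
            carry, run = 1, 1
        elif p and carry:           # this bit ripples the carry one stage on
            run += 1
        else:                       # any carry chain dies here
            carry, run = 0, 0
        longest = max(longest, run)
    return longest

print(add_delay(7, 1), add_delay(7, 0), add_delay(255, 1))   # 3 0 8
```

So 7 + 1 ripples a carry through three bits while 7 + 0 generates none at all, which is exactly why only an average (or worst-case) latency can be quoted.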
  • You would measure this form of speed with names like

    Intel Pentium V Fast
    Intel Pentium V Really Fast
    Intel Pentium V Yeah we know this one costs the same as the fast one last year but it is so much faster.
    AMD Thunderbird Oh my god did you see how fast that was
    AMD Thunderbird Seriously y'all this is quick

    IRNI
  • if you wire the asynchronous machine well. I'm taking undergrad Computer Engineering at U of T, and we have a course which introduces asynchronous finite state machine design.

    Although it's much harder to design such a machine, it is not impossible, and it behaves the same way as a clocked machine, except there is no clock to interface with it. You would have to have it output a simulated "clock" that tells you when it's ready for the next instruction.

    This clock wouldn't really be a clock as you might expect: it would give you an edge whenever the AFSM is ready, so the period would vary from instruction to instruction. This isn't such a problem though, since it's the clock that dictates the pace anyway.

    If any of you are interested in more on this, check out our text, "Fundamentals of Digital Logic with VHDL Design" by S. Brown. Chapter 9 introduces the concepts.

    Janimal
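The "ready edge" idea in the comment above is easy to mock up: instead of a fixed-period clock, the machine raises a done edge after each instruction, so the period between edges varies per instruction. The latencies here are invented for illustration:

```python
# Sketch of a simulated "clock" whose edges fire when the AFSM is ready
# for the next instruction, so the period varies per instruction.

LATENCY = {"load": 2, "add": 3, "mul": 9}    # hypothetical delays per op

def ready_edges(program):
    """Times at which the machine would signal 'ready for the next one'."""
    t, edges = 0, []
    for op in program:
        t += LATENCY[op]     # the work takes as long as it takes...
        edges.append(t)      # ...then the edge fires
    return edges

print(ready_edges(["load", "add", "load", "mul"]))   # [2, 5, 7, 16]
```

The gaps between edges (2, 3, 2, 9) are exactly the per-instruction latencies, which is what "the clock dictates the pace" means here.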
  • You forgot the all-important CowboyNeal. I don't know about you, but I wouldn't buy a computer with less than 50 giga-CowboyNeals of processing power.
  • Two things: A friend of mine was taking some EE classes at Cornell on chip design, and he said he attended an optional lecture (not related to his class, but his professor suggested that everybody go anyhow), and the person giving the lecture had built an asynchronous MIPS chip... That's cool =:-)

    The second thing: I'd hope he'd have made some advances in programmability in the mean time...
  • by Cenotaph (68736) on Monday March 05, 2001 @05:42AM (#384382)
    Along with the comment below about these problems being moved to the compiler/assembler writers, I'd like to add that you can have a machine that is very much like a dataflow machine, but uses conventional instructions. It's been done at Sun labs and is called the CounterFlow Pipeline Processor (CFPP). The original paper that proposed it, coauthored by Sutherland, can be found here [sun.com] in PDF and PS formats. I did a presentation on this architecture for a class a few years ago. If you're interested, the slides for that presentation can be found here [rit.edu] in PowerPoint format. There was also a research group at Oregon State, but their web page is MIA.

    So, what is a CFPP? It is a processor with a pipeline where data and instructions flow in opposite directions, with the instructions usually thought of as moving "up" and data as moving "down". The functional units (FU) are attached as sidings to the main pipeline. Each FU launches from a single pipeline stage and writes its results to a different stage, further "up" the pipeline. The main goals of this architecture were to make the processor simple and regular enough to create a correctness proof and to achieve purely local control.

    If Sun ever produces a processor that is asynchronous, it will likely look similar to this.
    --
    "You can put a man through school,
    But you cannot make him think."

  • I wouldn't want the manufacturer optimizing for that over other, useful things.

    You mean, like the way they optimize for MHz over other, useful things, like FLOPS? Remember when AMD did that little ad campaign of "Our 800 MHz chip is faster than Intel's 766 MHz chip!"? How many "normal" people followed that one? Today, MHz is the standard rating of speed, and it's misleading. MFLOPS would be a much better measure (although you're right that, with different ops taking different amounts of time, you'd have to carefully define what you mean by an operation).

    Secondly, I don't think it will take "several years" of experimentation to figure out how much faster your add is than your multiply. We already know the answer to that question, and it depends on how you decide to implement your circuit. If you decide to do multiplication with shift/add you could get a tiny little multiplier that's freaking slow, or you can go hog-wild with (7,3) counters, Wallace trees, fast adders, etc., and have a gigantic circuit that's really fast, but that's how hardware design has always worked and the options for solutions will be unchanged. Now, though, you have a few more choices to make, since your ops don't all have to fit into equal-length pipeline stages, and each op doesn't have to take the same amount of time for every set of inputs (for example, 7 + 1 might take x gate-delays of time, whereas 7 + 0 could take far fewer.)

    It's all very exciting.

    God does not play dice with the universe. Albert Einstein
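The shift/add option mentioned in the comment above, as a minimal Python sketch: the tiny-but-slow end of the multiplier trade-off. Note its latency (counted here as number of adds) is data-dependent too, just like the 7 + 1 vs 7 + 0 example:

```python
# Minimal shift-and-add multiplier: small circuit, slow and data-dependent.
# Each set bit of the multiplier costs one add.

def shift_add_mul(a, b):
    """Multiply by repeated shift and conditional add; returns the product
    and how many adds it cost (a rough proxy for latency)."""
    result = adds = 0
    while b:
        if b & 1:                  # this bit of the multiplier is set
            result += a
            adds += 1
        a <<= 1                    # shift the multiplicand up one place
        b >>= 1                    # move to the next multiplier bit
    return result, adds

print(shift_add_mul(7, 5))    # (35, 2): 5 = 0b101, so two adds
print(shift_add_mul(7, 255))  # (1785, 8): every bit set, eight adds
```

A Wallace-tree design spends far more gates to make that cost flat and small; this one spends almost none and pays in variable time.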

  • If asynchronous chips became popular, it might help further debate on the difficult question of what makes "fast" fast. The common number, MHz, is of course pretty meaningless -- sort of like measuring the speed of a car by the RPMs. (If you've got the same model of car in the same gear, it's a meaningful comparison. But is a G4 at 500 MHz exactly five times as fast as a 486 at 100? What does "five times as fast" mean anyway?)

    Making the familiar measurement meaningless might put more emphasis on benchmarks, and give more impetus for getting them better standardized and more meaningful. ...Or not -- there are obvious and equally meaningless alternatives for asynchronous chips, like FLOPS, or LOPS (Logical Ops Per Sec).
  • The point most people miss about chip design in general is that, whatever the methodology being discussed, modern chip designs would be utterly infeasible without the CAD software that goes with them. The complexity of any IC design, whether synchronous or asynchronous, is utterly beyond manual design methods at this point, and the main reason that synchronous design predominates today is that CAD tools for synchronous synthesis came to market (Synopsys in particular) and have dominated the field for nearly 10 years now. However, research on async CAD tools continues, one notable effort being the European OMI project EXACT (http://www.omimo.be/system/templates/OMIProjects_Detail.cfm?ID=6&Project=6143), which yielded a commercial error-correction chip used by Philips in DCC players. If groups such as the AMULET group can automate their methodologies, then async design can quickly gain ground in various power- and performance-sensitive niches.
  • This sounds like the kinds of problems that the concurrent functional programming people love. See Erlang [erlang.org] or perhaps some concurrent variation of Haskell [haskell.org] like Eden [uni-marburg.de].

    Regards,

    Zooko

  • Try to count seconds _asynchronously_ with your heart... It's not easy, I've tried :-)

    - Steeltoe
  • by Stipe (35684)
    > [can measure how many operations it can do per second]

    Yes, but the point is that even the same processor may take a different amount of time to do the same operation, albeit on different data.
  • There are structures in your brain that basically function as a 24 hour clock, and some people can use them so effectively that they can tell time. This "clock" is not distributed to every functional component though. Ditto in your hearing example.

    Asynchronous computers will have a timing clock, just not a clock signal that controls the gates on each functional component. This is the difference between attaching a few thousand parts to the clock and attaching 37 million+.
  • by Alien54 (180860) on Monday March 05, 2001 @04:54AM (#384390) Journal
    As I recall, this story has been around for a few years. But this does not make it less relevant.

    What makes it interesting is that you have to fundamentally redesign your whole logical design so that you have a general-purpose design.

    With clocked computing, it is easy to see how you would flush buffers, etc. Clockless computing would be more problematic, and of course, would probably be proprietary.

    My initial reaction is that it would work easiest in things like embedded processing. I also wonder if there would have to be some sort of evolution similar to what we have seen over the past few years with Intel, Motorola, etc.

    One must not forget that the increase in performance for an awful lot of these chips has to do with clock speed increases, as well as code designed to take advantage of certain coding features in the hardware.

    An early example of this is when the Pentiums first came out. For a while you had 486 boxes and Pentiums with the same clock speeds on the market, so you could compare performance between systems with the same video cards, same RAM, same cache, etc., even though the chip sets were not the same. This was educational. As I recall, the performance boost for software not taking advantage of the Pentium feature set was about 20 - 25% (?). I may have this wrong, of course.

    But at a time when Pentium systems cost twice as much as a 486, it was definitely buying for the future.

  • How come one Amulet post is modded "3 informative", and this one is modded "3 offtopic"???

    I'd have thought that a post about a commercially available async. processor and the benefits of async. design are rather "on topic" for a story on an async CPU... particularly when the story claims this is something new when the idea itself is decades old, and Amulet itself (async. ARM, designed by Steve Furber, the original ARM architect, now a professor at Manchester University) has been around for quite a while.

    As this post points out, not only is async. (i.e. data-driven) design good for low power (you only use power when doing something!), but it also promises to raise performance by allowing each part of the chip to independently run as fast as it is able, and compute results as soon as its operands are available rather than waiting for the next clock.
  • The obvious and most practically useful approach to speed grading async. CPUs would be to bin them into lots that meet a set of minimum performance standards.

    I somewhat disagree about where speed grades would be an issue. The obvious market for async. CPUs like Amulet is in handheld consumer devices where precise performance characteristics don't matter (hence the acceptability of conventional power management techniques), but power consumption does.

    For embedded real-time applications, however, you need repeatability more than power savings or peak performance.
  • Well "Ryan", what exactly makes you say Axel's post is fabricated? Are you saying that IRC log he linked to on Slashnet is a fabrication? That the "troll cabal" somehow hacked into the Slashnet servers and placed it there, and the Slashnet administrators haven't noticed it yet?

    And yes, it's true, Michael stole the Signal 11 account from the person who was using it. Shortly after it happened, there was a discussion about it on one of the front page stories of the day, in which Michael participated and basically admitted what he did (he acted like there was nothing wrong with it). He then promptly modded down all the posts in that thread (including his own) to -1 so they wouldn't get archived. Anybody who was paying attention at the time will remember what I'm talking about.

    Axel backed up his statements with references wherever possible, and you merely assert, with no evidence, that he's lying.

    Folks, what Axel said is true. Ignore "Ryan"; he's just trying to confuse the issue.

  • The Commodore Amiga 3000 and 4000 actually feature a 32-bit "clockless bus" called Zorro III, which overlays on top of a clocked bus, similar to ISA, called Zorro II. The Zorro III bus is a very interesting bus. All timing is strobe-based: data is thrown out on the bus, and the receiving card or bus controller strobes a reply back as soon as it can latch the data and make use of it. Transfers between cards and memory can be extremely fast; however, transferring from a "fast" card to a card with slower logic will actually slow the system down. Addressing is multiplexed, like PCI... it's a very cool thing, though; imagine if PCI was transparently overlaid onto ISA and that's Zorro III! All data transfers were done between the 3.5MHz clock cycles of the Zorro II bus. Of course, the technology has been lost over the years -- it would be cool if it was used again. clock sucks!
  • ... Another concept of interest on this topic is Null Convention Logic. Here is a report to the NSF [ucf.edu] who apparently paid for some level of research on this.
  • I'm not sure if it was DEC or DG, or one of the other old Minicomputer manufacturers, but I seem to recall that something like a PDP-11 class machine was implemented without a CPU clock. Any old timers able to fill me in, here? -jcr
  • Yeah, but that clock is like the BIOS or hardware clock, not like the CPU clock. It times large-scale activity in the brain, but not second-by-second activity. I remember reading about clocks that govern second-by-second activity that can be read in brainwaves.

  • ...but only when it needs to synchronize with those components.
  • Is ten years of research really worth a 20% decrease in power consumption and a 15% decrease in overall chip size? I can't see how it could be. Chances are, by the time this technology is ready for prime time (if ever), chips will be utilizing vastly different technology than they are now.

    It's becoming increasingly hard to shrink chip sizes and increase speeds. Even with different metals such as copper and shrinking trace widths, we are eventually going to hit a brick wall with current technology. After that, taking away 15% of the chip complexity is not going to go far in creating the next generation of faster chips.

    It's time to look to new technologies: carbon nanotubes and buckyballs, quantum computing, etc.

  • It makes good sense. Some operations take longer than others, so adjust the clock period to suit each instruction. That only works if you execute one instruction at a time; otherwise I can't see why you are puzzled. If, as at present, you run at a fixed clock speed, some instructions are ready before others, which is inefficient. Of course, in the real world it makes sense to keep the clock speed fixed, which is my worry about asynchronous designs: sounds good on paper, but may not translate into zillion-gate designs.
  • Have you been peeking at my source?
  • You're right and yes, I have. To 288 packages, each one an ASIC, over a volume of about one cubic foot. I more or less gave up doing hardware after that!

  • I'm just taking low-level design for my BS in CS, so I'm no expert, but it seems to me that all the clock does is tell the CPU when to execute the next instruction. Well, the voltage change also drives the execution, but the voltage change takes place when the clock tells it to. What if the next voltage change took place when the last instruction said it was done? Why wait 1/(800 x 10^6) seconds for a register load which is done in less than 1/3 that time (and register loads are VERY common)? If the last instruction executed signaled the program counter to fetch the next instruction when it was done... voila! Faster computing!
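The argument above can be put in miniature: with a clock, the period must fit the slowest instruction; with completion signalling, each instruction triggers the next fetch as soon as it is done. The latencies below are invented purely for illustration:

```python
# Fixed clock vs completion-signalled execution: the clock period must be
# sized for the worst case, while signalling charges each op its own cost.

LATENCY = {"reg_load": 1, "add": 2, "mem_load": 6}   # hypothetical delays

def clocked_time(prog):
    period = max(LATENCY.values())          # clock sized for the slowest op
    return period * len(prog)

def signalled_time(prog):
    return sum(LATENCY[op] for op in prog)  # each op signals "done" itself

prog = ["reg_load"] * 8 + ["mem_load", "add"]
print(clocked_time(prog), signalled_time(prog))   # 60 16
```

The gap is largest exactly when cheap operations like register loads dominate, which is the poster's point.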
  • Seems to me that system performance should be be measured by real-world benchmarks... ...perhaps using Q3 as the de-facto standard...

    >TIMEDEMO 1

    Well, that's all most people seem to care about these days anyway.

    "Hey man, I get 143 frames per second in Quake 3! 147 if I overclock!"

    "Everything you know is wrong. (And stupid.)"
  • The last time I checked, there wasn't any sort of high frequency clock signal running down my spine.

    I can introduce you to one of several cousins who generally have the effect of sending high-freqency signals not only along your spine, but along nerves you never knew you had before - if the ``mike'' in your email address does stand for michael. Youngest candidate is about 15, oldest is about 30. Warning: they're more likely to stop your clock than start it, if the old ticker isn't in good shape or the old blood supply is a bit lean... (-:
  • somebody's gotta be capable of coming up with a non biased benchmark to give theoretical average operations / sec.

    Hm. How about the time it takes to compile the generic NetBSD kernel? Since NetBSD runs on almost anything for which the speed is interesting, and compiling involves a nice mix of real-world operations, except for floating-point.

    (Yeah, I'm joking... a little bit :)

  • Most of the time that wins. From what people are saying, asynchronous logic is much harder to create. Or is it just that everyone was taught more about clock-based circuits than asynchronous logic circuits? Well, I would personally like to see how this plays out.
  • by Anonymous Coward
    'pretentious signatures on dork postings will be moderated as such'
  • Chuck Moore (the guy who invented Forth way back when) has built several minimalistic chips which have a lot of asynchrony. The way they do it is very different from dataflow -- instead of having functional units notify each other when they're ready, the hardware is generally designed to /assume/ readiness, and in fact to /be/ ready.

    For example, in traditional processor design the first stage of execution is decoding, and the second is register lookup; in Chuck's chip there is no register lookup, because he uses a stack. Because of this, instruction decoding proceeds in parallel with all the ALU computations and result accesses, and when the instruction is decoded the result is simply to gate a result into the TopOfStack register (and sometimes to pop the stack, if the instruction was supposed to consume two values).

    The only exception in the current design is the ADD instruction, which he's implemented as a ripple-carry; that can sometimes take more than one cycle to compute, so if there's a possibility of a large carry the programmer is responsible to insert up to 3 NOPs.

    The URL is http://www.ultratechnology.com.

    In my microprocessor design class a few years back, I built my processor around these concepts. It was an unqualified success; it was easy to build, easy to program, much less resource-constrained than any of the other designs in the class, and ran all the required programs much faster than anything else. Almost everyone else was following the party line and building a RISC-style two-operand machine; since we only had eight bits per instruction, this was suicidal. The few who weren't completely toeing the line were building accumulator machines, which worked well but didn't have the sheer flexibility.

    -Billy
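The stack-machine decode described in the comment above is simple enough to sketch as an interpreter: there is no register file to look up, so "decode" amounts to gating a result onto the top of stack. This is a hypothetical four-instruction ISA, not Chuck Moore's actual chip:

```python
# Toy stack machine in the spirit described above: no register lookup;
# each instruction just pushes, or pops operands and pushes one result.

def run(program):
    stack = []
    for op, *arg in program:
        if op == "push":
            stack.append(arg[0])
        elif op == "dup":                 # duplicate top of stack
            stack.append(stack[-1])
        elif op == "add":                 # consumes two values, pushes one
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack

# 3 dup mul 4 add  ->  3*3 + 4
print(run([("push", 3), ("dup",), ("mul",), ("push", 4), ("add",)]))  # [13]
```

Because operands are always implicitly "top of stack", decode never has to wait on a register-file read, which is what lets it overlap with the ALU in the design described above.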
  • The biggest point is that this will save enormous amounts of electrical power that can be used elsewhere.

    Sutherland's work is nothing new. The Computer Science department at the University of Utah (a department he founded) has been working on this for years as well. They have made significant progress. [utah.edu]

    I know, I worked with one of the professors for a while before I went into electrical engineering.

    Just 'cause the British government can't make it work does not mean it is impossible. Many inventions we use every day were once considered "impossible".

    Remember that since you don't know about the research, perhaps there is something you don't know.

  • At the end of the day, MIPS and FLOPS and all the other so-called performance numbers are pretty much meaningless anyway, since no one factor governs the overall performance.
    Seems to me that system performance should be be measured by real-world benchmarks that most people can relate to and that hold much more importance to the user, such as:

    SBT - System Boot-up time, where the value is normalized modulo the time it takes to get coffee
    GAFR - Graphics Accelerator Frame rate, perhaps using Q3 as the de-facto standard
    and the related:
    MTBF - mean time between frags

    of course, we should also measure things like:
    MTBR - Mean Time Between Reboots
    MTBNR - Mean Time Between Network Redials

    ... but these are questionable since they can be affected by outside influences unrelated to system performance :)

  • That's the whole point of many years of research on the subject. This is nothing new.

    The CPU hits bottlenecks in slow components now. The difference is that right now everything keeps running at full power during a bottleneck while doing nothing at all. With async design, components that are waiting draw a fraction of their power, since there is no clock cycling them on and off.

    Also, the layout and design of a synchronous clock is a major limiting factor in CPU design.

  • by Zordok (90071) <doug&zordok,net> on Monday March 05, 2001 @04:35AM (#384419) Homepage
    FLOPS, of course.
    Even without a processor clock, you should still be able to measure how many operations it can do per (real-time) second.
  • by yakfacts (201409) on Monday March 05, 2001 @07:10AM (#384421)

    It is both amusing and frustrating to hear all of the "armchair computer scientists" discussing the reasons this technology is a bad idea - as if they knew more about the subject than the many PhDs who have dedicated their careers to it, based on the knowledge gleaned from the one Computer Architecture class the poster took as an undergraduate.

    I was invited to work on a team at the University of Utah (Sutherland's old school) where they were researching this very topic. This is old news; they have been working on it for years. And as some people have correctly pointed out, there are both good and bad points to sync or async logic.

    There are two major reasons to work on async logic: clock skew and power savings. The reason for power savings alone is a good one. People here have been complaining that it "is not worth it for only a 20% power savings".

    Yes it is! In a modern office, computers end up taking a lot of power. Imagine your local server room. Don't you think they would like a 20% decrease in their power bill?

    That means instead of building five power plants, you only need four (on a grand scale; please no newbie replies like doodz, thiz guy thinkz you n33d five pawer pl3nts to run a box). That is significant. And with today's high-MHz CPUs this means even more. Some think >50% savings, and even more during low cycle time.

    The clock skew issue has been covered somewhat here. One of the major hurdles in solving the design problem is the development of new design tools, which is what many people at Utah are currently working on.

    The way to move forward is not to argue for the limitations of systems of the past. Don't make me pull out Ken Olson quotes here.

  • by Pseudonym (62607) on Monday March 05, 2001 @05:52AM (#384426)

    Well, it's not exactly a dataflow machine, anyway.

    The old E&S machines were dataflow architectures at the equivalent of the "machine code" level. Newer architectures use similar ideas, but in a way that does not require details of the dataflow model leaking outside the chip.

    Look at the Pentium 3, for example. It exploits dataflow ideas at the microcode level by prefetching several machine code instructions, splitting them into a larger set of "micro-instructions" and then scheduling them together. That's not really a dataflow architecture, but it does use ideas from it: the idea of deciding how to schedule the instructions at run-time.

    The new clockless CPUs will exploit dataflow ideas by implementing a kind of dataflow machine between the functional units of the CPU itself. The CPU, remember, is like an interpreter for machine code. Since the "program" for that interpreter does not change, it can be implemented in a "plugboard" kind of way and people or programs producing machine code will never know the difference, apart from speed.

  • This design paradigm and sets of tools all assume synchronous logic.

    That is a significant issue. Because of that, I'm not betting that my next CPU will be async. The first applications will probably be in the microcontroller area, where the chips tend to be simpler and less powerful, and where constraints on power consumption are tighter and more important. However, just as the industry made the transition from hand-drawn schematics with discrete components to ASICs, this too shall come to pass.

    Imagine an automotive assembly line where things could only move forward if each station got permission from its adjacent stations.

    To some extent, that's EXACTLY what happens in many line processes. To compensate, there are 'buffers' built into the system. As with clocked logic, each step is designed to take about the same amount of time to complete so that there won't be pileups or starvation at any given station. Consider that in many line processes some stations, especially at the input and output areas, are human-operated, and humans are very asynchronous, yet the line proceeds in an orderly manner overall.

    In the case of CPUs, I imagine a RISC approach will probably be used, where the instruction set is designed so that each instruction takes roughly the same amount of time (generally true today as well, but enforced by the clock). The compiler will be responsible for scheduling the instructions to avoid starvation and pileup. In cases where multiple identical units exist, there may even be routing bits in the instructions to choose which unit is employed. That's not a very big stretch from the current situation, since good optimizing compilers already have to know about that process but don't get to make the choices.
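The buffered-line point in the comment above can be captured with a simplified flow-line recurrence: two stations with different, unsynchronized service times, joined by a bounded buffer, still produce an orderly in-order stream. The model and service times below are invented, and blocking is approximated at whole-item granularity:

```python
# Two stations A -> B joined by a buffer of at most buf_size items.
# Computes when each item leaves station B (a simple flow-line recurrence).

def finish_times(times_a, times_b, buf_size=2):
    n = len(times_a)
    out_a = [0.0] * n                           # when item i leaves A
    out_b = [0.0] * n                           # when item i leaves B
    for i in range(n):
        start_a = out_a[i - 1] if i else 0.0
        if i >= buf_size:                       # A blocks until space frees
            start_a = max(start_a, out_b[i - buf_size])
        out_a[i] = start_a + times_a[i]
        # B starts when the item has arrived AND B is free.
        start_b = max(out_a[i], out_b[i - 1] if i else 0.0)
        out_b[i] = start_b + times_b[i]
    return out_b

# A is fast (1 each), B is slow (2 each): B paces the line, one item per 2.
print(finish_times([1, 1, 1, 1], [2, 2, 2, 2]))   # [3.0, 5.0, 7.0, 9.0]
```

Items still emerge strictly in order, and the slowest station sets the steady-state rate, just as the slowest functional unit would in an async pipeline.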

  • by julesh (229690)
    But if it doesn't have a clock, how do you overclock it?
    </sarcastic>
  • UK egghead Sir Clive Sinclair [emugames.com] wanted to produce a clockless computer in the late 80's, I recall. It never came to anything, because his business acumen was always somewhat lacking.
  • by Anonymous Coward
    There aren't many ways around it. There's no way to sustain a global clock over a whole chip at the sizes they're growing to today. The skew is killing us.

    Either we have to have separately clocked parts in smaller domains or we have to go asynchronous. Both are insanely difficult, but the latter has the possibility of generating speeds unheard of. There are transistors capable of 250 GHz (not in CMOS Si technology, but anyway), and with some reduction in the feedback, a back-fed inverter could generate 50-100 GHz, locally. Imagine small parts of a chip operating at that speed and using level-triggered handshaking... difficult, but mindblowing. :-)

    Another thing: we would get rid of a lot of the power consumption. CMOS consumes power proportional to the frequency even when it isn't doing anything (at least the clocked parts). Asynchronous logic would not waste any charge on idle on-off switching... Some real power saving!

    The next step is adiabatic computation. After the logic has reached the result, the process is reversed with no energy or charge loss.

    However, the quantum computers will not happen during my life time. If ever.

    There is actually a sort of overall clock in the mind. I remember reading that brainwaves seem to act as a kind of general clock signal used for coordinating certain activities of consciousness. Sorry to be so vague. The article was a while ago, and in print. I think in Scientific American.

    That being said, I think you're right, the brain is largely asynchronous.

    BTW, as a shameless plug, my StreamModule System [omnifarious.org] is also largely asynchronous. It's for IPC though, not for gate-logic.

  • by rocketpilot (203235) on Monday March 05, 2001 @09:43AM (#384439)
    I'll preface my comments by saying that my MS thesis was on the development of an asynchronous IEEE single-precision floating point unit. I performed the design in a dataflow format, using a simulator created in Smalltalk and then a compiler I wrote that generated Compass-format netlists for place and route. This was finished back in '96, and a fellow student did an ARM implementation of the same technology, although he didn't actually get to place and route.

    I'm also an old fart and not some software geek to whom every hardware technology mentioned is something unheard of before. That being said...

    The first computing machines weren't synchronous. I forget the names, but this kind of thing was being done way back when because it was impractical to distribute a common clock across the racks and racks of equipment that made up a CPU back then.

    Also, Motorola's PowerPC chips implement an asynchronous divider, so you might be using asynchronous technology right now.

    The idea of having a computer run as fast as the transistors can go is a great goal, but there are some impractical aspects to the use of asynchronous circuits.

    First, how do you know your computation is done? Well, there are several different ways of telling. You can use a current sensor to decide when your gates have settled out for a decent length of time, or you can wait a predetermined amount of time based on the worst case. All solutions involve bloating the design with more transistors to time the handshaking between Muller C-elements. Whether it's some type of current sensor or just inverter chains, there's at least 10% of a circuit tied up in timing (and it can run much, much higher).
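    For readers who haven't met it: the Muller C-element mentioned above is the basic async handshaking gate. A minimal behavioural model (a sketch, not a circuit-level design) looks like this:

    ```python
    class MullerC:
        """Muller C-element: output switches to X only when *both* inputs
        are X; otherwise it holds its previous value, effectively waiting
        for the slower input. That 'waiting' is what makes it the natural
        building block for handshake timing."""
        def __init__(self):
            self.out = 0

        def step(self, a, b):
            if a == b:          # inputs agree: output follows them
                self.out = a
            return self.out     # inputs disagree: output holds

    c = MullerC()
    print(c.step(1, 0))  # 0 -- inputs disagree, output holds its old value
    print(c.step(1, 1))  # 1 -- both high: output rises
    print(c.step(0, 1))  # 1 -- disagree again: output holds
    print(c.step(0, 0))  # 0 -- both low: output falls
    ```

    Chains of these, plus the sensors or inverter delays described above, are where that 10%-of-the-circuit timing overhead comes from.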

    Also, what do you do with the data once you've processed it so fast? The IOBs are only so quick in driving pins, so while the core of the design can run really stinkin' fast asynchronously, it's hampered by the ability to get data in and out.

    Design verification is also a nightmare with asynchronous logic. It's a hard enough problem figuring out my longest path between registers across process and temperature variations, but to add in the factor of not knowing your clock is... well, icky.

    Finally, what about noise in an asynchronous design? For my current work, I have to make sure everything happens synchronously... or I end up with nasty noise in my CCD section. I can tolerate a little bit of asynchronous behaviour, but not a lot.

    Where asynchronous technology makes sense now is something like Motorola's divider circuit. By making it asynchronous, they gain the speed advantage of not having to rely on a slower, global clock distribution network, by making it a local function, they avoid the problem of slow IO, and by using it for a "small" amount of their design, they avoid die bloat and noise problems.

    I guess the idea of asynchronous design boils down to one of history. If it's such a wonderful thing and has been around for so long, why doesn't everybody do it? Well, because it has drawbacks and the design philosophy rarely fits the design criteria (cost, tools, reliability, performance, and function).

    I don't think this is a newsworthy item. In asynchronous design, it's pretty much ALL old hat. Academic papers recycle the same ideas and the UK email reflector for asynchronous "researchers" goes quiet for months at a time.

    Maybe tomorrow, /. will report the discovery of fire.

  • by MouseR (3264) on Monday March 05, 2001 @07:19AM (#384447) Homepage
    This is indeed intriguing; what unit will replace the familiar megahertz?

    Given the absence of a clock, I'd go for Inhertzia.

    Karma karma karma karma karmeleon: it comes and goes, it comes and goes.
  • How about an equally meaningless number, like BogoMIPS?

    --
    The first electronics were built without clocks. Most communications don't use clocks. Your 100Base-T network doesn't have a clock to operate. It works by sending data (a high or a low) then a middle state. This is a voltage in between the zero and one states. The receiving end waits for the state changes. Once it receives that middle state it knows that the next change is actually data. Once it receives the data it waits for that invalid state to know that the bit has ended.
    The reason it has to do this is because both systems may be running at different speeds. If one's running at 100MHz and the other is at 102MHz then they will eventually get out of sync. Without a system like this they wouldn't even know they were out of sync. DSL/cable/modems all use systems like this. There are several hundred out there. Some of the more common are Return to Zero (RZ), Non-Return to Zero (NRZ), etc.

    Back to electronics though: several types of memory are actually async. You set your address pins, then pulse a pin high which tells the chip that its address is waiting. The chip then looks up the memory, sets the output data pins, and then sets a return pin which allows the other electronics to use the data returned. This allows for the fastest operation of the chip possible. Since there is no clock there is no speed to conform to. The chip will always return its fastest possible response. This is also harder to work with since you have to rate your chips by speed grade. You may remember this from the old 70ns/60ns/50ns EDO/SIMM memory era.
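    The strobe/return-pin protocol described above can be sketched in software (a toy model with made-up names; real hardware does this with wires and voltage levels, not method calls):

    ```python
    class AsyncMemory:
        """Toy model of an async memory read: the caller raises a strobe,
        the chip answers at its own pace by driving the data pins and
        raising an acknowledge (the 'return pin')."""
        def __init__(self, contents):
            self.contents = contents
            self.data_out = None
            self.ack = False        # the return pin

        def strobe(self, address):
            # Address-valid strobe raised: chip looks up the word,
            # drives the data pins, then signals completion.
            self.data_out = self.contents[address]
            self.ack = True

        def release(self):
            # Caller drops the strobe once it has latched the data.
            self.data_out = None
            self.ack = False

    mem = AsyncMemory({0x10: 0xDEAD, 0x14: 0xBEEF})
    mem.strobe(0x14)
    assert mem.ack              # data is valid only once ack is raised
    print(hex(mem.data_out))    # 0xbeef
    mem.release()
    ```

    Note there's no timing anywhere: the reader simply doesn't look at the data pins until ack goes high, which is exactly why such chips are rated by speed grade rather than clock rate.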
  • by Steeltoe (98226) on Monday March 05, 2001 @05:54AM (#384456) Homepage
    The story is about asynchronous computing, not about clocks in general. Asynchronous computing is to synchronous computing as functional programming is to imperative programming. Sure you may have methods of synchronizing with external entities, but the internal processes are (mainly) asynchronous.

    The brain is an excellent example of parallel asynchronous computing, since a neuron will only fire when its input threshold has been reached. However, many internal processes in the brain may in fact be more or less synchronous, due to the fact that maybe it's an evolutionary advantage :-) So the basic idea is that a neuron is asynchronous in principle, but groups of them may find it easier to communicate synchronously.

    - Steeltoe
    Raise the voltage and, as the other guy said, cool the chip. Asynchronous logic is usually self-timed, so if your gates are faster, so is your overall speed. Time to change the name "overclocking" to one of its more proper names, such as "speed margining".
    --
  • by Thagg (9904) <thadbeier@gmail.com> on Monday March 05, 2001 @05:03AM (#384461) Journal
    From what little I could glean from the NY Times article, this sounds like a dataflow machine; that is, a machine where the various units 'fire' when all of their inputs are present. The idea is that each functional unit of the machine could be running in parallel, asynchronously, without any of the complexity that EPIC, say, imposes.

    Unfortunately for Sutherland, there's something called the PS300.

    Back in the late 70's and early 80's, his company, Evans and Sutherland, ruled the world of computer graphics with their very slick Picture System machines. These were peripherals to PDP-11s and VAXes, and were wonderfully programmable machines. There was a fast interface between host memory and Picture System memory, letting you mess with the bits to your heart's content. We had a couple of them at NYIT's computer graphics lab and did a lot of great animation with them.

    E&S's next machine, though, was the PS300. This was a far more powerful machine, its first with a raster display. It was an advance in every way, except that it imposed a dataflow paradigm on programming the machine. You could only write programs by wiring up functional units. It was astonishingly difficult to write useful programs using this technology. Everybody I know who tried (and this was the early 80s, when people were used to having to work very hard to get anything on the screen at all) gave up in frustration and disgust.

    ILM got the most out of the machine; but that was by imposing their will on E&S to provide them with a fast direct link to the PS300's internal frame buffer.

    Basically, dataflow ideas killed the PS300, which destroyed the advantage that E&S had as the pioneer graphics company, and they have never recovered from it. While the idea is charming, and to a hardware engineer it makes a lot of sense, programming them takes you back to the plugboard era of the very first WW-II machines. Nobody wants to do that.

    thad
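    For readers unfamiliar with the model: "units fire when all of their inputs are present" can be sketched in a few lines (a toy simulation with invented names, nothing to do with the PS300's actual wiring interface):

    ```python
    class Node:
        """A dataflow functional unit: it 'fires' (computes and forwards
        its result) only once every input port has a value waiting."""
        def __init__(self, func, n_inputs, consumers=()):
            self.func = func
            self.ports = [None] * n_inputs
            self.consumers = consumers   # (node, port) pairs fed by our output
            self.result = None

        def deliver(self, port, value):
            self.ports[port] = value
            if all(p is not None for p in self.ports):   # all inputs present?
                self.result = self.func(*self.ports)     # fire!
                self.ports = [None] * len(self.ports)
                for node, p in self.consumers:
                    node.deliver(p, self.result)

    # (a + b) * c, wired up as a two-node dataflow graph
    mul = Node(lambda x, y: x * y, 2)
    add = Node(lambda x, y: x + y, 2, consumers=[(mul, 0)])

    mul.deliver(1, 10)   # c arrives first; nothing fires yet
    add.deliver(0, 3)    # a arrives; adder still waiting on b
    add.deliver(1, 4)    # b arrives: adder fires, then multiplier fires
    print(mul.result)    # (3 + 4) * 10 = 70
    ```

    Note that even this trivial graph had to be "wired up" by hand; scale that up to a whole application and the frustration described above is easy to imagine.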

    So this is similar to CD-ROM data and other serial data that is "self-timing"? Do you have any more in-depth articles or whitepapers to back this up?

    ----
  • Two things:
    1) not all asynch-logic is dataflow. Dataflow is only the best known version.

    2) There is (was) at least one decent dataflow language: Prograph. It's true that it isn't a good fit to the machine it was targeted at (the Mac), and they didn't come up with a decent text-based printout, etc. (This last did make development hard... if you can't see a piece of code, it's hard to debug it. OTOH, they had an excellent visual debugger that could step through the code.)

    The real problem with a new paradigm is convincing people to use it. For this one needs to find an entering wedge. Perhaps CPU design will be it for asynch logic (whether dataflow or not). Once it becomes established in a field, then it will have a chance to develop.

    Please consider that asynch logic could be just what is needed to allow multi-processor machines to become useful. My real suspicion is that it will eventually end up as a mixed system, with certain pieces synch logic driven (e.g., screen drawing logic, sound generation, etc.) and other parts asynch. But that's probably a few decades away. (Perhaps.)


    Caution: Now approaching the (technological) singularity.
  • by clintkelly (208016) on Monday March 05, 2001 @10:18AM (#384472)
    Our group at Cornell University works on asynchronous design. My advisor built an asynchronous MIPS processor at Caltech a couple of years ago. It works, and it is extremely energy-efficient (better than pretty much anything in existence for the same process). We use a different design methodology than Sutherland's group, and none of the criticisms posted here about asynchronous design apply to us (for example, all of our circuits -- including full CPU's -- have been formally proved to be 100% correct).

    If anyone's interested, our group's page is:
    http://vlsi.cornell.edu/

    Anyone who wants a good overview of asynchronous design should read this paper:
    http://vlsi.cornell.edu/~rajit/abstracts/async-c as e.html

    ck
  • by mOdQuArK! (87332) on Monday March 05, 2001 @10:19AM (#384474)
    A fully asynchronous design requires lots of ready signals, or some very careful time-of-flight constraints. Aside from the fact that the current popular logic-synthesis tools don't provide neatly packaged solutions for this kind of design, if you don't implement this stuff in an intelligent manner, you can easily create a design which completely destroys any advantage over a synchronous design in speed, power, reliability and/or area.

    On the other hand, one more advantage that I haven't seen mentioned about asynchronous design is modularity - most synchronous designs can only be verified for correctness in the context of the global clock signal, whereas if you've verified the correctness of an asynchronous module, you can plug it in wherever its functionality fits, without having to adjust all the stuff around it.

    When you think about it, however, you will note that synchronous design is actually just a SUBSET of asynchronous design - the clock signals are just a way of indicating a "data ready" condition to the next bunch of logic gates. Careful logic designers who hold this viewpoint can design hybrid synchronous/asynchronous designs, where the overall design is actually a bunch of smaller synchronous designs, where each block of synchronous logic receives a "clock" which is actually a data ready signal for the logic block as a whole.
  • Performance measurement should be done by a test suite anyway. This "clockless" machine will only emphasize that. (They could, perhaps, call it the infinity chip .. the clock is a piece of wire, so it cycles "an infinite" number of times per second :-)


    Caution: Now approaching the (technological) singularity.
    People have been worrying about this since the 1980s.
    Speeds get faster; chip dies get larger;
    far-off units get out of sync.

  • by stevew (4845) on Monday March 05, 2001 @06:04AM (#384483) Journal
    The problem with Mips:

    Not all Mips are created equal. For example: is it fair and reasonable to compare a CISC Mips to a RISC Mips? The CISC machine may be doing something like a string move with one instruction while the RISC machine does it with a series of instructions in a loop. Obviously this is an apples and oranges comparison.

    Okay - next you look at Flops - aren't Flops the same on every machine? Well - no, though that is probably less of an issue for comparing IEEE-based implementations. The question comes up (and it has already been mentioned) that Flops don't compare useful workloads! The vast majority of computer workloads don't involve significant floating point operations. (Yes, you can find workloads where that is the case - but it isn't the majority situation.)

    So it comes down to this: comparing computer "systems" is a tricky business. Even Mhz in the same architecture family doesn't work, because you don't know how efficiently the machine is designed - the hardware might be capable of greater than one instruction per clock!

    Finally - I don't believe the estimate of up to 15% for clock distribution. It's more like 1%-2%. (I do chip design for a living... at least I have an educated opinion on this!) The clocks ARE a significant part of the power issue though. CMOS burns power when signals move. The clock moves. Simple enough analysis there.

    Asynch design methods have been around forever, but present a number of problems for traditional design tools that depend on the clock to do their work. Further, there are a lot of chip designers who throw up their hands if you just mention the words "asynchronous design" to them. Any push to this kind of design would be traumatic to say the least ;-)

  • by mirko (198274)
    During the last few years, a group has been working on an asynchronous processor: AMULET [man.ac.uk].
    This CPU uses the ARM [arm.com] core.
    It is so power-efficient that it could rely solely on the induction power resulting from its pins transmitting information.
    Its current status specifies delivery of the AMULET3i.
    --
    My initial reactions are that it would work easiest in things like embedded processing. I also wonder if there would have to be some sort of evolution similar to what we have seen over the past few years with Intel, Motorola, etc.

    An added thought to this is that since, according to the article, a lot of the research is being done on the Sun side, this will have interesting implications for the Wintel crowd.

    It seems that it would make its way into the market first via the UNIX crowd. This makes for interesting opportunities. The last two paragraphs of the article are interesting in this regard:

    Mr. Sutherland, in fact, says a new magic is precisely what he has found. He draws an analogy to the first steel bridges, which were built like stone bridges, with arches. It took some time, he said, for designers to recognize the possibilities of the suspension bridge -- a form impossible to create with stone alone but which was perfectly suited to the properties of steel.

    The same is true with asynchronous logic, he said. His research shows that it will be possible to double the switching speed of conventional clock-based circuits, he said, and he is confident that Sun in particular will soon begin to take advantage of that speed. "A 2X increase in speed makes a big difference," he said, "particularly if this is the only way to get that fast."

    fascinating.

    The clock that the OS uses to wake itself up to make scheduling decisions is completely separate from the clock signal that is distributed on a CPU chip. The clock signal on a chip is what permits data and control signals to advance from one stage of the pipeline to the next, and that's the clock signal that async logic gets rid of. The clock that the OS uses to wake itself is a hardware interrupt from a completely separate place.
    The Interdata series of computers did not use a clock -- twenty years ago. The logic just "falls through" with the answer and restarts.

    This is not new or groundbreaking.

  • Have you ever tried to debug your brain? It can be fairly difficult.

    ------
    Back in the 40s or 50s, some mathematician guy (not J'VonN', although he was somewhat involved?) proved that NNs/digital logic is isomorphic to some sort of logical calculus stuff. (Sorry for the lack of details.) People got excited because, philosophically, they thought that Formal Logic = Thought. Nowadays, most of us would be kinda skeptical of that assumption.

    That sounds like Alan Turing. He utilized an idealized machine called a Turing Machine to prove a bunch of interesting theorems about logic. He conceived of the Turing Test, by which a machine would prove that it thought by successfully convincing a remote human that it was a real person. Today, we probably wouldn't consider that as sufficient evidence of thought.
  • The last time I checked, there wasn't any sort of high frequency clock signal running down my spine.

    That's because your brain is an analog computer, not a digital one. As for "if the brain can do it, it must be possible", that is simply not true. We are still in the dark ages as to what the brain actually does and how it actually does it, and we won't be able to use any of our discoveries in information processing technology for the foreseeable future.

    And no, artificial neural networks are NOT analogous to how the brain operates. It is most useful to think of them purely as mathematical creations. They are orders of magnitude simpler than the networks found in the brain; and their operations are at BEST guesswork.

  • I remember from my digital electronics class that in many ways, it makes the most sense to have tri-state (instead of binary) electronics. (In theory, e-state would be better, but since e isn't an integer, that's a little hard.) Of course, the problem is making the logic work, since your signals are low, medium, and high.

    Well, what if the logic worked such that if any input was medium, the output was medium. Otherwise, the output was as it is now with binary logic. Then you could build a CPU that left units that weren't being used in the medium state. When an operation was performed, you would know when it was done as soon as the result didn't have medium bits.

    Likewise, you could push this back on the memory and other subsystems.

    Of course, now the question is whether adding the additional state is worth it in eliminating the clock.
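    The "medium means not ready" scheme described above is close in spirit to what async researchers call dual-rail or NULL-convention logic: a third state says "no valid data here yet", and completion is detected when no bit is still in it. A toy sketch (invented names, not a circuit-level design):

    ```python
    M = "medium"   # the third state: "no valid data here yet"

    def tri_and(a, b):
        """AND gate under the three-valued scheme: any 'medium' input
        forces a 'medium' output; otherwise plain binary AND."""
        if a is M or b is M:
            return M
        return a & b

    def done(bits):
        """Completion detection: the result is ready when no bit is medium."""
        return all(b is not M for b in bits)

    # A unit 'at rest' outputs medium; the result is valid only once
    # every input has resolved to 0 or 1 -- no clock needed to know when.
    print(tri_and(1, M))   # medium -- still waiting on one input
    print(done([1, M]))    # False
    print(tri_and(1, 1))   # 1
    print(done([1, 1]))    # True
    ```

    The completion check replaces the clock, which is the whole point; the cost, as the comment asks, is the extra state every wire now has to represent.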
    What kind of Floating Point Operation? Addition will be faster than multiplication, which will be faster than division. Operations will no longer be tied to the slowest possible operation, so they may not even be even multiples of each other.

    I think in such a system, other features (code optimization, use of 3D accelerators, etc) will be more important than the speed of an add. It will even take several years of experimentation to determine what optimizations to make (how many times is it better to add than multiply, how should loops be unrolled, etc).

    I think many traditional measurements will become worse than useless, and instead misleading. Since a lot of your repetitive math operations may be offloaded to your 3D accelerator, it is questionable whether, even if you could decide how to measure it, floating-point operations per second would be a real indicator. I wouldn't want the manufacturer optimizing for that over other, useful things.

    A better question is, how long does a NOP last? Won't this system optimize it out? How can you time a NOP without a clock?

  • by hqm (49964) on Monday March 05, 2001 @05:23AM (#384515)
    One problem with asynchronous systems is testing. If you have a chip where some of the units are slower than expected, you might get curious interactions and "race conditions" that are very hard to test before you put the chip into service.

    Also, designing for asynchronous logic has been difficult - designing clocked and even pipelined systems is a breeze compared to dealing with asynchronous design. A lot of the structured methods that have been developed for conventional clocked circuits cannot be used, and so designers have a lot of trouble building complex systems.
    To use a software analogy, how easy would it be to debug a program where half of the code consisted of conditional branches?

    A little bit of a pain, but far from impossible. Anyone who works on software for a multithreaded, multiprocessor, or distributed environment solves asynchrony-related problems all the time. We do it by having locks instead of clocks; hardware folks can do and on occasion have done just the same. I'm sorry to hear that such basically simple problems are considered unsolvable by garden-variety EEs.

    Just design it as a dataflow chip, with functional units propagating answers when they are available and halting on partial inputs (you can even envisage a system which allows out-of-order execution of ALU operations). The main difficulty is likely to be getting in-order commit to work out.

    Apart from that, it is basically an exercise in bookkeeping -- tag all values as belonging to a subinstruction, so that you are able to get the data dependencies right.

    I could go on, but I think you get the idea. However, let me emphasize that the situation of the whole chip waiting on the slowest component is what we AVOID by going asynchronous; this is exactly the reason why Intel needs to pipeline so damn deep to get the clock rate up. They need to split the pipeline into steps small enough that each step can be done in one clock. Asynch circuitry wouldn't have that problem.

  • You could only write programs by wiring up functional units. It was astonishingly difficult to write useful programs using this technology.

    Sadly, much of this was a previously solved problem in the Control Data Corporation 6600 series of computers [geocities.com], which used a "scoreboard" to keep track of dynamically arising opportunities for parallelism among the CPU's functional units. This is generally the technology used in modern CPUs to infer parallelism from look-ahead in machine-code instruction streams.

    The fad in "dataflow" machines in the late 70s and early 80s (arising largely from John Backus's 1977 Turing Award Lecture) was not entirely misguided. However, the failure to come up with a good way of describing I/O and other time-related operations was its downfall. I've been working on this stuff [geocities.com] from the viewpoint of distributed programming environments as a high-priority background task ever since those days, and it gets into some of the most serious philosophical questions about the relationship between mathematics and reality that are intimately related to quantum theory and phenomenology. At some point, we have to ask ourselves: 'What is an object, how does it come to "be" in "time", and how can we best formalize these conceptions?'

    I don't think these problems have to be solved entirely for asynchronous systems to work, but my point of departure in trying to come up with a programming language that could handle dataflow was hardware design languages, and the way they generalize boolean algebra to represent feedback circuits, which brings in "time" in an important sense. Unfortunately, the best guys at Bell Labs at that time who were working on high-level hardware design languages were using quite ad hoc formalisms to represent such boolean feedback loops.

  • Is here [nytimes.com] Sorry
  • Come on, the Octium is the Pentium VIII, the Septium would be the Pentium VII ;-)

    True, without a clock, even chips from the same wafer could run at vastly different speeds. We see some of that effect from standard CPU builds - i.e. The old PMMX-166 -> 233MHz chips were all made at the same time, and marked according to what they could do (or how many of the slower ones they needed to sell). Basic qualification would be more extensive - no longer can you verify that a specific suite of tests completes with the chip at a certain clock speed, now you have to time the various ops and give each chip some rating based on that, and they wouldn't be very consistent. A much easier process when you only have a few choices to pick from, rather than a sliding scale.

    Plus, the interleaving of the synchronous and async domains of the chip could be very interesting. PLLs are great when you have a few regions (say, two different-speed PCI busses into one chip, along with a memory bus at a third speed)... clocks can gain you a lot in terms of simulation abilities, too. Tough to sim a chip when the process *really* makes a big difference in the delays.

    --
    Well, to do a digital lowpass filter, you would use an FFT; I doubt that many commercial audio devices would use time-domain convolution when the FFT is faster

    For one thing, it wouldn't necessarily be faster. Filtering with an FFT is O(n log w) where w == window size; time-domain convolution is O(nm) where m == filter length. The hard edges of the FFT window create artifacts that can be audible as a buzzing noise; this is why MP3 and Vorbis spend a few extra cycles on MDCT (an overlapped transform). Besides, you don't need a lot of taps; I know of a decent FOUR-tap low-pass filter [11 19 5 -3]/32.
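    Time-domain convolution with those four taps is only a few lines; here's a sketch (taps and the /32 normalization taken from the comment above, function name invented):

    ```python
    def fir_lowpass(samples, taps=(11, 19, 5, -3), scale=32):
        """Time-domain FIR filtering: each output sample is a weighted sum
        of the current and previous inputs -- O(n*m) with m taps, no FFT."""
        out = []
        for i in range(len(samples)):
            acc = 0
            for j, t in enumerate(taps):
                if i - j >= 0:
                    acc += t * samples[i - j]
            out.append(acc / scale)
        return out

    # After the startup transient, a constant (DC) input passes through
    # unchanged, since the taps sum to 32 (the normalization factor).
    print(fir_lowpass([32, 32, 32, 32, 32, 32]))
    # [11.0, 30.0, 35.0, 32.0, 32.0, 32.0]
    ```

    With only four multiply-adds per sample, there's simply no window to buzz at the edges of, which is the artifact argument made above.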


    All your hallucinogen [pineight.com] are belong to us.
  • by hammy (22980) <hamish&hbarney,com> on Monday March 05, 2001 @04:41AM (#384530) Homepage
    Here's the URL for the asynchronous design group's homepage [sun.com] There's more info there.
  • by crgrace (220738) on Monday March 05, 2001 @08:01AM (#384532)
    I remember from my digital electronics class that in many ways, it makes the most sense to have tri-state (instead of binary) electronics

    You shouldn't use the term "Tri-State" in this context. Tri-State, a trademark of National Semiconductor, means drivers that can be put into a high-impedance output mode and so be disconnected from a bus in a simple way. What you are referring to is called "Multiple Valued Logic" and has been researched forever. It has found its way into a few products (ROMs most notably) but in general is more work than it's worth.

    I'm not trying to troll here, but people have been trying to design asynchronous computers for decades now. For a while the British government sponsored some intensive research into asynchronous logic, and what did they get out of it? That's right, nothing. The problem with asynchronous circuits is that you are still only as fast as the slowest gate in your circuit. But the real issue here is obviously the race conditions that kill any non-trivial asynchronous chip. Debugging such a race monster is a task beyond the capability of a human brain.

    With all the suffering and poverty in the world we should really question whether some "scientists" deserve the money they get or whether those same funds could be utilised elsewhere.

    It sounds interesting in theory, but in practice it will probably be wholly unsuccessful. It sounds like fuzzy/random logic, which was supposed to revolutionize computers ("It's either 'yes' or it's 'no'. What if we had the computer return 'maybe'?"). This, and using crystals/bacteria for memory, and other things supposed to "revolutionize" the industry have fallen prey to two factors: 1) infeasibility (who wants to pay $5,000 for crystal memory?) and 2) the old method already works (hello, Intel).

    The effort sounds like a great science fair project. Above that, I can't see anything coming out of it to fruition.

  • I've seen this posted below, but there's an even more interesting question: Not all chips off the same line are the same speed. How do you "bin" the chips? Also, suppose one chip gets a somewhat faster overall chip, but for some reason the part that does FMUL is a little slow. Maybe my application cares, maybe not. If I need a DB server or web server or anything similar, who cares? But if I'm number crunching, I care very much. Without the MHz number, there is no effective way to compare two of the same chip. I think this isn't much of a problem for the embedded market (it will run X application in real-time, guaranteed); but for the CPU market it is a big deal.
  • There have already been commercially successful asynchronous computers. For instance, in the DEC PDP-10 family, the KA10 (1968) and KI10 (1972) processors were asynchronous, as was their predecessor, the 166 processor of the PDP-6 (1964). The PDP-10 family was commonly found in universities until the late 1980s.
    Sorry, brain fade (I've been up for >24 hours). The KI10 was not asynchronous. Just the KA10 and the 166.
  • by Stipe (35684) <cr212@iname.com> on Monday March 05, 2001 @04:46AM (#384556) Homepage
    The Amulet [man.ac.uk] project has been going for over 10 years (it's an asynchronous ARM-like core, IIRC). I remember seeing a circuit that did asynchronous addition (or was it multiplication?) in a lecture about 2 years ago.

    Besides power, another advantage is speed: the rate isn't determined by the worst case of the most expensive instruction. (e.g. adding 0 and 1 can be done a lot quicker than adding (2^31)-1 and 1, because the carry doesn't have to ripple all the way through)
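    The speed difference in that example comes from carry propagation: a self-timed adder can finish as soon as the carry chain dies out instead of always waiting for the worst case. A sketch where the reported chain length stands in for completion time (a behavioural model, not a gate-level design):

    ```python
    def selftimed_add(a, b, width=32):
        """Ripple-carry addition that also reports the longest run of
        consecutive carries -- a stand-in for how long a self-timed
        adder would take before signalling completion."""
        carry, result, longest, run = 0, 0, 0, 0
        for i in range(width):
            x, y = (a >> i) & 1, (b >> i) & 1
            s = x ^ y ^ carry
            carry = (x & y) | (carry & (x ^ y))
            run = run + 1 if carry else 0   # current carry-chain length
            longest = max(longest, run)
            result |= s << i
        return result, longest

    print(selftimed_add(0, 1))          # (1, 0): no carry, done 'instantly'
    print(selftimed_add(2**31 - 1, 1))  # (2147483648, 31): worst-case ripple
    ```

    A clocked adder must budget for that length-31 chain on every single add; the self-timed one only pays for it when the data actually demands it, which is exactly the average-case win claimed above.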
  • The following is a direct quote from the article you did not read. Asynch circuits are already being used:

    For example, Royal Philips Electronics has built a pager using asynchronous electronics, taking advantage of the fact that the circuits produce far less radio interference than do clock-driven circuits. This makes it possible to operate a radio receiver that is directly next to the electronic circuit, greatly increasing the unit's operating efficiency.

    Philips has also actively pursued research into asynchronous logic. Two small start-ups, Asynchronous Digital Design in Pasadena, Calif., and Theseus Logic in Orlando, Fla., are developing asynchronous chips for low-end consumer markets and high-performance computing systems.

  • So far most of the comments here are along the lines of "this won't work, it's too hard to debug, etc.". But it seems to me that the human brain is a pretty good example of asynchronous computing? The last time I checked, there wasn't any sort of high frequency clock signal running down my spine.
  • Someone mentioned CalTech - sorry, I forgot the cid, but thank you - so I went and did a little digging. Here [caltech.edu] is a link to the CalTech Asynchronous VLSI group. Right on the page are some cogent explanations of why they believe asynchronous designs will eventually become commonplace. Further in are pointers to some good papers, and an interesting discussion of their results implementing an asynchronous version of the MIPS R3000 architecture.

  • For a good introduction to asynchronous system design, look at:

    http://www.cs.man.ac.uk/async/background/return_async.html

    A paper I found interesting on this subject:

    http://www.ee.ic.ac.uk/pcheung/publications/Asyn%20Bus%20ISCAS96.pdf

    Enjoy... I've heard Sutherland speak and he's done some very interesting things; most notably, he invented the method of "logical effort" for the sizing of transistors, without which CPUs would be several orders of magnitude harder to optimise than they are today.
  • by VHDLBigot (149431) on Monday March 05, 2001 @06:27AM (#384569) Homepage
    Asynchronous digital logic is an idea that has been tossed around for quite some time. I can't name all the difficulties with it, but there are a few issues that haven't been brought up (at least when I scanned the postings).

    First, most ASICs built these days are built with logic synthesis tools from Synopsys [synopsys.com] or Cadence [cadence.com]. The inputs are typically register transfer level (RTL) code written in either the VHDL or Verilog languages. These logic synthesis tools have been around for quite some time (well over a decade for Synopsys) and have a significant infrastructure built around them. This design paradigm and sets of tools all assume synchronous logic. I can't fathom how you would build/constrain/debug these circuits in an asynchronous style with the existing toolset. And don't say "we'll use something else". It is these types of tools which have made our million gate ASICs possible. If we were still using schematics or other hack tools we would barely have passed the 80286. The current design tools took a long time to develop, hone, and get the bugs out of. The amount of money involved in just the tools is on the order of billions of dollars per year. That's a lot of inertia to move away from.

    Second, yes the asynchronous approach can reduce the power consumption of ASICs. However, there are a lot of clocked approaches that do a very good job of reducing power. It all depends on what goals you have when you design the ASIC. Having multiple clocks and clock gating is common in the low power and embedded domains. It hasn't been as much of a factor in desktop systems but is certainly in use in handheld devices. The Crusoe takes these approaches to an extreme level. It's all a matter of what you want to design for and time to market pressures.

    Lastly, speed. I think folks forget the feedback path. If you're going to rely on this asynchronous handshake, it requires a given stage to hold its outputs until the next stage acknowledges (asynchronously) that it got the data. This means the given stage can't accept anything new yet. This cascades/ripples back through the pipeline. This feedback takes time (and logic levels) that don't exist in clocked logic. Imagine an automotive assembly line where things could only move forward if each station got permission from its adjacent stations. In clocked logic you've guaranteed that the data is ready to move forward because you've calculated these things out. You've removed a bunch of communication overhead. Yes, there is slack in the synchronous pipeline, but for the most part current designs are pretty well balanced so that each stage uses a large portion of its clock cycle.

    That's about all I can think of at the moment. I need to be getting home before I get snowed in! ;-) Just a few comments from a digital hardware designer. Hope this provided some food for thought...
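    The feedback path described above can be modeled in software. Here's a toy Python sketch (my own analogy, not an RTL design): each stage holds its result until the downstream stage accepts it, and a size-1 queue plays the role of the request/acknowledge handshake, so back-pressure ripples upstream exactly as described.

    ```python
    import queue
    import threading

    def stage(work, inbox, outbox):
        """One pipeline stage: hold each result until downstream accepts it."""
        while True:
            item = inbox.get()        # wait for a "request" from upstream
            if item is None:          # shutdown token: pass it on and stop
                outbox.put(None)
                return
            # put() blocks while the next queue is full -- the missing "ack".
            # That stall is the back-pressure rippling through the pipeline.
            outbox.put(work(item))

    # Two stages chained through size-1 queues (one register of slack each).
    q0, q1, q2 = (queue.Queue(maxsize=1) for _ in range(3))
    threading.Thread(target=stage, args=(lambda x: x + 1, q0, q1)).start()
    threading.Thread(target=stage, args=(lambda x: x * 2, q1, q2)).start()

    for i in range(4):
        q0.put(i)
    q0.put(None)

    results = []
    while (r := q2.get()) is not None:
        results.append(r)
    print(results)   # [2, 4, 6, 8]
    ```

    Note that if the second stage stalls, the first stage eventually blocks too, just as the assembly-line analogy predicts.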

  • by wowbagger (69688) on Monday March 05, 2001 @05:35AM (#384570) Homepage Journal
    A floating point operation is usually taken to mean a floating point multiply followed by a floating point addition, also known as a Multiply/Accumulate Cycle (MAC).

    A MAC is a very important operation in digital signal processing. For example, to implement a digital lowpass filter (to remove tape hiss, for example), you define a finite impulse response filter (FIR filter) of some number of taps. You might need 256 taps to implement the needed low pass filter (this is a shot from the hip, the actual number of taps may be more or less). That means for every sample of audio (88.2kSamples/second for stereo audio) you need to do 256 MACs, or 22.6MFLOPS.
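    The arithmetic works out like this in Python (using the comment's own numbers; the 256-tap count is the poster's shot from the hip):

    ```python
    # MAC budget for the hypothetical lowpass filter in the comment.
    taps = 256
    sample_rate = 88_200              # 44.1 kHz stereo = 88.2 kSamples/s
    macs_per_sec = taps * sample_rate
    print(macs_per_sec)               # 22579200, i.e. ~22.6 million MACs/s

    def fir_output(h, x):
        """One FIR output sample: the sum of h[k] * x[k] over all taps.

        h -- tap coefficients; x -- the most recent len(h) inputs, newest first.
        Each loop iteration is exactly one multiply-accumulate (MAC).
        """
        acc = 0.0
        for hk, xk in zip(h, x):
            acc += hk * xk
        return acc
    ```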
  • This is indeed intriguing; what unit will replace the familiar megahertz? Well, the obvious answer would be BogoMIPS, but that doesn't have anything to do with clock cycles, so you might want an even more meaningless number. How about BogoMHz?

    Happy now?

  • Actually it sounds like a verification NIGHTMARE. ASICs are hard enough to validate today with the few async pieces we have to put into them. Async logic may look nice, but unless we get some major breakthroughs in verification tools, don't look for it anywhere in the near future.
  • WRT the feedback path overhead....

    IANAD (I am not a designer), but thinking about this I wonder if buffering interstage registers might not mitigate feedback path delay. Imagine three registers, R1 output of Stage 1, R2 buffer, and R3 input to Stage 2. Each register has a control bit (0 = read, 1 = unread). Further imagine two simple register to register copy circuits, one to copy R1 to R2, and a second to copy R2 to R3.

    I apologize for the primitive exposition (I said IANAD), but intuitively it seems to me that such a buffer scheme could let logic stages overlap processing. The cost would be the time needed for the two hair-trigger copy operations between logic stages, but that should be minimal.

    Bang1 - CopyR1R2 frees Stage1 to execute again, Bang2 - CopyR2R3 tells Stage2 it has an input.

    If Stage1 completes a fast operation, the buffer copying lets it take on the next one (which might not be fast) perhaps before Stage2 is ready for its next input. Thus Stage1 and Stage2 can overlap in some circumstances, increasing overall speed. Multiply this by a dozen or so pipeline stages and the savings might be worth the effort.

    Or perhaps this is the overhead the parent post was referring to...
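    The scheme sketched above can be mocked up in a few lines of Python (my own toy model, using the poster's R1/R2/R3 names; a real design would have to make the copy and flag updates atomic):

    ```python
    class Reg:
        """A pipeline register with the poster's read/unread control bit."""
        def __init__(self):
            self.value = None
            self.full = False          # False = read, True = unread

    def copy_if_ready(src, dst):
        """Hair-trigger copy: fires the moment src is unread and dst is free."""
        if src.full and not dst.full:
            dst.value, dst.full = src.value, True
            src.full = False           # src's stage may start its next op
            return True
        return False

    r1, r2, r3 = Reg(), Reg(), Reg()   # Stage1 output, buffer, Stage2 input
    r1.value, r1.full = "result-A", True

    copy_if_ready(r1, r2)              # Bang1: frees Stage 1 immediately
    copy_if_ready(r2, r3)              # Bang2: hands Stage 2 its input

    print(r1.full, r3.value)
    ```

    After the two copies, R1 is free even though Stage 2 hasn't finished consuming the data, which is the overlap the comment is after.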

  • If you wanna know how fast a MHz-less chip is, just count the fps in Quake. DU-UH!

    But seriously, even supposing we could come out with a retail clockless cpu, wouldn't it require a plethora of equally clockless peripherals like video cards and ide controllers and whatnot ? Otherwise it would need a clock to drive these "external" devices (external from the cpu's view, that is), and then we fall back into the same pit. The concept is fascinating but ill-fated I'm afraid.
  • by helge (67258) on Monday March 05, 2001 @04:52AM (#384593) Journal
    Asynchronous CPUs have existed for a while. They don't seem to have become very popular, though. Apparently, they don't give the power/speed advantage that you would expect at first glance. A quick search with Google gave this:

    Asynchronous ARM core nears commercial debut (1998) [edtn.com]
    ARM researches asynchronous CPU design (feb 1995) [computer-design.com]
    AMULET3: A High-Performance Self-Timed ARM Microprocessor (1998) [ibm.com]

Debug is human, de-fix divine.

Working...