Hardware

Asynchronous Logic: Ready For It?

prostoalex writes "For a while now, academia and R&D labs have been exploring the possibilities of asynchronous logic. Now Bernard Cole from Embedded.com tells us that asynchronous logic might receive more acceptance than expected in modern designs. The main advantages, as the article states, are 'reduced power consumption, reduced current peaks, and reduced electromagnetic emission', to quote a prominent researcher from Philips Semiconductors. Earlier, Bernard Cole wrote a column on self-timed asynchronous logic."
  • by Anonymous Coward on Monday October 21, 2002 @11:26AM (#4495711)
    I think you are correct that modern microprocessors use asynchronous logic in certain parts and synchronous design in others.
  • Doing it already... (Score:2, Informative)

    by Sheetrock ( 152993 ) on Monday October 21, 2002 @11:29AM (#4495752) Homepage Journal
    Technically speaking, if you're not using an SMP system you're processing logic asynchronously.

    But more to the point: while asynchronous logic may appear to offer a simple tradeoff (slower processing time for better battery life), recent advances in silicon design make the argument for asynchronous components moot. For one thing, while two synchronous ICs take twice the power of one asynchronous IC (not quite, because of the impedance of the circuit pathway between two chips, but that's negligible under most circumstances), they will in general arrive at a result twice as quickly as their serial counterpart. Twice as quick, roughly equal power consumption.

    The real reason for the drive towards asynchronicity is to cut down on the costs of an embedded design. Most people don't need their toaster to process the 'Is the bread hot enough?' instruction with twice the speed of other people's toasters. But for PDAs (personal digital assistants) or computer peripherals, I wouldn't accept an asynchronous design unless it cost half as much.

  • by Anonymous Coward on Monday October 21, 2002 @11:34AM (#4495796)
    "PhysicsScholar", don't karma-whore and post plagiarizing. You copy-and-pasted from http://www.cis.unisa.edu.au/~cisdak/nResearch/Asyn c.html [unisa.edu.au]. This gets you moderated "Redundant."

    It'll be enlightening for people to just go there and read your information in context anyway, plus there are links to papers and stuff. You shoulda posted the link!!

  • Within the space of a single clock cycle, the Pentium (or other designs) might make use of asynchronous logic, but (and this is the important bit) the asynchronicity only exists within the domain of the CPU. The external interface to the CPU is still governed by a clock: you supply the CPU with inputs, trigger its clock, and a short (fixed) while later it supplies you with outputs. Asynchronous logic removes the clock entirely.
  • by Milican ( 58140 ) on Monday October 21, 2002 @11:46AM (#4495908) Journal
    Because posting without giving credit to the original author is wrong.

    John
  • More info: (Score:5, Informative)

    by slamden ( 104718 ) on Monday October 21, 2002 @11:48AM (#4495930)
    There was an article [sciam.com] in Scientific American about this just recently...
  • Read the article (Score:5, Informative)

    by Animats ( 122034 ) on Monday October 21, 2002 @11:56AM (#4495994) Homepage
    Read the cited article: "Asynchronous Logic Use -- Provisional, Cautious, and Limited". The applications being considered aren't high-end CPUs. Most of the stuff being discussed involves low-duty-cycle external asynchronous signals. Think networking devices and digital radios, not CPUs.

    In synchronous circuits, there are power spikes as most of the gates transition at the clock edge. It's interesting that this issue is becoming a major one. ICs are starting to draw a zillion amps at a few millivolts and dissipate it in a small space, while using a clock rate so high that speed-of-light lag across the chip is an issue. Off-chip filter capacitors are too far from the action, and on-chip filter capacitors take up too much real estate. Just delivering clean DC to all the gates is getting difficult. But async circuitry is not a panacea here: a load that is constant on average doesn't help if there are occasional spikes that cause errors.

    One of the designers interviewed writes: "I suspect that if the final solution is asynchronous, it will be driven by a well-defined design methodology and by CA tools that enforce the methodology." That's exactly right. Modern digital design tools prevent the accidental creation of race conditions. For synchronous logic, that's not hard. For async logic, the toolset similarly has to enforce rules that eliminate the possibility of race conditions. This requires some formal way of dealing with these issues.

    If only programmers thought that way.

  • by Orne ( 144925 ) on Monday October 21, 2002 @11:57AM (#4496009) Homepage
    The root problem is data transfer within the CPU, not data transfer between I/O devices.

    The clock speed (now on the order of 10^9 Hz) sets the upper limit on your chip's ability to move a voltage signal around the die. Modern CPUs are "staged" designs, where execution is basically broken into opcode "decode", "register load", "operation", and "register unload" stages. For a given stage, you cannot clock its output faster than the time it takes for its computations to complete, or you're basically outputting garbage.

    A synchronous design means that every flip-flop on the chip is tied to the same clock signal, which can mean one HUGE amount of wiring just to get everything running at the same speed, which raises costs. On top of that, you have charging effects from the switching between HI and LO, which can cause voltage problems (which is why capacitors are added to CPUs). Then add resistive effects, where current becomes heat, and you run the risk of circuit damage. All of this puts some hard limits on how fast you can make a chip, and at what price.

    Asynchronous chip design lets us throw away the clock circuitry, and every stage boundary becomes a ready/acknowledge handshake (are you done yet, are you done yet, OK, let's transfer the results). With proper design you can save a lot of material, and you can decouple the dependence of one stage on another, so the maximum instructions-per-second rate is set by the raw speed of the silicon.
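
    To make that "are you done yet... OK, transfer the results" step concrete, here's a rough sketch of a four-phase request/acknowledge handshake between two self-timed stages, modeled with Python threads. This is my own illustration, not anything from the article; the stage names, data values, and delays are invented.

```python
import threading
import time

req, ack = threading.Event(), threading.Event()
data_wire = [None]   # stands in for the data bus between the two stages
results = []

def producer_stage():
    """E.g. a decode stage: finishes whenever it finishes, then raises 'request'."""
    for value in (7, 11, 42):
        time.sleep(0.01)            # data-dependent computation time; no clock
        data_wire[0] = value        # drive the data wires
        req.set()                   # request: "my result is valid"
        ack.wait()                  # wait for the next stage to latch it
        req.clear()                 # drop the request...
        while ack.is_set():         # ...and wait for acknowledge to drop too,
            time.sleep(0.001)       # completing the four-phase cycle

def consumer_stage():
    """E.g. an execute stage: latches data on request, then acknowledges."""
    for _ in range(3):
        req.wait()                  # wait until the previous stage says it is done
        results.append(data_wire[0])
        ack.set()                   # acknowledge receipt
        while req.is_set():         # wait for the request to drop
            time.sleep(0.001)
        ack.clear()

t1 = threading.Thread(target=producer_stage)
t2 = threading.Thread(target=consumer_stage)
t1.start(); t2.start(); t1.join(); t2.join()
print(results)   # [7, 11, 42] -- each transfer happens as soon as both sides are ready
```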
  • by anonymous loser ( 58627 ) on Monday October 21, 2002 @12:09PM (#4496121)
    The advantage outlined here seems to be independent functionality between different areas of the PC. It would be nice if the components could work independently and time themselves, but is there really a huge loss in sustained synchronous data transfer?


    Yes, for many reasons which are somewhat glossed over in the article (I guess the author assumes you are an EE or CPE familiar with the subject). Here's a quick breakdown of the two major issues:


    1. Power Distribution & Consumption - In a synchronous system, every single unit has a clock associated with it that runs at some multiple of the global clock frequency. To accomplish this you must have millions of little wires running everywhere, connecting the global clock to the individual clocks on all the gates (a gate being a single unit of a logic function, sort of like an AND or OR). Electricity does not travel through wires for free except in superconductors. Real wires are like little resistors: to push current through them, you have to give up some of the power you are distributing (how much is a function of the cross-sectional area of the wire). The power which doesn't make it through the wire turns into heat. One of the reasons you can fry an egg on your P4 is that it's literally throwing away tons of power just trying to synchronize all the gates to the global clock. As stated in the article, in an asynchronous system the clocks are divided up on a modular basis, and only the modules that are running need power at all. This technique is already used to some degree in synchronous designs as well (sort of like the power-saving feature on your laptop), but they don't benefit as much, since a synchronous design must always trigger at the global clock frequency rather than only when necessary. (A rough back-of-the-envelope power sketch follows after point 2.)


    2. Processor Speed - Much as the speed of an assembly line is limited by the slowest person on the line, the speed of a CPU is limited by its slowest unit. The problem with a synchronous design is that *everything* must run at that slowest pace, even if parts of it could theoretically move faster. In an asynchronous design, the parts that can go faster will, so the total processing time can be reduced.
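
    Following up on point 1, here is that back-of-the-envelope sketch, using the usual CMOS dynamic-power estimate P ≈ α·C·V²·f (α being the fraction of the capacitance that actually switches each cycle). Every number below is invented purely for illustration, not a measurement of any real chip.

```python
# Back-of-the-envelope CMOS dynamic power, P = alpha * C * V^2 * f.
def dynamic_power(alpha, capacitance_farads, vdd_volts, freq_hz):
    """alpha = fraction of the capacitance that actually switches each cycle."""
    return alpha * capacitance_farads * vdd_volts**2 * freq_hz

C_total = 10e-9   # 10 nF of total switched capacitance (logic + clock tree) -- made up
Vdd     = 1.5     # supply voltage, volts
f       = 2e9     # 2 GHz global clock

# Synchronous: say 30% of the capacitance is the clock tree, which toggles every
# single cycle (alpha = 1), while the useful logic has a modest activity factor.
clock_tree = dynamic_power(1.0,  0.3 * C_total, Vdd, f)
logic      = dynamic_power(0.15, 0.7 * C_total, Vdd, f)
print(f"synchronous: {clock_tree + logic:.1f} W ({clock_tree:.1f} W of it just in the clock tree)")

# Asynchronous: no global clock tree to drive; only modules doing useful work switch.
print(f"asynchronous, same workload: {dynamic_power(0.15, 0.7 * C_total, Vdd, f):.1f} W")
```

    The exact split between clock tree and logic is made up, but it shows why the global clock network is the first target for power savings.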


    Hope that helps.

  • by Rolo Tomasi ( 538414 ) on Monday October 21, 2002 @12:12PM (#4496154) Homepage Journal
    AFAIK modern CPUs are already asynchronous internally to a large extent. This is because at today's clock frequencies the signal propagation delay becomes significant, i.e. by the time a signal has moved across the whole die, several clock cycles would already have passed. So prefetch, ALU, instruction decoding, FPU, etc. all operate independently of each other. I'm no expert on this though; maybe someone more knowledgeable than me can shed more light on it.
  • by darn ( 238580 ) on Monday October 21, 2002 @12:20PM (#4496247)
    The largest asynchronous project (to my knowledge) is the MiniMIPS [caltech.edu], which was developed at Caltech in 1997 and has 1.5 M transistors. It was modelled after the R3000 MIPS architecture.
    The best-selling large-scale asynchronous circuit seems to be a microcontroller that Philips [philips.com] developed and used in a pager series.
  • by hamsterboy ( 218246 ) on Monday October 21, 2002 @12:29PM (#4496331)
    Actually, the biggest advantage is in routing.

    On a synchronous design of any complexity, quite a bit of the routing (i.e. where the wires go) is due to clock distribution. The CLK signal is one of the few that needs to go to every corner of the chip. There are various strategies for doing this, but they all have difficulties.

    One method is to lay a big wire across the center of the chip. Think of a bedroom, with the bed's headboard against one wall; you end up with a U-shaped space. Now, suppose you (some data) need to get from one tip of the 'U' (the decoder) to the other (an IO port). Either you have to walk around the entire bed (a long wire), or go over it (a shorter wire). The obvious choice is to go over, but when you have a wire with one voltage crossing a wire with a (potentially different) voltage, you get capacitance, and that limits the clock speed of the entire chip.

    With an asynchronous design (lots of smaller blocks with their own effective clocks), you don't have this. Data can be routed wherever it needs to go, without fear of creating extra capacitance. The downside is that they're very difficult to design. This is partially because there are no tools for this - most of the mainstream hardware simulators slow waaaaaaayyy down once you get more than a few clock signals running around.

    -- Hamster

  • by taeric ( 204033 ) on Monday October 21, 2002 @12:36PM (#4496390)
    Not sure if you were serious or not...

    Software will have next to nothing to do with the race conditions in the processor. Instead, the race condition you pointed out will be the difficulty. That is, how can you ensure the "ready" signal is indeed slower than the computation the module is performing? This is not an easy thing to do, especially if you want it to report as soon as it is done. Most likely, a signal will fire after the longest time a unit could take. You don't get a speed-up for the fast cases, but you don't have to worry about complex logic on the ready path either. Another solution would be handshaking, but then you may run into an explosion in the amount of logic.

    Also, something I think would be a problem: many of the current optimizations in out-of-order execution are almost custom-fit to a clocked design. That is, the processor knows I/O will take so many cycles, branches take a certain number, etc. Currently (especially with hyperthreading) the processor is coming closer to keeping the execution units busy at all times. Do people really expect some magical increase when the clock is taken out? The scheduler will have to change dramatically. Right?
  • by brejc8 ( 223089 ) on Monday October 21, 2002 @12:36PM (#4496392) Homepage Journal
    There are two factors here.
    Firstly, on a glitch the synchronous part takes a certain time to return a wire low/high and resume its operation; by then it is too late, as the clock edge has already gone. An asynchronous property called delay insensitivity, which some designs have, allows any wire to have any delay to rise or fall. So you could pick any wire from, say, your ALU, reroute it out of the chip through a telephone line to the other side of the world and back into the chip, and the design would still work (maybe at 1 IPS, but nevertheless the result would be correct).
    Secondly, async emits much less EMI. The inside of your computer is riddled with radiation much nastier than cosmic rays. Most chips are composed of millions of little aerials which pick up all this interference and can make your chip malfunction. Sure, you can slow down your clock and hope for the best, but it's better not to create the noise in the first place.
  • by default luser ( 529332 ) on Monday October 21, 2002 @12:47PM (#4496486) Journal
    Actually, most popular communications formats are "asynchronous".

    Don't confuse yourself. Synchronous communications involve a real-time shared clock between points.

    Then you have asynchronous communications standards like RS-232. The sender and receiver choose a baud rate, and the receiver waits for a start bit, then starts sampling the stream using its local clock. So long as the clocks are close enough, and the packets are short enough, you'll never get an error.

    Then you have standards like Fast Ethernet, which are also asynchronous. AFAIK, the clock used to decode the Ethernet packet is contained somewhere in the preamble, and a PLL is tuned to the packet's clock rate. This is to avoid the obvious problems of the simple async communications of RS-232.

    A SAMPLE OF THE ACTUAL CLOCK used to encode the packet is available to the receiver, but the receiver can only use this to tune its local clock. It has to do the decoding asynchronously.
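
    To illustrate the start-bit-then-sample idea, here's a little sketch of my own (not from any spec; the 16x oversampling factor and the byte value are just examples) showing how a receiver recovers a byte using only its local sample counter:

```python
def receive_byte(line_samples, samples_per_bit=16):
    """line_samples: 0/1 line levels oversampled at 16x the baud rate.
    The idle line is 1; a start bit pulls it low."""
    i = 0
    while i < len(line_samples) and line_samples[i] == 1:
        i += 1                          # hunt for the falling edge of the start bit
    i += samples_per_bit // 2           # move to the middle of the start bit
    bits = []
    for _ in range(8):
        i += samples_per_bit            # advance one *locally timed* bit period
        bits.append(line_samples[i])    # sample mid-bit
    return sum(bit << n for n, bit in enumerate(bits))   # RS-232 sends LSB first

def encode_byte(value, samples_per_bit=16):
    """What the sender puts on the wire: idle, start bit, 8 data bits LSB-first, stop bit."""
    bits = [0] + [(value >> n) & 1 for n in range(8)] + [1]
    return [1] * 8 + [b for b in bits for _ in range(samples_per_bit)]

print(hex(receive_byte(encode_byte(0x5A))))   # 0x5a -- recovered with no shared clock
```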
  • Pipelining (Score:5, Informative)

    by Andy Dodd ( 701 ) <atd7NO@SPAMcornell.edu> on Monday October 21, 2002 @01:07PM (#4496744) Homepage
    In most modern CPUs, all of those occur independently in different units in the pipeline.

    But they still do their function once per global clock cycle. After that, they pass their results on to the next stage.

    As a result, the clock rate is limited by the longest propagation time across a given pipeline stage. A solution that allows for higher clock speeds is to increase the number of pipeline stages. This means that each stage has to do less. (The P4 one-ups this by having stages that are the equivalent of a NOP just to propagate the signal across the chip. But they're still globally clocked and synchronous.)

    The P4 has (I believe) a 20-stage pipeline (it's in that ballpark); the Athlon is sub-10, as are almost all other CPUs. This is why the P4 can achieve such a high clock rate, but its average performance often suffers. (Once you have a 20-stage pipeline, you have to guess WHICH branch you're going to take when branching. Mispredict and you have to start over again, paying a clock-cycle penalty.)
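
    As a toy illustration of that tradeoff (the branch fraction, mispredict rate, and penalty model below are invented, not Intel's numbers):

```python
# Toy model of the deep-pipeline tradeoff: a deeper pipeline can clock faster,
# but a mispredicted branch flushes more stages, so each miss costs more cycles.
def average_cpi(depth, branch_fraction=0.2, mispredict_rate=0.05):
    """Base CPI of 1 plus the expected flush penalty, taken as ~depth cycles."""
    return 1.0 + branch_fraction * mispredict_rate * depth

for depth in (10, 20):
    print(f"{depth}-stage pipeline: average CPI ~ {average_cpi(depth):.2f}")
# -> 10 stages: ~1.10, 20 stages: ~1.20 (faster clock, but each miss hurts more)
```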

    Shorter pipelines can get around the branch misprediction issue by simply dictating that certain instruction orders are invalid. (For example, the MIPS architecture states that the instruction in memory after a branch instruction will always be executed, removing the main pipeline dependency issue in MIPS CPUs.)

    With asynch logic, each stage can operate independently. I see a MAJOR boon in ALU performance - Adds/subtracts/etc. take up FAR less propagation time than multiplies/divides - but in synch logic the ALU has to operate at the speed of the slowest instruction.

    Most important is the issue of power consumption - CMOS logic consumes almost no power when static (i.e. not changing state); power consumption is almost exactly a linear function of how often the state changes, i.e. how fast the clock is going. With async logic, if there's no need for a state change (i.e. a portion of the CPU is idle), almost no power is consumed. It is possible to get some of this advantage simply by changing the clock speed. (E.g. Intel SpeedStep lets you change between two clock multiplier values dynamically, Transmeta's LongRun gives you FAR more control points and saves even more power, and many Motorola microcontrollers such as the DragonBall series can adjust their clock speed in small steps - one Moto uC can adjust from 32 kHz to 16 MHz with a software command.)
  • Read the article... (Score:3, Informative)

    by Andy Dodd ( 701 ) <atd7NO@SPAMcornell.edu> on Monday October 21, 2002 @01:12PM (#4496826) Homepage
    You'll notice that Cadence (one of the big EDA software companies) is cooperating with/heavily investing in one of the async hardware companies. It's not an untapped opportunity - They're tapping it as we post. :)
  • Just one thing... (Score:3, Informative)

    by Andy Dodd ( 701 ) <atd7NO@SPAMcornell.edu> on Monday October 21, 2002 @01:18PM (#4496898) Homepage
    In CMOS logic, power consumption is not related too much to the static state of the chips, i.e. "transistor is on for 5 ps".

    It's related to how often the state change occurs.

    A good example of where async logic might be useful:
    ALU multiply operation takes 20 ps, LOTS of transistors
    ALU add/subtract op takes 5 ps, FAR fewer transistors
    In current designs, this usually means that add/subtract ops have to run at a clock rate slow enough to accommodate that 20 ps multiply.
    In an async design, the add/subtract instructions can run 4 times as fast. But since the multiply/divide stage is not clocked, those transistors aren't doing anything, so overall power usage is less. (The add/subtract stage uses 4x the power it did before, but the mult/div stage was probably using 10x or more the power the add/sub stage was using.)
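
    Working through those illustrative timings (they're example numbers, not real ALU delays):

```python
# Quick arithmetic on the illustrative timings above (20 ps multiply, 5 ps add).
mul_ps, add_ps = 20, 5
sync_add_rate  = 1 / mul_ps      # clocked: adds are paced by the slow multiply
async_add_rate = 1 / add_ps      # self-timed: an add finishes when its logic settles
print(f"add/subtract speed-up: {async_add_rate / sync_add_rate:.0f}x")   # 4x
```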
  • Re:UARTs? (Score:2, Informative)

    by default luser ( 529332 ) on Monday October 21, 2002 @01:25PM (#4496965) Journal
    As I explained above, RS-232 (which uses a UART) is an asynchronous communications standard. The chip itself is clocked.

    There is no clock shared between the two points of communication. Each end "agrees" on a clock speed, but there is no guarantee how accurately each end produces said clock speed.

    A receiver detects a new packet when it receives the start bit, and it samples the incoming serial stream using its own local clock. This is the asynchronous part of the communication: the receiver really has no idea if it's sampling right, and clock skew between the sender and receiver can produce errors.
  • Re:What if? (Score:3, Informative)

    by twfry ( 266215 ) on Monday October 21, 2002 @01:26PM (#4496982)
    Yes, temperature does affect switching time, although not nearly as much as voltage for sub-micron channels. Lower temperature translates into slightly faster switching times. But if you really wanted to speed up a path, a slightly higher voltage would do the job better. Also, in the field it's easier to control voltage than temperature.

    As a result, one of the newer hardware design ideas is to provide multiple voltage levels within a chip. Higher-performance logic is driven by a larger voltage difference than logic where performance is not as much of a concern.

  • by Anne Thwacks ( 531696 ) on Monday October 21, 2002 @01:37PM (#4497107)
    The main problem with asynchronous logic is that it is impossible to PROVE, even by testing, that it will meet any worst case spec.

    Seymour Cray tried it after the 7600 and before making the Cray-1. He decided that regardless of the performance advantage, people wanted a computer that was KNOWN to work.

    In this, as in most other things he did, Cray was right.

  • The Vax 8600 CPU (Score:3, Informative)

    by Tjp($)pjT ( 266360 ) on Monday October 21, 2002 @02:57PM (#4497958)
    It is the earliest example I can think of: roughly 50 ECL 10000 gate arrays. It was only synchronous at the "edges", like bus interfaces. Circa 1983. I was on the simulation tool design team. Loads of fun on the skew-analysis portion of the simulation: you have to account for all the "local" varieties of skew (within a cell, within a quadrant of the chip, within the chip overall, and more), plus the lead- and trace-generated skew as well.
  • by Neurotensor ( 569035 ) on Monday October 21, 2002 @03:34PM (#4498293)

    I'm studying chip design and my supervisor scoffs at asynchronous logic. I don't have any real input of my own, but his view is that we've been waiting for commercially viable asynchronous designs for as long as cheap fusion, and neither has happened yet despite many loud enthusiasts.

    One of the real problems of asynchronous logic is in testing. With synchronous logic your design is partitioned into registers and combinational logic. The combinational stuff can be tested at production by use of every possible test vector, while registers are rather easy to test. Together these two tests virtually guarantee that the state machine works. Do that for every state machine and you're done.
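
    For a sense of scale on the exhaustive-vector part, here's a toy example of my own (the block under test is a made-up 2-bit adder, not anything from a real flow):

```python
from itertools import product

def adder_2bit(a, b):
    """The combinational block under test: 2-bit add producing a 3-bit result."""
    return (a + b) & 0b111

def reference_model(a, b):
    return (a + b) % 8

# 2-bit a and 2-bit b -> only 16 vectors cover every possible input combination.
for a, b in product(range(4), repeat=2):
    assert adder_2bit(a, b) == reference_model(a, b), (a, b)
print("all 16 input vectors pass: the combinational block is exhaustively verified")
```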

    Asynchronous state machines, however, offer no obvious way to break them down. You have to give them sequences of inputs and check their sequential outputs. Even if you think it's working, you can never be sure, and what happens when the temperature changes? Race conditions can result in the state machine breaking as the temperature shifts.

    Synchronous design is a very mature field. Nowadays you can be sure that a design works before fabrication (well, almost.. =) and then synthesise it into gates that ought to work first go. If they didn't then AMD and Intel would go under pretty soon!

    Asynchronous design is hard and my hat goes off to the people who do it for a living. But the same amount of effort would result in far more development using standard techniques. I guess you really have to want to do it.

    Yes, synchronous logic has serious issues with clock distribution, but it's still the most commercially viable design technique. The fact that your CPU is fully synchronous is testament to that.

    So, which will come first: cheap fusion or reliable asynchronous logic?

  • AMULET (Score:1, Informative)

    by Anonymous Coward on Monday October 21, 2002 @04:52PM (#4498916)
    The AMULET [man.ac.uk] group at the University of Manchester has been doing research and implementation for asynchronous processing (based around an asynchronous ARM design) for many years. They have a bunch of good information available on their projects, and the subject in general.
  • Re:What if? (Score:2, Informative)

    by Si_Cowboy_03 ( 616992 ) <Si_Cowboy_03@yahoo.com> on Monday October 21, 2002 @05:19PM (#4499179)
    Asynchronous circuits inherently run at the fastest possible speed for the given conditions (i.e. temperature, operating voltage), because they are self-timed rather than timed by an external source. Transistors change state faster at lower temperatures and higher voltages. Since the transistors trigger each other to change states, this happens as fast as possible automatically. In synchronous logic, the clock period is set to the assumed worst-case delay of the slowest transistor combination.
  • by Si_Cowboy_03 ( 616992 ) <Si_Cowboy_03@yahoo.com> on Monday October 21, 2002 @06:11PM (#4499554)
    I just happened to check out a textbook on the subject of asynchronous circuit design and so far it's been pretty good (first part of chapter 1). Anyway, it gives the benefits of asynchronous design:
    1. Elimination of clock skew problems - the clock is a timing signal, but it takes a certain amount of time for the clock signal to propagate around the chip, so as the clock frequency goes up this becomes a huge problem.
    2. Average-case performance - synchronous circuits must be timed to the worst-performing elements; asynchronous circuits have dynamic speeds.
    3. Adaptivity to processing and environmental variations - dynamic speed here again. If the temperature goes down, the circuit speeds up; if the supply voltage goes up, the speed goes up. It adapts to the fastest possible speed for the given conditions.
    4. Component modularity and reuse - interfacing is easier because timing difficulties are avoided (handshake signals are used instead; a tiny sketch of the handshake element is at the end of this comment).
    5. Lower system power requirements - it takes a lot of power to propagate the clock signal, plus spurious transistor transitions are avoided (MOSFETs only use considerable power when they change states).
    6. Reduced noise - in a synchronous design all activity is locked to a single frequency, so big current spikes cause large amounts of noise. A good analogy is the noise of 50 marching soldiers vs. the noise of 50 people walking at their own pace; the synchronized steps of the soldiers make the magnitude of the noise much greater.
    Major drawback: not enough designers with experience, and a lack of asynchronous design tools. So far the book is a great read, but pretty technical (good for an EE or comp-sci person who's had a basic digital logic class).

    The book is "Asynchronous Circuit Design" by Chris J Myers from the University of Utah.

    Also I wrote a paper about this for my computer architecture class:
    http://ee.okstate.edu/madison/asynch.pdf [okstate.edu]
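
    And here is the little sketch promised in point 4 (my own illustration, not from the book): a Muller C-element, the basic gate behind those handshake signals. Its output changes only when both inputs agree; otherwise it holds its previous value.

```python
class CElement:
    """Muller C-element: output follows the inputs when they agree, holds otherwise."""
    def __init__(self):
        self.state = 0
    def step(self, a, b):
        if a == b:
            self.state = a      # both inputs agree -> output follows them
        return self.state       # inputs disagree -> hold the previous value

c = CElement()
# Feed it a request line and an acknowledge line: the output rises only after
# BOTH have gone high, and falls only after both have returned low.
for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    print(f"a={a} b={b} -> c={c.step(a, b)}")   # prints 0, 0, 1, 1, 0
```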
