Clockless Computing? 225
ContinuousPark writes: "Ivan Sutherland, father of computer graphics, has spent the last ten years designing chips that don't use a clock. He's been proposing a method called asynchronous logic, in which there's no clock signal being distributed to regulate every part of the chip. The article doesn't give many technical details (greatly needed), but Sutherland, now doing research for Sun, says that significant breakthroughs have been made recently to make this technology viable for mass production. It is estimated that 15% of a chip's circuitry is dedicated to distributing the clock signal and as much as 20% of the power is consumed by the clock. This is indeed intriguing; what unit will replace the familiar megahertz?"
Nope, not FLOPS... (Score:1)
Clockless Computing Works (Score:1)
Re:Several Issues - Tools ARE very important! (Score:2)
Logic synthesis tools are very important to modern day IC design and optimization.
That is why Theseus Logic, Inc. [theseus.com] (mentioned about 2/3rds down in the NYTimes article) has a Strategic Alliance with Synopsys [theseus.com]. Our patented NCL (Null Convention Logic) [theseus.com] technology, unlike many other asynchronous technologies, is designed for maximum interoperability with existing tools, maximum design reuse and near-complete elimination of common CBL (clocked boolean logic) timing closure issues.
For those who mentioned Amulet: its project leader (and original ARM designer), Steve Furber, is on Theseus' Advisory Board [theseus.com].
Please visit our web site [theseus.com] for more information.
Disclaimer: I am an employee of Theseus Logic, Inc., who is NOT speaking on behalf of Theseus Logic in this post, nor has its content been approved by any Theseus Logic official.
-- Bryan "TheBS" Smith
No clock cycle?? Hmmm.... (Score:1)
Power and cooling (Score:1)
Imagine your local server room. Don't you think they would like a 20% decrease in their power bill?
We can afford the power bill easily enough. What we can't deal with is the extra cooling.
Does anyone know roughly:
Re:What will happen to overclocking? (Score:1)
asynch chips go as fast as the hardware can when the software needs it
Re:Different Kind of Clock (Score:1)
Measuring Speed (Score:1)
Intel Pentium V Fast
Intel Pentium V Really Fast
Intel Pentium V Yeah we know this one costs the same as the fast one last year but it is so much faster.
AMD Thunderbird Oh my god did you see how fast that was
AMD Thunderbird Seriously ya'll this is quick
IRNI
The whole thing may not be as complicated (Score:1)
Although it's much harder to design such a machine, it is not impossible, and it behaves the same way as a clocked machine, except that there is no clock to interface you with it. You would have to have it output a simulated "clock" that tells you when it's ready for the next instruction.
This "clock" wouldn't really be a clock as you might expect. It would give you an edge whenever the AFSM is ready, so the period would vary from instruction to instruction. That isn't such a problem, though, since it's this signal that dictates the pace anyway.
If any of you are interested in more on this, check out our text, "Fundamentals of Digital Logic with VHDL Design" by S. Brown. Chapter 9 introduces the concepts.
Janimal
Re:Units (Score:1)
About programmability... (Score:2)
The second thing: I'd hope he'd have made some advances in programmability in the meantime...
Re:This sounds like a dataflow machine (Score:4)
So, what is a CFPP? It is a processor with a pipeline where data and instructions flow in opposite directions, with the instructions usually thought of as moving "up" and data as moving "down". The functional units (FU) are attached as sidings to the main pipeline. Each FU launches from a single pipeline stage and writes its results to a different stage, further "up" the pipeline. The main goals of this architecture were to make the processor simple and regular enough to create a correctness proof and to achieve purely local control.
If Sun ever produces a processor that is asynchronous, it will likely look similar to this.
--
"You can put a man through school,
But you cannot make him think."
Re:Units (not floating point operations) (Score:2)
You mean, like the way they optimize for MHz over other, useful things, like flops? Remember when AMD did that little ad campaign of "Our 800 MHz chip is faster than Intel's 766 MHz chip!"? How many "normal" people followed that one? Today, MHz is the standard rating of speed, and it is misleading. MFLOPS would be a much better measure (although you're right that, with different ops taking different amounts of time, you'd have to carefully define what you mean by an operation).
Secondly, I don't think it will take "several years" of experimentation to figure out how much faster your add is than your multiply. We already know the answer to that question, and it depends on how you decide to implement your circuit. If you decide to do multiplication with shift/add you could get a tiny little multiplier that's freaking slow, or you can go hog-wild with (7,3) counters, Wallace trees, fast adders, etc., and have a gigantic circuit that's really fast, but that's how hardware design has always worked and the options for solutions will be unchanged. Now, though, you have a few more choices to make, since your ops don't all have to fit into equal-length pipeline stages, and each op doesn't have to take the same amount of time for each set of inputs (for example, 7 + 1 might take x gate delays, whereas 7 + 0 could take far fewer.)
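To make that last point concrete, here's a toy Python sketch (not modeled on any real chip's multiplier) of an iterative shift-and-add multiply whose work depends on the operands; asynchronous hardware gets to finish the easy cases early instead of always budgeting for the worst case.

# Toy sketch: an iterative shift-add multiply whose step count depends on the
# operands, illustrating data-dependent latency. Only set bits of b cost an add.
def shift_add_multiply(a: int, b: int) -> tuple[int, int]:
    """Return (product, add_steps)."""
    product = 0
    add_steps = 0
    while b:
        if b & 1:
            product += a       # one "add" unit of work
            add_steps += 1
        a <<= 1
        b >>= 1
    return product, add_steps

print(shift_add_multiply(7, 1))      # (7, 1)    -- done after a single add
print(shift_add_multiply(7, 0xFF))   # (1785, 8) -- eight adds for a dense operand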
It's all very exciting.
God does not play dice with the universe. Albert Einstein
What constitutes computing power? (Score:1)
Making the familiar measurement meaningless might put more emphasis on benchmarks, and give more impetus for getting them better standardized and more meaningful.
Philips Async DCC Chip (Score:1)
This sounds like concurrent functional programming (Score:2)
This sounds like the kinds of problems that the concurrent functional programming people love. See Erlang [erlang.org] or perhaps some concurrent variation of Haskell [haskell.org] like Eden [uni-marburg.de].
Regards,
Zooko
Re:How about the human brain? (Score:1)
- Steeltoe
Re:Units (Score:1)
Yes, but the point is that even the same processor may take a different amount of time to do the same operation, albeit on different data.
Re:How about the human brain? (Score:1)
Asynchronous computers will still have a timekeeping clock, just not a clock signal that controls the gates on each functional component. This is the difference between attaching a few thousand parts to the clock and attaching 37 million+.
Design Logic (Score:4)
What makes it interesting is that you have to fundamentally rethink your whole logical design in order to end up with a general-purpose design.
With clocked computing, it is easy to see how you would flush buffers, etc. Clockless computing would be more problematic, and of course, would probably be proprietary.
My initial reaction is that it would work easiest in things like embedded processing. I also wonder if there would have to be some sort of evolution similar to what we have seen over the past few years with Intel, Motorola, etc.
One must not forget that the increase in performance for an awful lot of these chips has had as much to do with clock speed increases as with code designed to take advantage of certain coding features in the hardware.
An early example of this is when the Pentiums first came out. For a while you had 486 boxes and Pentiums with the same clock speeds on the market. You could compare performance between systems with the same video cards, same RAM, same cache, etc., even though the chipsets were not the same. This was educational. As I recall, the performance boost for software not taking advantage of the Pentium feature set was about 20-25% (?). I may have this wrong, of course.
But at a time when Pentium systems cost twice as much as a 486, it was definitely buying for the future.
Re:How about the human brain? (brain clock) (Score:1)
Moderate this back up! (Score:2)
I'd have thought that a post about a commercially available async. processor and the benefits of async. design are rather "on topic" for a story on an async CPU... particularly when the story claims this is something new when the idea itself is decades old, and Amulet itself (async. ARM, designed by Steve Furber, the original ARM architect, now a professor at Manchester University) has been around for quite a while.
As this post points out, not only is async. (i.e. data driven) design good for low power (you only use power when doing something!), but it also promises to raise performance by allowing each part of the chip to independently run as fast as it is able, and compute results as soon as its operands are available rather than waiting for the next clock.
Re:So how do you measure speed? (Score:2)
I somewhat disagree about where speed grades would be an issue. The obvious market for async. CPUs like Amulet is in handheld consumer devices where precise performance characteristics don't matter (hence the acceptability of conventional power management techniques), but power consumption does.
For embedded real-time applications, however, you need repeatability more than power savings or peak performance.
Disregard Krappenhaver, he knows nothing (Score:2)
And yes, it's true, Michael stole the Signal 11 account from the person who was using it. Shortly after it happened, there was a discussion about it on one of the front page stories of the day, in which Michael participated and basically admitted what he did (he acted like there was nothing wrong with it). He then promptly modded down all the posts in that thread (including his own) to -1 so they wouldn't get archived. Anybody who was paying attention at the time will remember what I'm talking about.
Axel backed up his statements with references wherever possible, and you merely assert, with no evidence, that he's lying.
Folks, what Axel said is true. Ignore "Ryan"; he's just trying to confuse the issue.
Clockless Computing / Buses (Score:1)
Null Convention Logic (Score:1)
Re: (Score:1)
Re:How about the human brain? (brain clock) (Score:2)
Yeah, but that clock is like the BIOS or hardware clock, not like the CPU clock. It times large scale activity in the brain, but not second by second activity. I remember reading about clocks that governed second by second activity that could be read in brainwaves.
Re:Would this really work? (Score:1)
Waste of time.. (Score:2)
It's becoming increasingly hard to shrink chip sizes and increase speeds. Even with using different metals such as copper and shrinking trace widths, we are eventually going to hit a brick wall with current technology. After doing so, taking away 15% of the chip complexity is not going to go far in creating the next generation of faster chips.
It's time to look to new technologies: carbon nanotubes and buckyballs, quantum computing, etc.
Re:Would this really work? (Score:1)
Re:Asynchronous Logic (Score:1)
Re:Would this really work? (Score:1)
You're right and yes, I have. To 288 packages, each one an ASIC, over a volume of about one cubic foot. I more or less gave up doing hardware after that!
Why not? (Score:1)
Re:Mips, Flops, and lack of Clocks... (Score:1)
>TIMEDEMO 1
Well, that's all most people seem to care about these days anyway.
"Hey man, I get 143 frames per second in Quake 3! 147 if I overclock!"
"Everything you know is wrong. (And stupid.)"
We can fix that (Score:2)
I can introduce you to one of several cousins who generally have the effect of sending high-frequency signals not only along your spine, but along nerves you never knew you had before - if the ``mike'' in your email address does stand for michael. Youngest candidate is about 15, oldest is about 30. Warning: they're more likely to stop your clock than start it, if the old ticker isn't in good shape or the old blood supply is a bit lean... (-:
Re:Units (Score:1)
Hm. How about the time it takes to compile the generic NetBSD kernel? NetBSD runs on almost anything for which the speed is interesting, and compiling involves a nice mix of real-world operations, except for floating point.
(Yeah, I'm joking... a little bit :)
The "Simpler Design" (Score:1)
your sig (Score:1)
Another approach to the same idea... (Score:2)
For example, in traditional processor design the first stage of execution is decoding, and the second is register lookup; in Chuck's chip there is no register lookup, because he uses a stack. Because of this, instruction decoding proceeds in parallel with all the ALU computations and result accesses, and when the instruction is decoded all that remains is to gate the result into the TopOfStack register (and sometimes to pop the stack, if the instruction was supposed to consume two values).
The only exception in the current design is the ADD instruction, which he's implemented as a ripple-carry; that can sometimes take more than one cycle to compute, so if there's a possibility of a long carry the programmer is responsible for inserting up to 3 NOPs.
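Here's a rough software sketch of that execution model (a hypothetical Python illustration, not Chuck Moore's actual design): with a stack there is no register lookup, so "executing" an instruction mostly amounts to gating a result into the top of the stack.

# Rough sketch of the idea above: decode just selects which result gets gated
# into the top of the stack; binary ops consume two values and push one.
def step(stack: list[int], instr: str, literal: int = 0) -> None:
    if instr == "LIT":            # push a literal
        stack.append(literal)
    elif instr == "ADD":          # binary op: consume two values, push one
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)       # on the real chip this is a ripple-carry add
    elif instr == "DUP":
        stack.append(stack[-1])
    elif instr == "NOP":
        pass                      # the padding a programmer inserts after ADD
                                  # when a long carry might still be rippling

s: list[int] = []
for op, lit in [("LIT", 7), ("LIT", 1), ("ADD", 0), ("NOP", 0)]:
    step(s, op, lit)
print(s)   # [8]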
The URL is http://www.ultratechnology.com.
In my microprocessor design class a few years back, I built my processor around these concepts. It was an unqualified success; it was easy to build, easy to program, much less resource-constrained than any of the other designs in the class, and ran all the required programs much faster than anything else. Almost everyone else was following the party line and building a RISC-style two-operand machine; since we only had eight bits per instruction, this was suicidal. The few who weren't completely toeing the line were building accumulator machines, which worked well but didn't have the sheer flexibility.
-Billy
Re:What's the point? (Score:2)
The biggest point is that this will save enormous amounts of electrical power that can be used elsewhere.
Sutherland's work is nothing new. The Computer Science department at the University of Utah (a department he founded) has been working on this for years as well. They have made significant progress. [utah.edu]
I know, I worked with one of the professors for a while before I went into electrical engineering.
Just 'cause the British government can't make it work does not mean it is impossible. Many inventions we use every day were considered "impossible".
Remember that since you don't know about the research, perhaps there is something you don't know.
Re:Mips, Flops, and lack of Clocks... (Score:2)
Seems to me that system performance should be measured by real-world benchmarks that most people can relate to and that hold much more importance to the user, such as:
SBT - System Boot-up time, where the value is normalized modulo the time it takes to get coffee
GAFR - Graphics Accelerator Frame rate, perhaps using Q3 as the de-facto standard
and the related:
MTBF - mean time between frags
of course, we should also measure things like:
MTBR - Mean Time Between Reboots
MTBNR - Mean Time Between Network Redials
... but these are questionable since they can be affected by outside influences unrelated to system performance
Re:Would this really work? (Score:2)
That's the whole point of many years of research on the subject. This is nothing new.
The CPU hits bottlenecks in slow components now. The difference is that right now everything keeps running at full power during a bottleneck even while it does nothing at all. With async design, when components are waiting they draw a fraction of their power, as there is no clock cycling them on and off.
Also, the layout and design of a synchronous clock is a major limiting factor in CPU design.
Units (Score:4)
Even without a processor clock, you should still be able to measure how many operations it can do per (real-time) second.
Armchair computer science abounds on /. (Score:5)
It is both amusing and frustrating to hear all of the "armchair computer scientists" discussing the reasons this technology is a bad idea. As if they knew more about the subject than the many PhDs who have dedicated their careers to it, based on the knowledge gleaned from the one Computer Architecture class the poster took as an undergraduate.
I was invited to work on a team at the University of Utah (Sutherland's old school) where they were researching this very topic. This is old news; they have been working on it for years. And as some people have correctly pointed out, there are both good and bad points to sync or async logic.
There are two major reasons to work on async logic: clock skew and power savings. The reason for power savings alone is a good one. People here have been complaining that it "is not worth it for only a 20% power savings".
Yes it is! In a modern office, computers end up taking a lot of power. Imagine your local server room. Don't you think they would like a 20% decrease in their power bill?
That means instead of building five power plants, you only need four (on a grand scale; please no newbie replies like doodz, thiz guy thinkz you n33d five pawer pl3nts to run a box). That is significant. And with today's high-MHz CPUs this means even more. Some think >50% savings, and even more during periods of low activity.
The clock skew issue has been covered somewhat here. One of the major hurdles in solving the design problem is the development of new design tools, which is what many people at Utah are currently working on.
The way to move forward is not to argue for the limitations of systems of the past. Don't make me pull out Ken Olson quotes here.
It's not a dataflow machine (Score:4)
Well, it's not exactly a dataflow machine, anyway.
The old E&S machines were dataflow architectures at the equivalent of the "machine code" level. Newer architectures are using similar ideas, but in a way that does not require details of the dataflow model to leak outside the chip.
Look at the Pentium 3, for example. It exploits dataflow ideas at the microcode level by prefetching several machine code instructions, splitting them into a larger set of "micro-instructions" and then scheduling them together. That's not really a dataflow architecture, but it does use one of its ideas: deciding how to schedule the instructions at run time.
The new clockless CPUs will exploit dataflow ideas by implementing a kind of dataflow machine between the functional units of the CPU itself. The CPU, remember, is like an interpreter for machine code. Since the "program" for that interpreter does not change, it can be implemented in a "plugboard" kind of way and people or programs producing machine code will never know the difference, apart from speed.
Re:Several Issues (Score:2)
This design paradigm and sets of tools all assume synchronous logic.
That is a significant issue. Because of it, I'm not betting that my next CPU will be async. The first applications will probably be in the microcontroller area, where the chips tend to be simpler and less powerful, and where constraints on power consumption are tighter and more important. However, just as the industry made the transition from hand-drawn schematics with discrete components to ASICs, this too shall come to pass.
Imagine an automotive assembly line where things could only move forward if each station got permission from his adjacent stations.
To some extent, that's EXACTLY what happens in many line processes. To compensate, there are 'buffers' built into the system. As with clocked logic, each step is designed to take about the same amount of time to complete so that there won't be pileups or starvation at any given station. Consider that in many line processes some stations, especially at the input and output areas, are human-operated, and humans are very asynchronous, yet the line proceeds in an orderly manner overall.
In the case of CPUs, I imagine a RISC approach will probably be used, where the instruction set is designed so that each instruction takes roughly the same amount of time (generally true today as well, but enforced by the clock). The compiler will be responsible for scheduling the instructions to avoid starvation and pileup. In cases where multiple identical units exist, there may even be routing bits in the instructions to choose which unit is employed. That's not a very big stretch from the current situation, since good optimizing compilers already have to know about that process but don't get to make the choices.
Huh! (Score:2)
</sarcastic>
SInclair wanted to do this (Score:2)
It may be difficult, but... (Score:2)
Either we have to have separately clocked parts in smaller domains or we have to go asynchronous. Both are insanely difficult, but the latter has the possibility of generating speeds unheard of. There are transistors capable of 250 GHz (not in CMOS Si technology, but anyway), and with some reduction in the feedback, a back-fed inverter could generate 50-100 GHz locally. Imagine small parts of a chip operating at that speed and using level-triggered handshaking... difficult, but mindblowing. :-)
Another thing: we would get rid of much of the power consumption. CMOS consumes power proportional to frequency even when it isn't doing anything (at least the clocked parts do). Asynchronous logic would not waste any charge on the on-off switching... some real power saving!
The next step is adiabatic computation. After the logic has reached the result, the process is reversed with no energy or charge loss.
However, the quantum computers will not happen during my life time. If ever.
Re:How about the human brain? (Score:2)
There is actually a sort of overall clock in the mind. I remember reading that brainwaves seem to be a kind of general clock signal that is used for coordinating certain activities of consciousness. Sorry to be so vague. The article was a while ago, and in print. I think in Scientific American.
That being said, I think you're right, the brain is largely asynchronous.
BTW, as a shameless plug, my StreamModule System [omnifarious.org] is also largely asynchronous. It's for IPC though, not for gate-logic.
First machines not clocked. This is old, old news (Score:5)
I'm also an old fart and not some software geek to whom every hardware technology mentioned is something unheard of before. That being said...
The first computing machines weren't synchronous. I forget the names, but this kind of thing was being done way back when because it was impractical to distribute a common clock across the racks and racks of equipment that made up a CPU back then.
Also, Motorola's PowerPC chips implement an asynchronous divider, so you might be using asynchronous technology right now.
The idea of having a computer run as fast as the transistors can go is a great goal, but there are some impractical aspects to the use of asynchronous circuits.
First, how do you know your computation is done? Well, there are several different ways of telling. You can use a current sensor to decide when your gates have settled out for a decent length of time, or you can wait a predetermined amount of time based on the worst case. All solutions involve bloating the design with more transistors to time the handshaking between Muller C-elements. Whether it's some type of current sensor or just inverter chains, there's at least 10% of a circuit tied up in timing (and it can run much, much higher).
Also, what do you do with the data once you've processed it so fast? The IOBs are only so quick in driving pins, so while the core of the design can run really stinkin' fast asynchronously, it's hampered by the ability to get data in and out.
Design verification is also a nightmare with asynchronous logic. It's a hard enough problem figuring out my longest path between registers across process and temperature variations, but to add in the factor of not knowing your clock is... well, icky.
Finally, what about noise in an asynchronous design? For my current work, I have to make sure everything happens synchronously... or I end up with nasty noise in my CCD section. I can tolerate a little bit of asynchronous behaviour, but not a lot.
Where asynchronous technology makes sense now is something like Motorola's divider circuit. By making it asynchronous, they gain the speed advantage of not having to rely on a slower, global clock distribution network; by making it a local function, they avoid the problem of slow IO; and by using it for a "small" amount of their design, they avoid die bloat and noise problems.
I guess the idea of asynchronous design boils down to one of history. If it's such a wonderful thing and has been around for so long, why doesn't everybody do it? Well, because it has drawbacks and the design philosophy rarely fits the design criteria (cost, tools, reliability, performance, and function).
I don't think this is a newsworthy item. In asynchronous design, it's pretty much ALL old hat. Academic papers recycle the same ideas and the UK email reflector for asynchronous "researchers" goes quiet for months at a time.
Maybe tomorrow, /. will report the discovery of fire.
Cockless chip unit (Score:3)
Given the absence of a clock, I'd go for Inhertzia.
Karma karma karma karma karmeleon: it comes and goes, it comes and goes.
What will replace megahertz? (Score:2)
--
This isn't new at all. (Score:2)
The reason it has to do this is because both systems may be running at different speeds. If one's running at 100MHz and the other is at 102MHz then they will eventually get out of sync. Without a system like this they wouldn't even know they were out of sync. DSL/cable/modems all use systems like this. There are several hundred out there. Some of the more common are Return to Zero (RZ), Non-Return to Zero (NRZ), etc.
Back to electronics though: several types of memory are actually async. You set your address pins, then you pulse a pin high which tells the chip that its address is waiting. The chip then looks up the memory, sets the output data pins, and then sets a return pin which allows the other electronics to use the data returned. This allows for the fastest operation of the chip possible. Since there is no clock there is no speed to conform to. The chip will always return its fastest possible response. This is also harder to work with, since you have to rate your chips by speed grade. You may remember this from the old 70ns/60ns/50ns EDO/SIMM memory era.
Re:How about the human brain? (Score:3)
The brain is an excellent example of parallel asynchronous computing, since a neuron will only fire when its input threshold has been reached. However, many internal processes in the brain may in fact be more or less synchronous, perhaps because that confers an evolutionary advantage.
- Steeltoe
Re:Huh! (Score:2)
--
This sounds like a dataflow machine (Score:5)
Unfortunately for Sutherland, there's something called the PS300.
Back in the late 70's and early 80's, his company, Evans and Sutherland, ruled the world of computer graphics with their very slick Picture System machines. These were peripherals to PDP-11s and VAXes, and were wonderfully programmable machines. There was a fast interface between host memory and Picture System memory, letting you mess with the bits to your heart's content. We had a couple of them at NYIT's computer graphics lab, and did a lot of great animation with them.
E&S's next machine, though, was the PS300. This was a far more powerful machine, their first with a raster display. It was an advance in every way, except that it imposed a dataflow paradigm on programming the machine. You could only write programs by wiring up functional units. It was astonishingly difficult to write useful programs using this technology. Everybody I know who tried (and this was the early 80s, when people were used to having to work very hard to get anything on the screen at all), every one of them, gave up in frustration and disgust.
ILM got the most out of the machine, but that was by imposing their will on E&S to provide them with a fast direct link to the PS300's internal frame buffer.
Basically, dataflow ideas killed the PS300, which destroyed the advantage that E&S had as the pioneer graphics company, and they have never recovered from it. While the idea is charming, and to a hardware engineer it makes a lot of sense, programming them takes you back to the plugboard era of the very first WW-II machines. Nobody wants to do that.
thad
Re:Chip would still have a clock... (Score:2)
----
Re:This sounds like a dataflow machine (Score:3)
1) not all asynch-logic is dataflow. Dataflow is only the best known version.
2) There is (was) at least one decent dataflow language: Prograph. It's true that it isn't a good fit for the machine it was targeted at (the Mac), and they didn't come up with a decent text-based printout, etc. (This last did make development hard... if you can't see a piece of code, it's hard to debug it. OTOH, they had an excellent visual debugger that could step through the code.)
The real problem with a new paradigm is convincing people to use it. For this one needs to find an entering wedge. Perhaps CPU design will be it for asynch logic (whether dataflow or not). Once it becomes established in a field, then it will have a chance to develop.
Please consider that asynch logic could be just what is needed to allow multi-processor machines to become useful. My real suspicion is that it will eventually end up as a mixed system, with certain pieces synch logic driven (e.g., screen drawing logic, sound generation, etc.) and other parts asynch. But that's probably a few decades away. (Perhaps.)
Caution: Now approaching the (technological) singularity.
Cornell Asynchronous Research (Score:3)
If anyone's interested, our group's page is:
http://vlsi.cornell.edu/
Anyone who wants a good overview of asynchronous design should read this paper:
http://vlsi.cornell.edu/~rajit/abstracts/async-
ck
Asynchronous vs. Synchronous (Score:3)
On the other hand, one more advantage that I haven't seen mentioned about asynchronous design is modularity - most synchronous designs can only be verified for correctness in the context of the global clock signal, whereas if you've verified the correctness of an asynchronous module, you can plug it in wherever its functionality fits, without having to adjust all the stuff around it.
When you think about it, however, you will note that synchronous design is actually just a SUBSET of asynchronous design - the clock signals are just a way of indicating a "data ready" condition to the next bunch of logic gates. Careful logic designers who hold this viewpoint can design hybrid synchronous/asynchronous designs, where the overall design is actually a bunch of smaller synchronous designs, where each block of synchronous logic receives a "clock" which is actually a data ready signal for the logic block as a whole.
Performance measurement (Score:2)
Caution: Now approaching the (technological) singularity.
old idea (Score:2)
Speeds get faster; chip dies get larger;
far-off units get out of sync.
Mips, Flops, and lack of Clocks... (Score:5)
Not all Mips are created equal. For example: is it fair and reasonable to compare a CISC Mips to a RISC Mips? The CISC may be doing something like a string move with one instruction while the RISC machine does it with a series of instructions in a loop. Obviously this is an apples-and-oranges comparison.
Okay - next you look at Flops - aren't Flops the same on every machine? Well - no, though that is probably less of an issue when comparing IEEE-based implementations. The question comes up (and it has already been mentioned) that Flops don't reflect useful workloads! The vast majority of computer workloads don't involve significant floating point operations. (Yes, you can find workloads where that is the case - but it isn't the majority situation.)
So it comes down to this: comparing computer "systems" is a tricky business. Even MHz within the same architecture family doesn't work, because you don't know how efficiently the machine is designed - the hardware might be capable of more than one instruction per clock!
Finally - I don't believe the estimate of up to 15% for clock distribution. It's more like 1%-2%. (I do chip design for a living... at least I have an educated opinion on this!) The clocks ARE a significant part of the power issue though. CMOS burns power when signals move. The clock moves. Simple enough analysis there.
Asynch design methods have been around forever, but present a number of problems for traditional design tools that depend on the clock to do their work. Further, there are a lot of chip designers who throw up their hands if you just mention the words "asynchronous design" to them. Any push toward this kind of design would be traumatic, to say the least.
AMULET (Score:2)
This CPU uses the ARM [arm.com] core.
It is so power-efficient that it could run solely on the induction power resulting from its pins transmitting information.
Current plans specify delivery of the AMULET3i.
--
Re:Design Logic (Score:2)
An added thought to this is that since, according to the article, a lot of the research is being done on the Sun side, this will have interesting implications for the Wintel crowd.
It seems that it would make its way into the market first via the UNIX crowd. This makes for interesting opportunities. The last two paragraphs of the article are interesting in this regard:
Mr. Sutherland, in fact, says a new magic is precisely what he has found. He draws an analogy to the first steel bridges, which were built like stone bridges, with arches. It took some time, he said, for designers to recognize the possibilities of the suspension bridge -- a form impossible to create with stone alone but which was perfectly suited to the properties of steel.
The same is true with asynchronous logic, he said. His research shows that it will be possible to double the switching speed of conventional clock-based circuits, he said, and he is confident that Sun in particular will soon begin to take advantage of that speed. "A 2X increase in speed makes a big difference," he said, "particularly if this is the only way to get that fast."
fascinating.
Different Kind of Clock (Score:2)
nononononono (Score:2)
This is not new or groundbreaking.
Re:How about the human brain? (Score:2)
------
Re:Don't over-mysticize ANNs... (Score:2)
That sounds like Alan Turing. He used an idealized abstract machine called a Turing Machine to prove a bunch of interesting theorems about logic. He conceived of the Turing Test, by which a machine would prove that it thought by successfully convincing a remote human that it was a real person. Today, we probably wouldn't consider that sufficient evidence of thought.
Re:How about the human brain? (Score:2)
That's because your brain is an analog computer, not a digital one. As for "if the brain can do it, it must be possible", that is simply not true. We are still in the dark ages as to what the brain actually does and how it actually does it, and we won't be able to use any of our discoveries in information processing technology for the foreseeable future.
And no, artificial neural networks are NOT analogous to how the brain operates. It is most useful to think of them purely as mathematical creations. They are orders of magnitude simpler than the networks found in the brain, and their operation is at BEST guesswork.
Tri-state electronics (Score:2)
Well, what if the logic worked such that if any input was medium, the output was medium, and otherwise the output was as it is now with binary logic? Then you could build a CPU that left units that weren't being used in the medium state. When an operation was performed, you would know it was done as soon as the result had no medium bits.
Likewise, you could push this back on the memory and other subsystems.
Of course, now the question is whether adding the additional state is worth it in eliminating the clock.
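A toy Python model of the idea above, where None stands in for the "medium" state (just a sketch of the concept, not how any real chip or NCL implementation would do it): any medium input makes the output medium, so a result is known to be done once none of its bits are medium.

# Toy three-valued logic: 'None' is the "medium" (not-yet-resolved) state.
MEDIUM = None

def and3(a, b):
    if a is MEDIUM or b is MEDIUM:
        return MEDIUM
    return a & b

def or3(a, b):
    if a is MEDIUM or b is MEDIUM:
        return MEDIUM
    return a | b

def done(bits) -> bool:
    """Completion detection: the word is valid once no bit is medium."""
    return all(b is not MEDIUM for b in bits)

word = [1, 0, MEDIUM, 1]
print(done(word))            # False -- still waiting on bit 2
word[2] = or3(1, 0)
print(done(word), word)      # True [1, 0, 1, 1]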
Re:Units (not floating point operations) (Score:3)
I think in such a system, other features (code optimization, use of 3D accelerators, etc.) will be more important than the speed of an add. It will even take several years of experimentation to determine what optimizations to make (how many adds are worth one multiply, how should loops be unrolled, etc.).
I think many traditional measurements will become worse than useless, and instead misleading. Since a lot of your repetitive math operations may be offloaded to your 3D accelerator, it is questionable that, even if you could decide how to measure it, floating-point operations per second would be a real indicator. I wouldn't want the manufacturer optimizing for that over other, useful things.
A better question is, how long does a NOP last? Won't this system optimize it out? How can you time a NOP without a clock?
The killer with asynchronous logic may be testing (Score:3)
If you have a chip where some of the units are slower than expected, you might get curious interactions and "race conditions" that are very hard to test before you put the chip into service.
Also, designing for asynchronous logic has been difficult - designing clocked and even pipelined systems is a breeze compared to dealing with asynchronous design. A lot of the structured methods that have been developed for conventional clocked circuits cannot be used, and so designers have a lot of trouble building complex systems.
Re:Asynchronous Logic (Score:3)
A little bit of a pain, but far from impossible. Anyone who works on software for a multithreaded, multiprocessor, or distributed environment solves asynchrony-related problems all the time. We do it by having locks instead of clocks; hardware folks can do and on occasion have done just the same. I'm sorry to hear that such basically simple problems are considered unsolvable by garden-variety EEs.
Re:Would this really work? (Score:2)
Apart from that, it is basically an exercise in bookkeeping -- tag all values as belonging to a subinstruction, so that you are able to get the data dependencies right.
I could go on, but I think you get the idea. However, let me emphasize that the situation of the whole chip waiting on the slowest component is exactly what we AVOID by going asynchronous; it is also exactly the reason why Intel needs to pipeline so damn deep to get the clock rate up. They need to split the pipeline into steps small enough that each step can be done in one clock. Asynch circuitry wouldn't have that problem.
Re:This sounds like a dataflow machine (Score:2)
Sadly, much of this was a previously solved problem in the Control Data Corporation 6600 series of computers [geocities.com], which used a "scoreboard" to keep track of dynamically arising opportunities for parallelism among the CPU's functional units. This is generally the technology used in modern CPUs to infer parallelism from look-ahead in machine code instruction streams.
The fad in "dataflow" machines in the late 70s and early 80s (arising largely from John Backus's 1977 Turing Award Lecture) was not entirely misguided. However, the failure to come up with a good way of describing I/O and other time-related operations was its downfall. I've been working on this stuff [geocities.com] from the viewpoint of distributed programming environments as a high-priority background task ever since those days, and it gets into some of the most serious philosophical questions about the relationship between mathematics and reality, questions that are intimately related to quantum theory and phenomenology. At some point, we have to ask ourselves: 'What is an object, how does it come to "be" in "time", and how can we best formalize these conceptions?'
I don't think these problems have to be solved entirely for asynchronous systems to work, but my point of departure in trying to come up with a programming language that could handle dataflow was hardware design languages, and the way they generalize boolean algebra to represent feedback circuits, which introduces "time" in an important sense. Unfortunately, the best guys at Bell Labs who were working on high-level hardware design languages at that time were using quite ad hoc formalisms to represent such boolean feedback loops.
No Login (Score:2)
Re:We don't need no steenkin' clocks! (Score:2)
True, without a clock, even chips from the same wafer could run at vastly different speeds. We see some of that effect from standard CPU builds - i.e., the old PMMX 166 -> 233MHz chips were all made at the same time, and marked according to what they could do (or how many of the slower ones they needed to sell). Basic qualification would be more extensive - no longer can you verify that a specific suite of tests completes with the chip at a certain clock speed; now you have to time the various ops and give each chip some rating based on that, and they wouldn't be very consistent. It's a much easier process when you only have a few choices to pick from, rather than a sliding scale.
Plus, the interleaving of the synchronous and async domains of the chip could be very interesting. PLLs are great when you have a few regions (say, two different-speed PCI busses into one chip, along with a memory bus at a third speed)... clocks can gain you a lot in terms of simulation abilities, too. Tough to sim a chip when the process *really* makes a big difference in the delays.
--
FFT has its own problems. (Score:2)
well, to do a digital lowpass filter, you would use a FFT; i doubt that many commercial audio devices would use time-domain convolution when the FFT is faster
For one thing, it wouldn't necessarily be faster. Filtering with an FFT is O(n log w), where w is the window size; time-domain convolution is O(nm), where m is the filter length. The hard edges of the FFT window create artifacts that can be audible as a buzzing noise; this is why MP3 and Vorbis spend a few extra cycles on the MDCT (an overlapped transform). Besides, you don't need a lot of taps; I know of a decent FOUR-tap low-pass filter: [11 19 5 -3]/32.
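For the curious, here's what applying that four-tap filter looks like with plain time-domain convolution (a quick Python sketch for illustration, not production DSP code):

# Apply the four-tap low-pass filter [11, 19, 5, -3]/32 by direct convolution.
TAPS = [11, 19, 5, -3]

def fir(samples):
    out = []
    history = [0, 0, 0, 0]                 # last four input samples
    for x in samples:
        history = [x] + history[:3]
        acc = sum(t * h for t, h in zip(TAPS, history))   # 4 multiply-accumulates per sample
        out.append(acc // 32)              # the /32 normalization (a shift in hardware)
    return out

print(fir([0, 0, 32, 32, 32, 32]))   # a step input gets smoothed: [0, 0, 11, 30, 35, 32]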
All your hallucinogen [pineight.com] are belong to us.
The homepage for the group (Score:5)
Re:Tri-state electronics (Score:3)
You shouldn't use the term "Tri-State" in this context. Tri-State, a trademark of National Semiconductor, refers to drivers that can be put into a high-impedance output mode and so be disconnected from a bus in a simple way. What you are referring to is called "Multiple-Valued Logic" and has been researched forever. It has found its way into a few products (ROMs most notably) but in general is more work than it's worth.
What's the point? (Score:2)
With all the suffering and poverty in the world we should really question whether some "scientists" deserve the money they get or whether those same funds could be utilised elsewhere.
Will probably be unsuccessful (Score:2)
The effort sounds like a great science fair project. Above that, I can't see anything coming out of it to fruition.
So how do you measure speed? (Score:2)
Re:Will probably be unsuccessful (Score:3)
Re:Will probably be unsuccessful (Score:2)
Nothing new - Amulet, frex (Score:5)
Another advantage besides power is speed: the clock rate isn't determined by the worst case of the most expensive instruction. (E.g., adding 0 and 1 can be done a lot quicker than adding (2^31)-1 and 1, because there's no long carry to propagate.)
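A toy Python illustration of that point (a sketch of the concept only, not a gate-level model): in a ripple-carry adder the settling time tracks how far the carry has to propagate, so 0 + 1 finishes almost immediately while (2^31)-1 + 1 ripples through 31 bit positions.

# Count how long the carry chain gets for a given pair of operands.
def ripple_add(a: int, b: int, width: int = 32) -> tuple[int, int]:
    """Return (sum, longest_carry_chain_in_bits)."""
    carry, longest, run, total = 0, 0, 0, 0
    for i in range(width):
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        s = abit ^ bbit ^ carry
        carry = (abit & bbit) | (carry & (abit ^ bbit))
        run = run + 1 if carry else 0      # consecutive positions with a live carry
        longest = max(longest, run)
        total |= s << i
    return total, longest

print(ripple_add(0, 1))              # (1, 0)            -- no carry at all
print(ripple_add(2**31 - 1, 1))      # (2147483648, 31)  -- carry ripples 31 positions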
Re:What's the point? Read the article! (Score:2)
For example, Royal Philips Electronics has built a pager using asynchronous electronics, taking advantage of the fact that the circuits produce far less radio interference than do clock-driven circuits. This makes it possible to operate a radio receiver that is directly next to the electronic circuit, greatly increasing the unit's operating efficiency.
Philips has also actively pursued research into asynchronous logic. Two small start-ups, Asynchronous Digital Design in Pasadena, Calif., and Theseus Logic in Orlando, Fla., are developing asynchronous chips for low-end consumer markets and high-performance computing systems.
How about the human brain? (Score:3)
I know it's a little late, but... (Score:2)
Someone mentioned CalTech - sorry, I forgot the cid, but thank you - so I went and did a little digging. Here [caltech.edu] is a link to the CalTech Asynchronous VLSI group. Right on the page are some cogent explanations of why they believe asynchronous designs will eventually become commonplace. Further in are pointers to some good papers, and an interesting discussion of their results implementing an asynchronous version of the MIPS R3000 architecture.
References, History, Technical Info (Score:2)
http://www.cs.man.ac.uk/async/background/return
A paper I found interesting on this subject:
http://www.ee.ic.ac.uk/pcheung/publications/Asy
Enjoy... I've heard Sutherland speak and he's done some very interesting things; most notably, he invented the method of "logical effort" for the sizing of transistors, without which CPUs would be several orders of magnitude harder to optimise than they are today.
Several Issues (Score:5)
First, most ASICs built these days are built with logic synthesis tools from Synopsys [synopsys.com] or Cadence [cadence.com]. The inputs are typically register transfer level (RTL) code written in either the VHDL or Verilog languages. These logic synthesis tools have been around for quite some time (well over a decade for Synopsys) and have a significant infrastructure built around them. This design paradigm and sets of tools all assume synchronous logic. I can't fathom how you would build/constrain/debug these circuits in an asynchronous style with the existing toolset. And don't say "we'll use something else". It is these types of tools which have made our million gate ASICs possible. If we were still using schematics or other hack tools we would barely have passed the 80286. The current design tools took a long time to develop, hone, and get the bugs out of. The amount of money involved in just the tools is on the order of billions of dollars per year. That's a lot of inertia to move away from.
Second, yes the asynchronous approach can reduce the power consumption of ASICs. However, there are a lot of clocked approaches that do a very good job of reducing power. It all depends on what goals you have when you design the ASIC. Having multiple clocks and clock gating is common in the low power and embedded domains. It hasn't been as much of a factor in desktop systems but is certainly in use in handheld devices. The Crusoe takes these approaches to an extreme level. It's all a matter of what you want to design for and time to market pressures.
Lastly, speed. I think folks forget the feedback path. If you're going to rely on this asynchronous handshake, it requires a given stage to hold its outputs until the next stage acknowledges (asynchronously) that it got the data. This means the given stage can't accept anything new yet. This cascades/ripples back through the pipeline. This feedback takes time (and logic levels) that don't exist in clocked logic. Imagine an automotive assembly line where things could only move forward if each station got permission from its adjacent stations. In clocked logic you've guaranteed that the data is ready to move forward because you've calculated these things out. You've removed a bunch of communication overhead. Yes, there is slack in the synchronous pipeline, but for the most part current designs are pretty well balanced so that each stage uses a large portion of its clock cycle.
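A back-of-the-envelope way to see that feedback cost (a toy Python model with made-up timing numbers, not a real design-flow calculation): every handoff pays a small acknowledge delay, and a slow downstream stage stalls the stage feeding it.

# Two stages joined by a request/acknowledge handshake. The producer must hold
# its result until the consumer acks, so it cannot start new work before then.
def simulate(items, produce_time, consume_time, ack_time=0.1):
    t_producer_free = 0.0
    t_consumer_free = 0.0
    for _ in range(items):
        t_ready = t_producer_free + produce_time              # result computed...
        t_handoff = max(t_ready, t_consumer_free) + ack_time   # ...held until the ack comes back
        t_producer_free = t_handoff                            # only now may this stage accept new data
        t_consumer_free = t_handoff + consume_time
    return t_consumer_free

print(round(simulate(100, 1.0, 1.0), 2))   # 111.0 -- balanced stages still pay the ack on every handoff
print(round(simulate(100, 1.0, 3.0), 2))   # 311.0 -- a slow consumer ripples back and stalls the producer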
That's about all I can think of at the moment. I need to be getting home before I get snowed in! ;-) Just a few comments from a digital hardware designer. Hope this provided some food for thought...
Re:Units (not floating point operations) (Score:3)
A MAC is a very important operation in digital signal processing. For example, to implement a digital lowpass filter (to remove tape hiss, say), you define a finite impulse response (FIR) filter of some number of taps. You might need 256 taps to implement the needed lowpass filter (this is a shot from the hip; the actual number of taps may be more or less). That means for every sample of audio (88.2 kSamples/second for stereo audio) you need to do 256 MACs, or roughly 22.6 million MACs per second.
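Here's the back-of-the-envelope arithmetic in Python, using the post's own assumed numbers (256 taps, 44.1 kHz x 2 channels), plus the inner loop those MACs come from:

taps = 256
sample_rate = 88_200                 # stereo samples per second (assumed above)
macs_per_second = taps * sample_rate
print(macs_per_second)               # 22579200 -> roughly 22.6 million MACs/s

# The loop that costs one multiply-accumulate per tap for each output sample:
def fir_sample(history, coeffs):
    acc = 0.0
    for h, c in zip(history, coeffs):
        acc += h * c                 # one MAC
    return acc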
Clockspeed equivalent... (Score:2)
Happy now?
Re:This sounds like a dataflow machine (Score:2)
ASICs are hard enough to validate today with the
few async pieces we have to put into them.
Asych logic may look nice but unless we get some
major breakthroughs in verification tools, don't
look for it anywhere near the future.
Re: (just one of) Several Issues (Score:2)
IANAD (I am not a designer), but thinking about this I wonder if buffering the interstage registers might not mitigate the feedback-path delay. Imagine three registers: R1, the output of Stage 1; R2, a buffer; and R3, the input to Stage 2. Each register has a control bit (0 = read, 1 = unread). Further imagine two simple register-to-register copy circuits, one to copy R1 to R2, and a second to copy R2 to R3.
I apologize for the primitive exposition (I said IANAD), but intuitively it seems to me that such a buffer scheme could let logic stages overlap processing. The cost would be the time needed for the two hair-trigger copy operations between logic stages, but that should be minimal.
Bang1 - CopyR1R2 frees Stage1 to execute again, Bang2 - CopyR2R3 tells Stage2 it has an input.
If Stage1 completes a fast operation, the buffer copying lets it take on the next one (which might not be fast) perhaps before Stage2 is ready for its next input. Thus Stage1 and Stage2 can overlap in some circumstances, increasing overall speed. Multiply this by a dozen or so pipeline stages and the savings might be worth the effort.
Or perhaps this is the overhead the parent post was referring to...
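A toy software sketch of that R1/R2/R3 scheme (purely conceptual Python, not a hardware model): a one-deep buffer with an "unread" flag lets Stage 1 latch a new result before Stage 2 has consumed the previous one.

# One-deep buffering between two stages, per the scheme described above.
class Reg:
    def __init__(self):
        self.value = None
        self.unread = False     # 0 = read, 1 = unread, in the post's terms

def copy(src, dst) -> bool:
    """Hair-trigger copy: fires only when src holds unread data and dst is free."""
    if src.unread and not dst.unread:
        dst.value, dst.unread = src.value, True
        src.unread = False      # frees the upstream register immediately
        return True
    return False

r1, r2, r3 = Reg(), Reg(), Reg()

r1.value, r1.unread = "result A", True   # Stage 1 finishes an operation
copy(r1, r2)                             # "Bang1": R1 is free; Stage 1 may start its next op
r1.value, r1.unread = "result B", True   # Stage 1 overlaps with Stage 2

copy(r2, r3)                             # "Bang2": Stage 2 now has "result A" as input
copy(r1, r2)                             # "result B" slides into the buffer right behind it
print(r3.value, "|", r2.value)           # result A | result B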
The usual benchmark. (Score:2)
But seriously, even supposing we could come out with a retail clockless CPU, wouldn't it require a plethora of equally clockless peripherals like video cards and IDE controllers and whatnot? Otherwise it would need a clock to drive these "external" devices (external from the CPU's view, that is), and then we fall back into the same pit. The concept is fascinating but ill-fated, I'm afraid.
Example asynchronous CPUs (Score:4)
Asynchronous ARM core nears commercial debut (1998) [edtn.com]
ARM researches asynchronous CPU design (feb 1995) [computer-design.com]
AMULET3: A High-Performance Self-Timed ARM Microprocessor (1998) [ibm.com]