Hardware News Technology

When Mistakes Improve Performance 222

Posted by kdawson
from the let's-change-everything dept.
jd and other readers pointed out BBC coverage of research into "stochastic" CPUs that allow communication errors in order to reap benefits in performance and power usage. "Professor Rakesh Kumar at the University of Illinois has produced research showing that allowing communication errors between microprocessor components, and then making the software more robust, will actually result in chips that are faster and yet require less power. His argument is that at the current scale errors in transmission occur anyway, and that the efforts of chip manufacturers to hide these to create the illusion of perfect reliability simply introduce a lot of unnecessary expense, demand excessive power, and deoptimise the design. He favors a new architecture, which he calls the 'stochastic processor,' designed to handle data corruption and error recovery gracefully. He believes he has shown such a design would work and that it would permit Moore's Law to continue to operate into the foreseeable future. However, this is not the first time someone has tried to fundamentally revolutionize the CPU. The Transputer, the AMULET, the FM8501, the iWARP, and the Crusoe were all supposed to be game-changers but died cold, lonely deaths instead — and those were far closer to design philosophies programmers are currently familiar with. Modern software simply isn't written with the level of reliability the stochastic processor requires (and many software packages are too big and too complex to port), and the volume of available software frequently makes or breaks new designs. Will this be 'interesting but dead-end' research, or will Professor Kumar pull off a CPU architectural revolution of a kind not seen since the microprocessor itself?"
This discussion has been archived. No new comments can be posted.


  • Re:Moore's law (Score:4, Informative)

    by takev (214836) on Saturday May 29, 2010 @05:49PM (#32392632)
    What he is proposing is to reduce the number of transistors on a chip, to increase its speed and reduce power usage.
    So in fact he is trying to reverse Moore's law.
  • by DigiShaman (671371) on Saturday May 29, 2010 @05:49PM (#32392640) Homepage

    From what I understand, all modern processors are now a hybrid of both RISC and CISC (Intel Core 2, AMD K8, etc). Except for embedded applications, the generic CPU doesn't have that kind of pure classification anymore. Right?

  • by xZgf6xHx2uhoAj9D (1160707) on Saturday May 29, 2010 @06:05PM (#32392786)

    The classifications weren't totally meaningful to begin with, but CISC has essentially died. I don't mean there aren't CISC chips anymore--any x86 or x64 chip can essentially be called "CISC"--but certainly no one's designed a CISC architecture in the past decade at least.

    RISC has essentially won and we've moved into the post-RISC world as far as new designs go. VLIW, for instance, can't really be considered classical RISC, but it builds on what RISC has accomplished.

    The grandparent's point is a good one: people thought RISC would never succeed; they were wrong.

  • Re:Impossible design (Score:4, Informative)

    by AmiMoJo (196126) <mojo@NOspAm.world3.net> on Saturday May 29, 2010 @06:15PM (#32392866) Homepage

    The first thing to say is that we are not talking about general purpose CPU instructions but rather the highly repetitive arithmetic processing that is needed for things like video decoding or 3D geometry processing.

    The CPU can detect when some types of error occur. It's a bit like ECC RAM, where single-bit errors can be corrected and double-bit errors at least detected. It can also check for things like invalid op-codes, jumps to invalid or non-code memory, and the like. If a CPU had two identical ALUs it could compare their results.

    Software can also look for errors in processed data. Things like checksums and estimation can be used.
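    A minimal sketch of that software-side idea in Python (the function names and payload are purely illustrative): attach a checksum before data passes through an unreliable unit, verify it afterwards, and recompute on a mismatch.

```python
import zlib

def protect(data: bytes) -> tuple[bytes, int]:
    """Attach a CRC32 checksum before handing data to an unreliable unit."""
    return data, zlib.crc32(data)

def verify(data: bytes, checksum: int) -> bool:
    """Detect (not correct) corruption after processing."""
    return zlib.crc32(data) == checksum

payload, crc = protect(b"frame-0042")
assert verify(payload, crc)          # clean path: checksum matches
corrupted = b"frame-0O42"            # simulated corruption in transit
assert not verify(corrupted, crc)    # detected: the caller can recompute
```

    Detection alone is enough here because, unlike a network link, the original input is still in memory and the computation can simply be retried.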

    In fact GPUs already do this to some extent. AMD and nVidia's workstation cards are the same as their gaming cards, the only difference being that the workstation ones are certified to produce 100% accurate output. If a gaming card colours a pixel wrong every now and then it's no big deal and the player probably won't even notice. For CAD and other high end applications the cards have to be correct all the time.

  • by xZgf6xHx2uhoAj9D (1160707) on Saturday May 29, 2010 @06:21PM (#32392924)

    Error-correcting codes (aka forward error correction) are typically more efficient on high-error channels than error detection (checksum and retransmit), which is why 10Gbps Ethernet uses Reed-Solomon coding rather than the CRC-and-retransmit approach of previous Ethernet standards: it avoids the need to retransmit.
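    To make the detection-versus-correction distinction concrete, here is a toy forward-error-correction scheme in Python: a repetition code with majority vote. (Real links use Reed-Solomon or similar, which are vastly more efficient; this only illustrates why no retransmission is needed.)

```python
def fec_encode(bits, r=3):
    """Trivial repetition code: transmit each bit r times."""
    return [b for bit in bits for b in [bit] * r]

def fec_decode(coded, r=3):
    """Majority vote per group recovers the data despite isolated errors."""
    return [int(sum(coded[i:i + r]) > r // 2)
            for i in range(0, len(coded), r)]

msg = [1, 0, 1, 1]
coded = fec_encode(msg)
coded[4] ^= 1                      # corrupt one transmitted symbol
assert fec_decode(coded) == msg    # receiver recovers without a resend
```

    A checksum scheme would have detected the bad symbol but then required a round trip to fetch the data again; the repetition code pays its cost up front in bandwidth instead.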

    I had the same questions about how this is going to work, though. What is the machine code going to look like, and how will it allow the programmer to check for errors? Possibly each register could have an extra "error" bit (similar to IA-64's NaT bit on its GP registers). E.g., if you do an "add" instruction, it checks the error bits on its source operands and propagates them. So long as you only allow false positives (never false negatives), it would work, and could be relatively efficient.
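    A sketch of that propagation rule in Python (the `Reg` type and `add` function are hypothetical, modeled loosely on the NaT idea; nothing here is from Kumar's actual design):

```python
from dataclasses import dataclass

@dataclass
class Reg:
    """A register value carrying a hardware-set error ('poison') flag."""
    value: int
    poisoned: bool = False

def add(a: Reg, b: Reg) -> Reg:
    # Propagate the flag: if either operand came out of a faulty
    # computation, the result is flagged too, and software decides
    # whether to recompute or tolerate the corruption.
    return Reg((a.value + b.value) & 0xFFFFFFFF, a.poisoned or b.poisoned)

x = Reg(7)
y = Reg(35, poisoned=True)   # pretend the ALU signalled an error here
z = add(x, y)
assert z.value == 42 and z.poisoned
```

    The asymmetry matters: a spurious flag only costs a redundant recomputation, while a missed one silently corrupts downstream state.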

  • Re:Impossible design (Score:1, Informative)

    by Anonymous Coward on Saturday May 29, 2010 @06:29PM (#32392972)

    Reduced power OR equal power at a faster clock rate. Many times speed is preferred to accuracy when perfection isn't necessary. Video and audio are good examples already doing this (e.g. dropped frames on slow connections).

  • by BartholomewBernsteyn (1720348) on Saturday May 29, 2010 @06:44PM (#32393070)
    This may be a far-out thought, but if stochastic CPUs trade correctness for increased performance, maybe something like the following could reap the benefits while keeping the stochastics contained:
    Suppose those CPUs really allow for faster instruction handling using less resources, maybe you could put more in a package, for the same price, which on a hardware level would give rise to more processing cores at the same cost. (Multi-Core stochastic CPUs)
    Naturally, you have the ability to do parallel processing, with errors possible, but you are able to process instructions at a faster rate.
    On the software side, support for concurrency is a major selling point; of course, there has to be something able to recover from those pesky stochastics gracefully. The functional language 'Erlang' comes to mind.
    This is taken from Wikipedia:

    Concurrency supports the primary method of error-handling in Erlang. When a process crashes, it neatly exits and sends a message to the controlling process which can take action. This way of error handling increases maintainability and reduces complexity of code

    From the official source:

    Erlang has a built-in feature for error handling between processes. Terminating processes will emit exit signals to all linked processes, which may terminate as well or handle the exit in some way. This feature can be used to build hierarchical program structures where some processes are supervising other processes, for example restarting them if they terminate abnormally.

    Asked to 'refer to OTP Design Principles for more information about OTP supervision trees, which use[s] this feature' I read this:

    A basic concept in Erlang/OTP is the supervision tree. This is a process structuring model based on the idea of workers and supervisors. Workers are processes which perform computations, that is, they do the actual work. Supervisors are processes which monitor the behaviour of workers. A supervisor can restart a worker if something goes wrong. The supervision tree is a hierarchical arrangement of code into supervisors and workers, making it possible to design and program fault-tolerant software.
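    As a sketch of that supervision idea applied to stochastic hardware (Python standing in for Erlang here; the glitch probability and function names are made up for illustration):

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def flaky_worker(x):
    """A worker whose arithmetic occasionally 'goes stochastic'."""
    if random.random() < 0.3:               # simulated hardware glitch
        raise ArithmeticError("corrupted result")
    return x * x

def supervise(task, arg, retries=10):
    """Erlang-style supervisor: restart the worker until it succeeds,
    escalating to a parent supervisor if it keeps failing."""
    for _ in range(retries):
        try:
            return task(arg)
        except ArithmeticError:
            continue                        # "restart" the worker
    raise RuntimeError("worker kept failing; escalate to parent")

print(supervise(flaky_worker, 12))          # 144
```

    The structure mirrors the supervision tree above: workers do the (possibly corrupted) arithmetic, supervisors only decide whether to restart or escalate.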

    This seems a good fit: create a real, physical machine for a language able both to reap the benefits and cope with the trade-off.
    Or maybe I'm too far off (I'm bored technologically, allow me some paradigmatic change at slashdot).

    TamedStochastics - Hiring.

    Yes, checksumming on dedicated hardware was my first thought as well.

  • Not really (Score:4, Informative)

    by Sycraft-fu (314770) on Saturday May 29, 2010 @08:46PM (#32393836)

    Ethernet has lower latency than token ring and is overall easier to implement. However, its bandwidth scaling is poor when you are talking old-school hub Ethernet. The more devices you have, the less total throughput you get, due to collisions. Eventually you can grind a segment to a complete halt because collisions are so frequent that little data gets through.

    Modern ethernet does not have collisions. It is full duplex, using separate transmit and receive wires. It scales well because the devices that control it, switches, handle sending data where it needs to go and can do bidirectional transmission. The performance you see with gig ethernet is only possible because you do full duplex communications with essentially no errors.

  • by jasonwc (939262) on Saturday May 29, 2010 @10:06PM (#32394176)

    The problem isn't really modern CPUs but the lack of improvement in conventional hard drive speeds. With a Core i7 processor and a 160 GB X-25M Gen 2 Intel SSD, pretty much everything I run loads within 1-2 seconds, and there is little or no difference between loading apps from RAM or my hard drive. Even with a 1 TB WD Caviar Black 7200 RPM drive, my Core i7 machine was constantly limited by the hard drive.

    With an SSD, I boot to a usable desktop and can load Firefox with Adblock, Pidgin, Skype, Foobar2000 and Word in around 2 seconds. Many programs like Chrome load so quickly that they are effectively instant-on. Even though quad-core processors are often derided for desktop use, I see a tremendous improvement with a Core i7 + high-performance SSD vs. a Core 2 Duo + mediocre laptop drive. Modern CPUs can make your desktop experience much more responsive. You just need a hard drive that can keep up.

    Oh, and in video playback, the difference is incredibly obvious. My roommate is still using a 7 year old laptop which can barely play back a DVD (MPEG-2). In contrast, my Core i7 can simultaneously decode five 1080p H.264 videos with ease (beyond that, the hard drives can't keep up). While this might be considered useless, it definitely makes a difference when running background tasks such as backups. With my Core 2 Duo without hardware decoding, I would have to pause when scheduled backups started or video would skip. With my quad-core system, I can run any task in the background without fear of slowdown, and also use high-quality upscale filters and renderers that would have slowed my dual-core system to a crawl.

    Too many people claim that modern processors and hardware do not provide meaningful improvements to the desktop experience. I just don't find this to be true. Multi-core processors have allowed users to run background tasks, install software etc. with no noticeable speed degradation. When I am working with old single-core machines, I miss this benefit.

    In addition, today's software is more powerful. You may not need all the features in Word 2007 or the latest Firefox build, but that doesn't mean they aren't useful.

    Adblock, Flashblock, Session Management, the ability to have dozens of tabs loaded without memory issues, the ability to stream high definition video in my browser with no or minimal buffering, the "Awesomebar" etc. are all features that didn't exist 5+ years ago.

    Real-time indexing of system files and applications is relatively recent, and yet I find that it has fundamentally transformed how I access data.

    There are many more examples. It may be popular to say that things haven't changed much in two decades, because word processors are superficially similar for example, but a great deal has changed.

  • Re:Impossible design (Score:3, Informative)

    by DavidRawling (864446) <hulk_@NOSpAM.yahoo.com> on Saturday May 29, 2010 @10:32PM (#32394280)

    space the instructions further apart so that one or two bit flips won't map to another instruction.

    Yeah - I think you left out the thinking bit before your comment.

    Sure, a single bit flip in the least significant bit only moves you 1 byte forward or backward in RAM. But a flip in the most significant bit of a 32 bit address moves you 2 GiB away (let alone roughly 8.6 billion GiB for a 64 bit address).

    Just how far apart do you want the instructions?
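    The arithmetic is easy to check; a quick Python sketch (just for illustration):

```python
# How far a single bit flip in a code address moves execution,
# as a function of which bit flips.
for bit in (0, 15, 31, 63):
    distance = 1 << bit                       # bytes
    print(f"bit {bit:2d}: {distance:,} bytes "
          f"(~{distance / 2**30:,.2f} GiB)")
# Bit 31 lands 2 GiB away; bit 63 about 8.6 billion GiB away.
```

    No plausible instruction spacing helps against high-order bit flips, which is the parent's point: you need detection, not spacing.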

God doesn't play dice. -- Albert Einstein
