
When Mistakes Improve Performance (222 comments)

jd and other readers pointed out BBC coverage of research into "stochastic" CPUs that allow communication errors in order to reap benefits in performance and power usage. "Professor Rakesh Kumar at the University of Illinois has produced research showing that allowing communication errors between microprocessor components and then making the software more robust will actually result in chips that are faster and yet require less power. His argument is that at the current scale, errors in transmission occur anyway and that the efforts of chip manufacturers to hide these to create the illusion of perfect reliability simply introduce a lot of unnecessary expense, demand excessive power, and deoptimise the design. He favors a new architecture, which he calls the 'stochastic processor,' designed to handle data corruption and error recovery gracefully. He believes he has shown such a design would work and that it would permit Moore's Law to continue to operate into the foreseeable future. However, this is not the first time someone has tried to fundamentally revolutionize the CPU. The Transputer, the AMULET, the FM8501, the iWARP, and the Crusoe were all supposed to be game-changers but died cold, lonely deaths instead — and those were far closer to design philosophies programmers are currently familiar with. Modern software simply isn't written with the level of reliability the stochastic processor requires (and many software packages are too big and too complex to port), and the volume of available software frequently makes or breaks new designs. Will this be 'interesting but dead-end' research, or will Professor Kumar pull off a CPU architectural revolution of a kind not seen since the microprocessor itself was designed?"
  • Impossible design (Score:4, Interesting)

    by ThatMegathronDude ( 1189203 ) on Saturday May 29, 2010 @05:24PM (#32392416)
    If the processor goofs up the instructions that it's supposed to execute, how does it recover gracefully?
    • Re: (Score:3, Funny)

      by Anonymous Coward

      The Indian-developed software will itself fuck up in a way that negates whatever fuck up just happened with the CPU. In the end, it all balances out, and the computation is correct.

    • by TheThiefMaster ( 992038 ) on Saturday May 29, 2010 @05:42PM (#32392562)

      Especially a JMP (GOTO) or CALL. If the instruction is JMP 0x04203733 and a transmission error makes it do JMP 0x00203733 instead, causing it to attempt to execute data or an unallocated memory page, how the hell can it recover from that? It could be even worse if the JMP instruction is changed only subtly: jumping just a few bytes too far or too short could land you on the wrong side of an important instruction and throw off the entire rest of the program. All you could do is detect the error/crash, restart from the beginning, and hope. What if the error was in your error detection code? Do you have to check the result of your error detection for errors too?

    • Re: (Score:3, Interesting)

      by Anonymous Coward

      That's a good point. You accept mistakes with the data, but don't want the operation to change from add (where, when doing large averages, plus/minus a few hundred won't matter) to multiply or divide.

      But once you have the opcode separated from the data, you can mess with the latter. E.g., not caring about a race condition, because it happening every 1000th operation doesn't matter too much.
      And as this is a source of noise, you just got free random data!
      Still, this looks more like something for sci

      • There are plenty of other ideas for dealing with noisy chips. I'd point out DARPA's SyNAPSE [darpa.mil] program as an example. Due to quantum constraints, the future of deterministic computation must eventually deal with the noise in a robust manner. The above efforts are focusing on memristor [nature.com] technology.

        I don't know whether stochastic architectures do better than noisy memristor ones, but either way we'll have to learn how to program in an environment where the least predictable element is not the one at the console.

    • by Anonymous Coward

      More importantly, if the software is more robust so as to detect and correct errors, then it will require more clock cycles of the CPU and negate the performance gain.

      This CPU design sounds like the processing equivalent of a perpetual motion device. The additional software error correction is like the friction that makes the supposed gain impossible.

    • Basically he's saying we should trade accuracy for lower power consumption? Hmm ... I vote 'No'.
      • When playing back a movie on my iPhone I don't care if pixel 200x200 is 0xFFFFFF or 0xFFFFFE. My brain can't tell the difference.

    • Re:Impossible design (Score:4, Informative)

      by AmiMoJo ( 196126 ) on Saturday May 29, 2010 @06:15PM (#32392866) Homepage Journal

      The first thing to say is that we are not talking about general purpose CPU instructions but rather the highly repetitive arithmetic processing that is needed for things like video decoding or 3D geometry processing.

      The CPU can detect when some types of error occur. It's a bit like ECC RAM where one or two bit errors can be noticed and corrected. It can also check for things like invalid op-codes, jumps to invalid or non-code memory and the like. If a CPU were to have two identical ALUs it could compare results.

      Software can also look for errors in processed data. Things like checksums and estimation can be used.
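
      A minimal sketch of that kind of software-side check (my illustration, not from the comment or the paper), assuming a hypothetical unreliable_sum() kernel running on a stochastic unit: run the computation twice and only accept agreeing results, which is the dual-ALU compare idea done in software.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for an arithmetic kernel running on an unreliable unit.
 * Here it occasionally flips a low bit to simulate a soft error. */
static long unreliable_sum(const int *data, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += data[i];
    if (rand() % 100 == 0)          /* ~1% chance of a corrupted result */
        sum ^= 1;
    return sum;
}

/* Run the kernel twice and only accept matching results (compare-and-retry). */
static long checked_sum(const int *data, int n)
{
    for (;;) {
        long a = unreliable_sum(data, n);
        long b = unreliable_sum(data, n);
        if (a == b)
            return a;               /* agreement: overwhelmingly likely correct */
        /* disagreement: at least one run was corrupted, so try again */
    }
}

int main(void)
{
    int data[1000];
    for (int i = 0; i < 1000; i++)
        data[i] = i;
    printf("sum = %ld\n", checked_sum(data, 1000));
    return 0;
}
```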

      In fact GPUs already do this to some extent. AMD and nVidia's workstation cards are the same as their gaming cards, the only difference being that the workstation ones are certified to produce 100% accurate output. If a gaming card colours a pixel wrong every now and then it's no big deal and the player probably won't even notice. For CAD and other high end applications the cards have to be correct all the time.

      • OpenCL (Score:3, Interesting)

        by tepples ( 727027 )

        AMD and nVidia's workstation cards are the same as their gaming cards, the only difference being that the workstation ones are certified to produce 100% accurate output. If a gaming card colours a pixel wrong every now and then it's no big deal and the player probably won't even notice.

        As OpenCL and other "abuses" of GPU power become more popular, "colors a pixel wrong" will eventually happen in the wrong place at the wrong time on someone using a "gaming" card.

    • by Chowderbags ( 847952 ) on Saturday May 29, 2010 @06:19PM (#32392908)
      Moreover, if the processor goofs on the check, how will the program know? Do we run every operation 3 times and take the majority vote (then we've cut down to 1/3rd of the effective performance)? Even if we take the 1% error rate, given that each core of a CPU right now can run billions of instructions per second, this CPU will fail to check correctly many times every second (even checking, rechecking, and checking again every single operation). And what about memory operations? Can we accept errors in a load or store? If so, we can't in practice trust our software to do what we tell it. (Change a bit on load and you could do damn near anything, from adding the wrong number, to saying an if statement is true when it should be false, to not even running the right fricken instruction.)

      There's a damn good reason why we want our processors to be rock solid. If they don't work right, we can't trust anything they output.
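
      The "run it three times and take the majority vote" idea is classic triple modular redundancy. A rough sketch under assumed conditions (the unreliable_op() routine and its ~1% error rate are made up for illustration): the bitwise majority of three runs outvotes any single bad result, at roughly 3x the work.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical operation on an error-prone unit: occasionally flips one bit. */
static uint32_t unreliable_op(uint32_t x)
{
    uint32_t r = x * 2654435761u;        /* some arbitrary computation */
    if (rand() % 100 == 0)
        r ^= 1u << (rand() % 32);        /* ~1% chance of a single-bit error */
    return r;
}

/* Triple modular redundancy: bitwise majority vote of three runs.
 * A bit is set in the result iff it is set in at least two of the inputs,
 * so any single corrupted run is outvoted. */
static uint32_t tmr_op(uint32_t x)
{
    uint32_t a = unreliable_op(x);
    uint32_t b = unreliable_op(x);
    uint32_t c = unreliable_op(x);
    return (a & b) | (a & c) | (b & c);
}

int main(void)
{
    for (uint32_t i = 0; i < 5; i++)
        printf("%u -> %u\n", i, tmr_op(i));
    return 0;
}
```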
      • by pipatron ( 966506 ) <pipatron@gmail.com> on Saturday May 29, 2010 @08:04PM (#32393618) Homepage

        Not very insightful. You seem to say that a CPU today is error-free, and if this is true, the part of the new CPU that does the checks could also be made error-free so there's no problem.

        Well, they aren't rock-solid today either, so you cannot trust their output even today. It's just not very likely that there will be a mistake. This is why mainframes execute a lot of instructions at least twice and decide on the fly if something went wrong. This idea is just an extension of that.

        • by Belial6 ( 794905 )
          Correct, another method would be to take the double entry accounting approach. You run the command in two different ways that should provide the same answer if correct, but different answers if wrong. You would only need a very small part of the chip to be really reliable as an error checker. I do think that this would be better handled by hardware than software, but the premise is not unreasonable.

          The real question is whether chips could be sped up enough to counteract the slow down introduced by err
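
          One cheap flavour of the "two different ways" check is a residue check, casting-out-nines style (my example, not Belial6's): redo a full-width multiply modulo a small constant on trusted hardware and compare; a mismatch flags the error without repeating the whole operation.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical multiply running on the unreliable datapath. */
static uint64_t unreliable_mul(uint32_t a, uint32_t b)
{
    uint64_t r = (uint64_t)a * b;
    if (rand() % 100 == 0)
        r ^= 1ull << (rand() % 64);      /* ~1% chance of a flipped bit */
    return r;
}

/* Residue check ("double entry" in miniature): redo the multiply mod a
 * small constant on trusted hardware and compare. With M = 255 this
 * catches any single bit flip, at a tiny fraction of the cost of
 * repeating the full multiply. */
static int residue_ok(uint32_t a, uint32_t b, uint64_t result)
{
    const uint64_t M = 255;              /* check modulus */
    return result % M == ((a % M) * (b % M)) % M;
}

int main(void)
{
    uint32_t a = 123456789u, b = 987654321u;
    uint64_t r;
    do {
        r = unreliable_mul(a, b);        /* retry until the check passes */
    } while (!residue_ok(a, b, r));
    printf("%u * %u = %llu\n", a, b, (unsigned long long)r);
    return 0;
}
```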
        • There's a huge gap between current chips correcting errors in general long before they propagate up to userland and what this article is talking about. This researcher is talking about creating more "robust" software "so an error simply causes the execution of instructions to take longer". If they're talking about the microcode on the CPU itself (I doubt it), then this is nothing new. If they're talking about code in every end developer's program, then they fall into exactly the problem I describe (along
        • by Jeremi ( 14640 )

          Well, they aren't rock-solid today either, so you can not trust their output even today. It's just not very likely that there will be a mistake.

          For common definitions of "rock-solid" and "not very likely", the above statements cancel each other out. (Keep in mind that different markets have different requirements for reliability... 1 hardware error every year is probably acceptable for casual computing use, but not for nuclear reactor control)

          • Re: (Score:2, Offtopic)

            by jd ( 1658 )

            1 hardware error every year is probably acceptable for casual computing use, but not for nuclear reactor control

            Someone should have told British Nuclear Fuels. I think Windscale/Sellafield was up to 20 accidental nuclear waste discharges a year at one point.

      • Re: (Score:3, Interesting)

        There's a damn good reason why we want our processors to be rock solid. If they don't work right, we can't trust anything they output.

        Have you ever tried transferring large files over a 100 Mbps Ethernet link? That's right, billions of bytes over a noisy, unreliable wired link. And how often have you seen files corrupted? I never have. The link runs along extremely reliably (BER of 10^-9, I think) with as little as 12 Mbps out of the 100 Mbps spent on error checking and recovery.

        Same case here. I'd expect the signal-to-noise ratio on the interconnects within CPUs (when the voltage is cut by, say, 25%) to be similar to, if not better than, Ethernet l
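
        Back-of-the-envelope arithmetic on the parent's Ethernet point, assuming the quoted BER of 10^-9: a gigabyte-sized transfer pushes about 8x10^9 bits over the wire, so you would expect on the order of eight raw bit errors per gigabyte; files still arrive intact because the CRC/checksum/retransmit layers quietly absorb them.

```c
#include <stdio.h>

int main(void)
{
    double ber   = 1e-9;                 /* assumed raw bit error rate */
    double bytes = 1e9;                  /* a ~1 GB file transfer      */
    double bits  = bytes * 8.0;

    double expected_raw_errors = bits * ber;
    printf("expected raw bit errors per GB: %.1f\n", expected_raw_errors);
    /* prints ~8 -- the CRC/checksum/retransmit layers are what hide these */
    return 0;
}
```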

  • Wouldn't it be simpler to just add redundancy and CRC or something to that effect?

  • by Red Jesus ( 962106 ) on Saturday May 29, 2010 @05:35PM (#32392494)

    The "robustification" of software, as he calls it, involves re-writing it so an error simply causes the execution of instructions to take longer.

    Ooh, this is tricky. So we can reduce CPU power consumption by a certain amount if we rewrite software in such a way that it can slowly roll over errors when they take place. There are some crude numbers in the document: a 1% error rate, whatever that means, causes a 23% drop in power consumption. What if the `robustification' of software means that it has an extra "check" instruction for every three "real" instructions? Now you're back to where you started, but you had to rewrite your software to get here. I know, it's unfair to compare his proven reduction in power consumption with my imaginary ratio of "check" instructions to "real" instructions, but my point still stands. This system may very well move the burden of error correction from the hardware to the software in such a way that there is no net gain.
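
    To put rough numbers on that worry (illustrative ratios only, not figures from the paper): one check per three real instructions is about 33% more work, and the quoted power reduction is 23%, so energy per task comes out around 1.33 x 0.77, or roughly 1.03. In other words, about a wash, which is the parent's point.

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative ratios only -- not taken from the paper. */
    double extra_work   = 4.0 / 3.0;   /* 1 check per 3 real instructions */
    double power_factor = 1.0 - 0.23;  /* the quoted 23% power reduction  */

    /* Energy ~ power x time; more instructions means more time. */
    double energy_ratio = power_factor * extra_work;
    printf("energy vs. baseline: %.2f\n", energy_ratio);   /* ~1.03 */
    return 0;
}
```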

    • Re: (Score:2, Insightful)

      by sourcerror ( 1718066 )

      This system may very well move the burden of error correction from the hardware to the software in such a way that there is no net gain.

      People said the same about RISC processors.

      • Re: (Score:3, Informative)

        Comment removed based on user account deletion
        • Re: (Score:3, Informative)

          The classifications weren't totally meaningful to begin with, but CISC has essentially died. I don't mean there aren't CISC chips anymore--any x86 or x64 chip can essentially be called "CISC"--but certainly no one's designed a CISC architecture in the past decade at least.

          RISC has essentially won and we've moved into the post-RISC world as far as new designs go. VLIW, for instance, can't really be considered classical RISC, but it builds on what RISC has accomplished.

          The grandparent's point is a good one:

        • RISK

          They try to invade Kamchatka?

    • Re: (Score:3, Interesting)

      by Turzyx ( 1462339 )
      I'm making assumptions here, but if these errors are handled by software would it not be possible for a program to 'ignore' errors in certain circumstances? Perhaps this could result in improved performance/battery life for certain low priority tasks. Although an application where 1% error is acceptable doesn't spring immediately to mind, maybe supercomputing - where anomalous results are checked and verified against each other...?
    • I can see this possibly working, though the devil is in the details. First, consider a similar situation with a communications link. You could either send every byte twice (TI 99/4A cassette format, I'm looking at you!), or if the error rate isn't too high, checksum large blocks of data and retransmit if there's an error. The latter will usually yield a higher rate for the error-free channel you create the illusion of. So if you could break a computation into blocks and somehow detect a corrupt computation,
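
      A sketch of the comparison being made, with made-up channel numbers: sending everything twice halves the useful rate no matter what, while checksumming blocks of N bytes and retransmitting on error costs only the checksum overhead plus the occasional resend, giving an expected goodput of roughly N/(N+c) x (1-p)^N for per-byte error probability p and checksum size c.

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Made-up channel parameters for illustration. */
    double p = 1e-5;    /* probability a given byte is corrupted        */
    double N = 1024.0;  /* payload bytes per checksummed block          */
    double c = 4.0;     /* checksum bytes per block (e.g. a 32-bit CRC) */

    /* Send-everything-twice: useful rate is a flat 50% of the raw rate. */
    double twice = 0.5;

    /* Checksum-and-retransmit: overhead factor times the probability the
     * whole block arrives clean (resends follow a geometric distribution). */
    double block_ok   = pow(1.0 - p, N);
    double retransmit = (N / (N + c)) * block_ok;

    printf("send-twice goodput:          %.3f\n", twice);       /* 0.500  */
    printf("checksum+retransmit goodput: %.3f\n", retransmit);  /* ~0.986 */
    return 0;
}
```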
      • Re: (Score:3, Informative)

        Error Correction Codes (aka Forward Error Correction) are typically more efficient for high-error channels than error detection (aka checksum and retransmit), which is why 10Gbps Ethernet uses Reed-Solomon rather than the CRC of previous Ethernet standards: it avoids the need to retransmit.

        I had the same questions about how this is going to work, though. What is the machine code going to look like and how will it allow the programmer to check for errors? Possibly each register could have an extra "error" bit (

        • by jd ( 1658 )

          There are many types of error correction codes, and which one you use depends on the nature of the errors. For example, if the errors are totally random, then Reed-Solomon will likely be the error correction code used. CDs use two layers of Reed-Solomon concatenated in series. This is not especially fast, but the output for a CD is in the tens of kilohertz and an ASIC could easily be operating in the gigahertz realm. However, when you're talking about the internals of a CPU, there's a limit to how slow you can affor

      • by Jeremi ( 14640 )

        You could either send every byte twice (TI 99/4A cassette format, I'm looking at you!)

        Seems to me you'd have to send every byte three times, otherwise if two bytes don't match, how do you know which one is correct and which one is corrupted?

        (on the other hand, this would help explain why TI 99/4A cassette loads were so bloody slow...)

    • Re: (Score:3, Interesting)

      by JoeMerchant ( 803320 )
      Why rewrite the application software? Why not catch it in the firmware and still present a "perfect" face to the assembly level code? Net effect would be an unreliable rate of execution, but who cares about that if the net rate is faster?
    • This may be a bit far-fetched, but if stochastic CPUs trade correctness for increased performance, maybe something like the following could reap the benefits while keeping the stochastic behaviour contained:
      Suppose those CPUs really do allow faster instruction handling using fewer resources; then you could put more of them in a package for the same price, which at the hardware level would give you more processing cores at the same cost. (Multi-core stochastic CPUs)
      Naturally, you have the abi
      • by jd ( 1658 )

        Oh, I agree with what you've said, but I have a very hard time believing people will be porting Linux, OpenBSD or Windows to Erlang any time soon, let alone take advantage of all the capabilities of Erlang. It could be done, in principle, but in practice the codebase is a serious problem. As I mentioned in my submission, one of the previous attempts to change the way CPUs were designed was the Transputer. It died, not because of any flaw in the design (which was superb) but because training everyone in Occa

    • by Interoperable ( 1651953 ) on Saturday May 29, 2010 @07:34PM (#32393432)

      I did some digging and found some material by the researcher, unfiltered by journalists. I don't have any background in processor architecture but I'll present what I understood. The original publications can be found here [illinois.edu].

      The target of the research is not general computing, but rather low-power "client-side" computing, as the author puts it. I understand this to be decoding applications, such as voice or video in mobile devices. Furthermore, the entire architecture would not be stochastic, but rather it would contain some functional blocks that are stochastic. I think the idea is that certain mobile hardware devices devote much of their time to specialized applications that do not require absolute accuracy.

      A mobile phone may spend most of its time encoding/decoding low-resolution voice and video and would have significant blocks within the processor devoted to those tasks. Those tasks could be considered error tolerant. The operating system would not be exposed to error-prone hardware, only applications that use hardware acceleration for specialized, error-tolerant tasks. In fact, the researchers specifically mention encoding/decoding voice and video and have demonstrated the technique on encoding H.264 video.

  • Sounds like... (Score:2, Offtopic)

    by Chineseyes ( 691744 )
    Sounds like Kumar and his friend Harold have been spending too much time baking weed brownies and not silicon wafers.
  • I don't see how allowing a higher error rate will enable them to put more transistors on a chip.

    • Re:Moore's law (Score:4, Informative)

      by takev ( 214836 ) on Saturday May 29, 2010 @05:49PM (#32392632)
      What he is proposing is to reduce the number of transistors on a chip, to increase its speed and reduce power usage.
      So in fact he is trying to reverse Moore's law.
    • by Jeremi ( 14640 )

      I don't see how allowing a higher error rate will enable them to put more transistors on a chip.

      What it does is increase the chances of actually being able to use the chip once it's made. Right now, many chips have to be discarded because they contain manufacturing flaws that (in current designs) makes them unusable. If they can come up with a design that allows flawed chips to be useful anyway, they no longer have to discard all the chips that didn't come out perfectly.

  • by bug1 ( 96678 ) on Saturday May 29, 2010 @05:59PM (#32392738)

    Ethernet is an improvement over token ring, yet Ethernet has collisions and token ring doesn't.

    Token ring avoids collisions, Ethernet accepts collisions will take place but has a good error recovery system.

    • by h4rr4r ( 612664 )

      No, they are totally separate things. You can run token ring over Ethernet, been there done that. Ethernet does use a bus topology but these days we use switches to avoid collisions.

    • Not really (Score:4, Informative)

      by Sycraft-fu ( 314770 ) on Saturday May 29, 2010 @08:46PM (#32393836)

      Ethernet has lower latency than token ring, and is overall easier to implement. However, its bandwidth scaling is poor when you are talking about old-school hub Ethernet. The more devices you have, the less total throughput you get due to collisions. Eventually you can grind a segment to a complete halt because collisions are so frequent that little data gets through.

      Modern ethernet does not have collisions. It is full duplex, using separate transmit and receive wires. It scales well because the devices that control it, switches, handle sending data where it needs to go and can do bidirectional transmission. The performance you see with gig ethernet is only possible because you do full duplex communications with essentially no errors.

  • With all the mistakes I've made, I could be a superman by now.

  • A brainy idea. (Score:5, Interesting)

    by Ostracus ( 1354233 ) on Saturday May 29, 2010 @06:28PM (#32392962) Journal

    He favors a new architecture, which he calls the 'stochastic processor,' designed to handle data corruption and error recovery gracefully.

    I dub thee neuron.

    • Indeed. It couldn't be used with traditional programming methods; you'd only be able to use it with statistical methods.

      Genetic programming maybe. Errors are mutations.
       

  • by Angst Badger ( 8636 ) on Saturday May 29, 2010 @06:31PM (#32392986)

    ...the problem is software. In the last twenty years, we've gone from machines running at a few MHz to multicore, multi-CPU machines with clock speeds in the GHz, with corresponding increases in memory capacity and other resources. While the hardware has improved by several orders of magnitude, the same has largely not been true of software. With the exception of games and some media software, which actually require and can use all the hardware you can throw at them, end user software that does very little more than it did twenty years ago could not even run on a machine from 1990, much less run usably fast. I'm not talking enterprise database software here, I'm talking about spreadsheets and word processors.

    All of the gains we make in hardware are eaten up as fast or faster than they are produced by two main consumers: useless eye-candy for end users, and higher and higher-level programming languages and tools that make it possible for developers to build increasingly inefficient and resource-hungry applications faster than before. And yes, I realize that there are irresistible market forces at work here, but that only applies to commercial software; for the FOSS world, it's a tremendous lost opportunity that appears to have been driven by little more than a desire to emulate corporate software development, which many FOSS developers admire for reasons known only to them and God.

    It really doesn't matter how powerful the hardware becomes. For specialist applications, it's still a help. But for the average user, an increase in processor speed and memory simply means that their 25 meg printer drivers will become 100 meg printer drivers and their operating system will demand another gig of RAM and all their new clock cycles. Anything that's left will be spent on menus that fade in and out and buttons that look like quivering drops of water -- perhaps next year, they'll have animated fish living inside them.

    • And yes, I realize that there are irresistible market forces at work here, but that only applies to commercial software; for the FOSS world, it's a tremendous lost opportunity that appears to have been driven by little more than a desire to emulate corporate software development, which many FOSS developers admire for reasons known only to them and God.

      I think I know why. If free software lacks eye candy, free software has trouble gaining more users. If free software lacks users, hardware makers won't cooperate, leading to the spread of "paperweight" status on hardware compatibility lists. And if free software lacks users, there won't be any way to get other software publishers to document data formats or to get publishers of works to use open data formats.

    • Higher level languages aren't just there to save developers time. Using higher level languages usually makes it harder to generate code that will walk on protected memory, cause race conditions etc., and higher level languages are usually more portable and make it easier to write modular re-usable code.
    • by Draek ( 916851 ) on Saturday May 29, 2010 @08:02PM (#32393604)

      for the FOSS world, it's a tremendous lost opportunity that appears to have been driven by little more than a desire to emulate corporate software development, which many FOSS developers admire for reasons known only to them and God.

      You yourself stated that high-level languages allow for a much faster rate of development, yet you dismiss the idea of using them in the F/OSS world as a mere "desire to emulate corporate software development"?

      Hell, you also forgot another big reason: high-level code is almost always *far* more readable than its equivalent set of low-level instructions, the appeal of which for F/OSS ought to be similarly obvious.

      Sorry but no, the reason practically the whole industry has been moving towards high-level languages isn't because we're all lazy, and if you worked in the field you'd probably know why.

    • by Homburg ( 213427 ) on Saturday May 29, 2010 @08:13PM (#32393676) Homepage

      So you're posting this from Mosaic, I take it? I suspect not, because, despite your "get off my lawn" posturing, you recognize in practice that modern software actually does do more than twenty-year-old software. Firefox is much faster and easier to use than Mosaic, and it also does more, dealing with significantly more complicated web pages (like this one; and terrible though Slashdot's code surely is, the ability to expand comments and comment forms in-line is a genuine improvement, leaving aside the much more significant improvements of something like gmail). Try using an early 90s version of Word, and you'll see that, in the past 20 years word processors, too, have become significantly faster, easier to use, and capable of doing more (more complicated layouts, better typography).

      Sure, the laptop I'm typing this on now is, what, 60 times faster than a computer in 1990, and the software I'm running now is neither 60 times faster nor 60 times better than the software I was running in 1990. But it is noticeably faster, at the same time that it does noticeably more and is much easier to develop for. The idea that hardware improvements haven't led to huge software improvements over the past 20 years can only be maintained if you don't remember what software was like 20 years ago.

      • So you're posting this from Mosaic, I take it? I suspect not, because, despite your "get off my lawn" posturing, you recognize in practice that modern software actually does do more than twenty-year-old software.

        [X] I post using telnet, you inconsiderate clod! Now wget off my lawn!

        the ability to expand comments and comment forms in-line is a genuine improvement

        You're kidding, right? I just leave it in nested mode - far fewer clicks.

      • by jd ( 1658 )

        I would dispute the "capable of doing more" part. TeX can do anything the latest version of Word can do. Ventura Publisher from 20 years back could probably do just about everything. Word IS faster and easier to use, yes, but unless you compare Word to WordStar or Wordcraft 80, it is generally a mistake to look at what something can do. If a program supports Turing-Complete macros, then it can do absolutely anything a Turing Machine can do (given sufficient memory) and a Turing Machine can do anything that

    • by bertok ( 226922 )

      All of the gains we make in hardware are eaten up as fast or faster than they are produced by two main consumers: useless eye-candy for end users, and higher and higher-level programming languages and tools that make it possible for developers to build increasingly inefficient and resource-hungry applications faster than before. And yes, I realize that there are irresistible market forces at work here, but that only applies to commercial software; for the FOSS world, it's a tremendous lost opportunity that appears to have been driven by little more than a desire to emulate corporate software development, which many FOSS developers admire for reasons known only to them and God.

      This is a common misconception in the computing world: that somehow the additional computing power is 'wasted' on 'bloat'.

      The basic principle that one has to understand is that in the meantime, human beings haven't changed. Our brains haven't improved in speed. There is no benefit to us if a program responds in 1 microsecond instead of 1 millisecond.

      However, in terms of 'features', programs are still far behind where they should be. There's an awful lot that we as programmers could do that we aren't, either

    • Re: (Score:3, Informative)

      by jasonwc ( 939262 )

      The problem isn't really modern CPUs but the lack of improvement in conventional hard drive speeds. With a Core i7 processor and a 160 GB X-25M Gen 2 Intel SSD, pretty much everything I run loads within 1-2 seconds, and there is little or no difference between loading apps from RAM or my hard drive. Even with a 1 TB WD Caviar Black 7200 RPM drive, my Core i7 machine was constantly limited by the hard drive.

      With an SSD, I boot to a usable desktop and can load Firefox with Adblock, Pidgin, Skype, Foobar2000 a

    • Re: (Score:3, Insightful)

      by rdnetto ( 955205 )

      a desire to emulate corporate software development, which many FOSS developers admire for reasons known only to them and God.

      Probably because the major FOSS developers are in corporate software development.

    • by Jeremi ( 14640 )

      Anything that's left will be spent on menus that fade in and out and buttons that look like quivering drops of water -- perhaps next year, they'll have animated fish living inside them.

      If animated fish menus are what the consumers want, then why not sell them animated menus? Or, if consumers prefer spending their CPU cycles on getting actual work done instead, they can always buy software with a less fancy GUI. It's not like there aren't options available to suit every taste.

    • by 1 a bee ( 817783 )

      I'm not talking enterprise database software here, I'm talking about spreadsheets and word processors.. All of the gains we make in hardware are eaten up as fast or faster than they are produced by two main consumers: useless eye-candy for end users

      Oh c'mon.. It's not like I know how to put my fast processor to any better use.

  • I have designed a CPU that uses only one transistor, requires absolutely no power, and is infinitely fast! Of course at the moment the only instruction it can run is NOP, but I'm working on the problem...

    Garbage in, garbage out, professor. A computer that isn't accurate is no longer useful. We might as well go back to using thousands of humans to double-check other thousands of humans. Oh wait no those require FAR more energy and time.

  • by DavidR1991 ( 1047748 ) on Saturday May 29, 2010 @07:04PM (#32393210) Homepage

    ...that the Transmeta Crusoe processor has sod-all to do with porting or different programming models. The whole point of the Crusoe was that it could distil down various types of instruction (e.g. x86, even Java bytecode) to native instructions it understood. It could run 'anything' so to speak, given the right abstraction layer in between

    Its lack of success had nothing to do with programming - just that no one needed a processor that could do these things. The demand wasn't there

    • Re: (Score:2, Interesting)

      The whole point of the Crusoe was that it could distil down various types of instruction (e.g. x86, even Java bytecode) to native instructions it understood. It could run 'anything' so to speak, given the right abstraction layer in between

      Yea, uh, that's true for *any* general purpose processor. What Crusoe originally promised was that this dynamically recompiled code might be either faster (by reordering and optimizing many instructions to fit Crusoe's Very Large Instruction Word design--not unlike how the

  • by gman003 ( 1693318 ) on Saturday May 29, 2010 @07:19PM (#32393324)

    I've seen this before, except for an application that made more sense: GPUs. A GPU instruction is almost never critical. Errors writing pixel values will just result in minor static, and GPUs are actually far closer to needing this sort of thing. The highest-end GPUs draw far more power than the highest-end CPUs, and heating problems are far more common.

    It may even improve results. If you lower the power by half for the least significant bit, and by a quarter for the next-least, you've cut power 3% for something invisible to humans. In fact, a slight variation in the rendering could make the end result look more like our flawed reality.
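
    One reading that reproduces the parent's ~3% figure (my assumption about the arithmetic, not the poster's stated working): treat the pixel as a 24-bit word whose bit lanes cost roughly equal power, and run only the bottom two bits at reduced power.

```c
#include <stdio.h>

int main(void)
{
    /* Assumed model: a 24-bit pixel word where every bit lane costs about
     * the same, with the bottom two bits run at reduced power. */
    double bits    = 24.0;
    double savings = (0.5 + 0.25) / bits;   /* half off the LSB, a quarter
                                               off the next bit */
    printf("power saved: %.1f%%\n", savings * 100.0);   /* ~3.1% */
    return 0;
}
```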

    A GPU can mess up and not take down the whole computer. A CPU can. What happens when the error hits during a syscall? During bootup? While doing I/O?

    • by Kjella ( 173770 )

      But it's a long time since computers just drew things straight onto the screen. One little error in video decoding will keep throwing off every frame until the next keyframe. One error in a shader computation can cause a huge difference in the output. What coordinates have an error tolerance after all is transformed and textured and tessellated and whatnot? An error in the Z-buffer will throw one object in front of another instead of behind it. The number of operations where you don't get massive screen corrup

    • Re: (Score:2, Interesting)

      by thegarbz ( 1787294 )
      I can't see this working. The premise here was that the hardware allows faults, yet I don't see how you could design hardware like this to be accurate on demand. GPUs aren't only used for games these days. Would an error still be tolerated while running Folding@Home?
    • Another good application would be for PMPs and other mobile devices. Who cares if you have one pixel decoded improperly? Odds are you won't notice on that tiny screen and you'd happily trade that for doubling your battery life. Power consumption is, at best, a tertiary concern on a desktop or server.
  • Bring it down to the actual transistor level and compare it to the brain - the brain uses some factor x more neurons for a given job than a designer might use transistors for a similar function.

    The brain expects neurons to misfire and goes on averages of clusters. This allows neurons to be kept on more of a hair trigger, which makes them less energetically expensive to change state. The same can theoretically be done with transistors - we use fairly high voltage (I'm not an EE, feel free to correct me here) differences to re

  • Simply put, we already use this. Network transport may have errors, and these are dealt with at higher levels. As long as a corruption can be detected, we are OK. But if a computation can result in an error and the check of it can also be wrong, we have a problem. Some part must be guaranteed. But the transmission can be handled the same way that networks are handled.

    If the store is not reliable, we can use RAID 5 or the like. This can even be done with main memory. But, we can't easily segregat
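
    A toy version of the RAID-5 idea mentioned above, as a sketch: keep an XOR parity block alongside the data blocks, and any single block that is lost (or detected as corrupt) can be rebuilt from the rest. Real RAID 5 also rotates parity across disks and relies on the drive reporting which block failed.

```c
#include <stdio.h>
#include <string.h>

#define NBLOCKS 4
#define BLKSIZE 8

/* Compute an XOR parity block over the data blocks (RAID-5 style). */
static void make_parity(unsigned char data[NBLOCKS][BLKSIZE],
                        unsigned char parity[BLKSIZE])
{
    memset(parity, 0, BLKSIZE);
    for (int b = 0; b < NBLOCKS; b++)
        for (int i = 0; i < BLKSIZE; i++)
            parity[i] ^= data[b][i];
}

/* Rebuild a single known-bad block by XOR-ing parity with the survivors. */
static void rebuild(unsigned char data[NBLOCKS][BLKSIZE],
                    const unsigned char parity[BLKSIZE], int lost)
{
    memcpy(data[lost], parity, BLKSIZE);
    for (int b = 0; b < NBLOCKS; b++)
        if (b != lost)
            for (int i = 0; i < BLKSIZE; i++)
                data[lost][i] ^= data[b][i];
}

int main(void)
{
    unsigned char data[NBLOCKS][BLKSIZE] = {
        "block0!", "block1!", "block2!", "block3!"
    };
    unsigned char parity[BLKSIZE];

    make_parity(data, parity);
    memset(data[2], 0, BLKSIZE);        /* "lose" block 2                */
    rebuild(data, parity, 2);           /* recover it from parity + rest */
    printf("recovered: %s\n", (char *)data[2]);
    return 0;
}
```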

  • Faster performance is a luxury we don't really need. We only have applications demanding higher performance because it's available. There might possibly be a niche role for this in dedicated hardware for exploring certain computational problems, such as protein folding or monte carlo simulations or whatever, but not in our home computers.

  • by trims ( 10010 ) on Sunday May 30, 2010 @12:26AM (#32394752) Homepage

    I see lots of people down on the theory - even though the original proposal was for highly error-forgiving applications - because somehow it means we can't trust the computations from the CPU anymore.

    People - realize that you can't trust them NOW.

    As someone who's spent way too much time in the ZFS community talking about errors, their sources and how to compensate, let me enlighten you:

    modern computers are full of uncorrected errors

    By that, I mean that there is a decided tradeoff between hardware support for error correction (in all the myriad places in a computer, not just RAM) and cost, and the decision has come down on the side of screw them, they don't need to worry about errors, at least for desktops. Even for better quality servers and workstations, there are still a large number of places where the hardware simply doesn't check for errors. And, in many cases, the hardware alone is unable to check for errors and data corruption.

    So, to think that your wonderful computer today is some sort of accurate calculating machine is completely wrong! Bit rot and bit flipping happen very frequently for a simple reason: error rates per billion operations (or transmissions, or whatever) have essentially stayed the same for the past 30 years, while every other component (and bus design, etc.) is pretty much following Moore's Law. The ZFS community is particularly worried about disks, where the hard error rates are now within two orders of magnitude of the disk's capacity (e.g. for a 1TB disk, you will have a hard error for every 100TB or so of data read/written). But there are problems in on-die CPU caches, bus line transmissions, SAS and FC cable noise, DRAM failures, and a whole host of other places.
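
    For scale, using the commonly published drive specs rather than anything from the post: consumer disks are typically rated at one unrecoverable read error per 10^14 bits and enterprise disks per 10^15 bits, which works out to roughly 12 TB and 125 TB of reads per expected hard error, the same ballpark as the ~100 TB figure above.

```c
#include <stdio.h>

int main(void)
{
    /* Typical published unrecoverable-read-error specs (assumed figures). */
    double consumer_bits   = 1e14;   /* 1 error per 10^14 bits read */
    double enterprise_bits = 1e15;   /* 1 error per 10^15 bits read */

    double tb = 8.0 * 1e12;          /* bits per terabyte */
    printf("consumer:   ~%.0f TB read per expected hard error\n",
           consumer_bits / tb);      /* ~12 TB  */
    printf("enterprise: ~%.0f TB read per expected hard error\n",
           enterprise_bits / tb);    /* ~125 TB */
    return 0;
}
```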

    Bottom line here: the more research we can do into figuring out how to cope with the increasing frequency of errors in our hardware, the better. I'm not sure that we're going to be able to force a re-write of applications, but certainly, this kind of research and possible solutions can be taken care of by the OS itself.

    Frankly, I liken the situation to that of using IEEE floating point to calculate bank balances: it looks nice and a naive person would think it's a good solution, but, let me tell you, you come up wrong by a penny more often than you would think. Much more often.

    -Erik

  • RAM, disk drives, CD-ROMs and modems are all designed to allow a significant possibility of errors and employ redundancy to minimize the impact on the end user. Why would anyone think CPUs would be exempt from similar design needs? Most demanding calculations take much less time to verify than to perform. If you could factor large numbers or compress files twice as fast but with a 5% error rate, wouldn't you spring for an error-free coprocessor or slower error-correcting verification code as a trade-off? No softwar
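
    The factoring case is the textbook example of verification being cheaper than computation: finding the factors is the hard part, while checking a claimed factorization is a single multiplication. A small sketch, with toy trial division standing in for the expensive step:

```c
#include <stdint.h>
#include <stdio.h>

/* Expensive step: trial-division factoring (stands in for the hard work,
 * possibly done on fast-but-sloppy hardware). */
static uint64_t smallest_factor(uint64_t n)
{
    for (uint64_t d = 2; d * d <= n; d++)
        if (n % d == 0)
            return d;
    return n;
}

/* Cheap step: verifying a claimed factorization is just a multiply. */
static int verify(uint64_t n, uint64_t p, uint64_t q)
{
    return p > 1 && q > 1 && p * q == n;
}

int main(void)
{
    uint64_t n = 600851475143ull;            /* a composite number */
    uint64_t p = smallest_factor(n);
    uint64_t q = n / p;

    if (verify(n, p, q))
        printf("%llu = %llu * %llu (verified)\n",
               (unsigned long long)n, (unsigned long long)p,
               (unsigned long long)q);
    else
        printf("factorization check failed -- redo the computation\n");
    return 0;
}
```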

  • I can see this working for a graphics chip -- after all, who cares if a tiny portion of an image is a pixel or two off? For execution of an actual application, however, I think this idea sucks. There are far better ways to reduce power consumption, like asynchronous- [wikipedia.org] or reversible computing [wikipedia.org] techniques.
  • What kind of name is that anyhow? Kumar? What is that, five O's or two U's?

"I am, therefore I am." -- Akira

Working...