
HP Introduces Defect-Tolerant Nano Elements 93

versicherung writes "With the ever-shrinking feature size in microelectronics, it will soon be prohibitively expensive to manufacture defect-free nano elements. HP has come up with a new way to produce fault-tolerant microchips. Using mathematical techniques borrowed from coding theory, HP will be able to produce these chips by combining a crossbar architecture with 50 percent more wires as an 'insurance policy,' fabricating nano-electronic circuits with nearly perfect yields even though the probability of broken components will be high."
This discussion has been archived. No new comments can be posted.

  • Bugs (Score:2, Interesting)

    by panxerox ( 575545 ) *
    Does that mean the phrase "that's not a bug, that's a feature" will now be an accepted marketing term? Until true nanofabrication becomes available, this will become the standard throughout the industry. Now the question: is there a patent on fault-tolerant circuits? Prior art, anyone?
  • To me, this sounds like quantity over quality, in order to get these things to work. Hey, whatever works I guess...
    • That's been standard, though. The failure rate on LCD manufacture, in particular, was, and I think still is, very high. Like 70%.
    • With the ever shrinking feature size in microelectronics it will soon be prohibitively expensive to manufacture defect-free nano elements.

      My question is, why does size matter? I mean, the bigger you make these things, the more places there are for defects to occur, right? Shouldn't this work the other way around?
      • Do a bit of research into coding theory - What they're speaking of here is called coding gain.

        The idea of coding theory (usually applied to telecommunications, and I'll speak of it as it applies to communicating bits of data here because that's the aspect of it I'm most familiar with) is to introduce predictable redundancy into the data you transmit so that if some of it gets corrupted, you can recover the original message without error.

        An example of this is the (23,12) Golay code. For every 12 bits of i
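        To make that concrete with something far simpler than Golay, here is a toy 3x repetition code in Python: every bit is sent three times and a majority vote fixes any single flip per triple (just a sketch of the coding-gain idea, not the actual (23,12) code):

        import random

        def encode(bits):
            # 3x repetition: transmit every bit three times
            return [b for b in bits for _ in range(3)]

        def decode(received):
            # majority vote over each group of three received bits
            return [1 if sum(received[i:i + 3]) >= 2 else 0
                    for i in range(0, len(received), 3)]

        message = [random.randint(0, 1) for _ in range(12)]
        sent = encode(message)

        # corrupt one bit in every triple -- a brutal 33% raw error rate
        for i in range(0, len(sent), 3):
            sent[i + random.randrange(3)] ^= 1

        assert decode(sent) == message  # the original data survives anyway

        The Golay and Hamming codes buy similar protection with far less overhead than tripling everything, which is the whole point of coding gain.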
        • Not only that, but you can design the code to be tolerant to specific kinds of errors, the errors common in the system you are trying to protect.

          For example, the CRC error-detection scheme, while not cryptographically secure, is good at catching burst errors (long runs of corrupted bits), which are a common kind of error on communication lines.
          Another example is error correction on DVDs. The data is coded in a way that a scratch can be handled, for example by keeping the parity bits physically far from th
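          You can watch a CRC catch a burst error with Python's built-in zlib.crc32 (this only shows detection on a made-up payload; the DVD interleaving is a separate trick):

          import zlib

          data = bytearray(b"payload sent over a noisy line, protected by a CRC")
          checksum = zlib.crc32(data)          # computed by the sender

          # simulate a burst error: a contiguous run of corrupted bits
          data[10] ^= 0xFF
          data[11] ^= 0xFF
          data[12] ^= 0x0F

          assert zlib.crc32(data) != checksum  # receiver sees the mismatch and rejects the frame

          CRC-32 is guaranteed to catch any burst up to 32 bits long; longer bursts slip through only with probability about 2^-32.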
    • Actually it's both: with more quantity comes more fault tolerance, and with that comes better quality. When modern silicon has half a billion transistors, you have to prepare for some of them not to work if you ever want a usable product.
    • Of the X% of your brain that's "unused", probably a lot of it is backup circuitry dedicated for recoveries after drinking alcohol.

  • Now if only they still made chips like the Alpha or PA-RISC, it might matter; but since both architectures are toast, why are they even researching this?
    • by Anonymous Coward
      Not all chips are CPUs. Besides, for a CPU you don't want to waste a lot of transistors on redundancy. This is obviously made for minor chips that don't produce a shitload of heat and don't need to go any faster.
      • I'm still curious as to what chips HP manufactures these days. I'm guessing very few, if any. With all the cost-cutting they've done, cheaping out on their employees across the board, the fact that they still have people researching IC manufacturing is a bit baffling.
        • Yep, many of HP's actions are quite baffling. With Fiorina in charge, they gave up on PA-RISC, spun off their test & measurement division into Agilent, and basically became a printer and white-box maker.

          If this is the path they've chosen, it seems like they should get rid of all these researchers (maybe Intel might want them), and just concentrate on making printers and PCs. Of course, their PC division doesn't seem to be doing so well against Dell, so maybe they should just dump that too and just ma
    • Maybe because they can use it for chips other than CPUs, maybe they can sell the technology to others, maybe they have other plans...
    • Even if they totally went out of the CPU biz, getting royalties from Intel, AMD, IBM, etc, is still income.
  • It's True! (Score:4, Funny)

    by Anonymous Coward on Saturday June 11, 2005 @07:49PM (#12791608)
    "HP has come up with a new way to produce fault-tolerant microchips."


    When they do fail, HP will claim it's not their fault and we'll have to tolerate it.

  • Wouldn't the cost be the same? Say 50% more wires = 50% fewer errors; you still spend 50% more on the wires, so you would only break even: even though your yield is 150% of the original, the chips will also be 50% more costly because there are 50% more wires in them.
    • I suppose a wire is easier to make than a transistor (fewer steps, etc.; I don't know the exact processes). Plus, nothing says that 50% more wires = 50% fewer errors; it could be 90% fewer errors, for example.
    • by Anonymous Coward
      Your assumption is that wire is the only cost in making a chip. A microchip requires lots of process steps, and those are the majority of the manufacturing cost. Wires are just additional metal layers, so as long as you can get away with adding a couple of layers, the additional cost is minimal.

      What I would worry about is more on the chip performance side of things - namely the additional capacitance loading, crosstalk and the overall routing density for this approach.
      • "What I would worry about is more on the chip perfromance side of things - namely the additonal capacitance loading, cross talk and the overall routing density for this approach."

        Also the power consumption. I don't have a breakdown of where this technology would be used and where chips spend the most power (apart from knowing cache doesn't take much power), but it might hurt on laptops.
    • At the moment, if there is even one error in the chip it has to be biffed.

      Coding theory allows much better returns than 50% fewer errors for 50% more wires.

      For instance, a Hamming code will correct one error in a word up to 2^R - 1 bits long for the cost of only R check bits.

      So if your chip's datapath is 32 bits wide, you could instead carry 26 data bits plus 5 check bits (a (31,26) Hamming code), and if one of your 31 little gatey-things didn't dope correctly, you would still get the right answer.

      For the chip to need to be chucked you would
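      Here's the tiny (7,4) cousin of that code in Python, correcting any single flipped bit out of seven; the 32-bit case above works the same way, just with more check bits spread over a wider word:

      def hamming74_encode(d):                   # d = four data bits
          p1 = d[0] ^ d[1] ^ d[3]                # parity over positions 1,3,5,7
          p2 = d[0] ^ d[2] ^ d[3]                # parity over positions 2,3,6,7
          p4 = d[1] ^ d[2] ^ d[3]                # parity over positions 4,5,6,7
          return [p1, p2, d[0], p4, d[1], d[2], d[3]]

      def hamming74_decode(c):
          s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
          s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
          s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
          syndrome = s1 + 2 * s2 + 4 * s4        # points at the broken bit (0 = none)
          if syndrome:
              c[syndrome - 1] ^= 1               # repair it
          return [c[2], c[4], c[5], c[6]]        # pull the data bits back out

      data = [1, 0, 1, 1]
      for bad_bit in range(7):                   # break each of the 7 "wires" in turn
          word = hamming74_encode(data)
          word[bad_bit] ^= 1
          assert hamming74_decode(word) == data  # still the right answer every time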

    • It's a game of probability, where the designer tries to minimize the damage from random dust particles.

      VLSI chips spend about 30% of their real-estate on the clock and power wires. So, a single particle of dust acts like a meteorite knocking out a whole suburb of a city. The damage caused by a broken power or clock wire is far more substantial as it can knock out other areas not immediately covered by the unwanted object.

      If you have redundancy (like texture pipelines on a GPU), you can increase your yield
    • How would 50% more wires mean 50% less errors? If each "unit" has a 0.001 probability of failure, then adding a redundant "unit" gives a combined failure probability of 0.001 × 0.001 = 0.000001, which is 1000 times lower, not merely twice as low.
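      A back-of-the-envelope check in Python (assuming the spare really can stand in for any failed copy, and that failures are independent):

      p_unit = 0.001                       # chance that a single unit is defective
      p_pair = p_unit ** 2                 # a duplicated unit fails only if both copies do
      print(p_unit / p_pair)               # 1000.0 -- a 1000x improvement, not 2x

      n = 10_000                           # units on a hypothetical chip
      print((1 - p_unit) ** n)             # ~0.00005  chance the plain chip is perfect
      print((1 - p_pair) ** n)             # ~0.99     chance the chip with spares works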
  • by doormat ( 63648 ) on Saturday June 11, 2005 @07:54PM (#12791633) Homepage Journal
    Once the defect rate is low, the extra 50% of wires will just take up unnecessary space and increase production costs. But for now, it seems completely acceptable to increase the production costs and die size in order to get yields higher.

    This kind of concept is already in use throughout the rest of the microprocessor world - Intel (maybe AMD too, I dunno) ships extra cache lines in their chips, deactivates defective cache lines, and reroutes them to the "spare" lines to improve yield.
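    The sparing itself is conceptually just a remap table built at test time; a hypothetical sketch (the names and numbers here are made up, not Intel's actual scheme):

    # Hypothetical sketch of cache-line sparing: rows found bad at wafer test
    # get redirected to spare rows, so the part still ships as a full-size cache.
    NUM_LINES, NUM_SPARES = 1024, 16

    bad_rows = {37, 512}                        # discovered during test
    remap = {bad: NUM_LINES + i                 # bad row -> spare row
             for i, bad in enumerate(sorted(bad_rows))}

    def physical_row(logical_row: int) -> int:
        return remap.get(logical_row, logical_row)

    assert physical_row(37) == 1024             # rerouted to the first spare
    assert physical_row(40) == 40               # healthy rows pass straight through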
    • If it really works, then it's not just useful "as long as the defect rate is high". Think about it: current technology assumes that everything needs to be perfect, but if you can tolerate some defects, then you can be a lot more aggressive in the design. That means using smaller features, lower voltage, higher clock rate, ...
    • A really good example of this is how the Cell CPU in the PlayStation 3 is only gonna have 7 SPUs instead of 8. They just deactivate whichever SPU is broken and TADA, a good chip! And then when yields improve enough to make this unneeded, they will just make it with 7.
  • Comment removed (Score:4, Interesting)

    by account_deleted ( 4530225 ) on Saturday June 11, 2005 @07:54PM (#12791637)
    Comment removed based on user account deletion
    • Re:Brute force! (Score:3, Insightful)

      by Tomfrh ( 719891 )
      If you want to know whether this sort of design is acceptable, ask whether CDs and DVDs are acceptable. They are founded on coding theory, and are designed to tolerate many bad bits and yet still deliver perfect information.
    • Re:Brute force! (Score:1, Insightful)

      by Anonymous Coward
      Well, do you think the Internet Protocol is inelegant? If you have to send information through a noisy channel where one in every 10^3 bits gets flipped, you need some mechanism for error correction or you simply can't communicate.

      Now we're not talking about communication channels here, but the analogy is the same. There are some factors we as engineers can't control (such as thermal noise, for example) and so we have to work around them. I won't get into the technological details of nano-fabrication right
  • by SatanMat ( 757225 ) <PowellS@gmail.com> on Saturday June 11, 2005 @07:58PM (#12791654) Journal
    Capt'n I'm rerouting the .... Wait never mind it did it all by itself....


    Okay all fixed. I guess you don't need me anymore, I'll just go and get drunk in the corner.
  • I hope they can apply this tech to LCD displays, which are like giant-area microchips. The yield on LCD batches is low (only maxing at 60% [utwente.nl]), because defects come per cm^2, (mostly) regardless of transistor feature size. One defect (or a few, depending on the QA of the manufacturer) can spoil a whole unit; more area means more chances of spoilage. If HP's redundancy means a pixel has two chances to survive defects, the yield might multiply greatly, as the odds of two defects in a single pixel's area are very
    • LCD problems often come from a defect in the glass/crystal. Because of that, it's not simply an issue of adding more wires: there is only one pixel unit at one depth, and if they put another behind it, the one in front would obstruct it... otherwise we would already have very common multilayer LCD screens.
      • Do you have any breakdown on kinds of LCD defects? My info indicates [ercservice.com] that circuit defects, just like the ones HP is addressing with redundancy, are the majority of the problem. The redundancy HP is delivering isn't (necessarily) at the pixel (or subpixel channel) scale, so we're not talking about two redundant layers, only one of which could be seen. We're talking about the transistor interconnects, which don't block visually. Moreover, even if the actual redundant features needed alignment all in a single
        • Interesting link there, I didn't realize that the problems with pixel defects were at the transistor level... I remember reading a few articles discussing the costs of fabricating lcds and how minute defects in the display level were exceedingly difficult to prevent significantly, and that even a tiny particle could cause a dead pixel. If it really is all circuit defects, then there really is no reason (other than raw space) that they couldn't solve that with redundant cabling/sturdier wiring in the first p
          • I was glad to find that page, in support of my point, myself - an excuse to learn a lot more details about the tech. The reason LCD manufacturing has had these inefficiencies comes from their approach to the economics of evolving manufacturing techniques. IC fabs in general have gone to smaller process sizes for yield (more dies per area with the same defects per area means more working dies), speed, and (lower) power demand. None of those benefits matter to (or materialize in) LCDs at their largish per-uni
  • There may be more to this story than HP is leading us to believe. First of all, why is HP releasing this information? Is it to give other chip makers a heads up of what HP is planning in the near future? I think not. This story sounds more like a cover-up rather than an explanation.
  • I wonder if you could bypass the adaptive technology and try to overclock a processor based on this.
  • by Aphrika ( 756248 ) on Saturday June 11, 2005 @08:01PM (#12791669)
    HP [hpfoods.com] technology has always been my number one choice for helping me tolerate defective chips...
  • by Anonymous Coward
    Children,

    Before there were computers, people sometimes checked the accuracy of their arithmetic by "casting out nines" (google for it). When computers were big things full of vacuum tubes that had a tendency to go out in the middle of a calculation, people used parity checking to ensure the integrity of the calculations. Coding theory has come a long way since then, with new schemes for different applications, such as crypto and telecom (from TFA). The principle is old, but I'm sure these guys had to co
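    For anyone who doesn't want to google it, casting out nines fits in a few lines of Python; it works because a number and its digit sum are congruent modulo 9 (the example product below is deliberately wrong):

    def cast_out_nines(n: int) -> int:
        return n % 9              # same result as repeatedly summing the digits

    a, b = 3867, 5129
    claimed = 19833243            # a hand-computed product with a slip in it
    ok = (cast_out_nines(a) * cast_out_nines(b)) % 9 == cast_out_nines(claimed)
    print(ok)                     # False -- the mistake is caught
    # Like a single parity bit, it's cheap but not foolproof: an error that
    # happens to be a multiple of 9 would slip through undetected.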
  • by G4from128k ( 686170 ) on Saturday June 11, 2005 @08:48PM (#12791862)
    The increasing use of spare circuits could let product makers offer variable-performance, gracefully-degrading products. As the product degrades it would map out the bad circuits, but keep functioning. An overclocked GPU might be specced to have 16 vertex shaders, come from the factory with 18 working and then slowly lose them over time (but not drop below 16 during the warranty period). Used long enough, it might steadily lose vertex shaders until it can no longer function.

    For example, I wish my ATA hard drives would let me access all of the space on the drive, including spare blocks tagged for remapping of bad blocks. A flexi-capacity drive would show higher-than-spec capacity on first install and then gradually degrade. The standard practice of never using 100% of available space would guarantee the availability of at least a few spare blocks. Current drive logic fails the drive once the spare blocks are used up, but a smarter drive would keep working by steadily shrinking the drive capacity. The OS might show this as a steadily growing, locked "BAD_BLOCK" file. A well-used hard disk might last much longer, but shrink below rated capacity and still function adequately.

    A dynamic version of this technology would be a real boon to over-clockers. Say you buy a heavily multi-cored CPU (guaranteed to have at least 32 of 40 fabricated cores functioning). It might come with 35 of the 40 fabricated cores working at design clock-speed. Over-clocking might knock out a few cores that were marginal but let the system's user optimize the speed of the cores vs. number of usable cores in realtime. A fully dynamic self-testing, self-healing system might automatically bring marginal cores back online once the clock-speed is dropped.

    I realize that companies currently sell the same chip with different ratings by testing for speed or usable components (e.g. usable vertex shaders in a GPU), but what I want is different. Rather than use spares to guarantee some fixed spec performance (the current industry practice of leaving only a fixed set of good components active on a chip), users could enjoy both more initial performance and longer life from products using a dynamic self-testing, self-healing system that uses all known-good components. Such systems would gracefully degrade as vertex shaders, disk blocks, RAM cells, or cores die or stop functioning at high speeds and temperatures.
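    A toy model of that flexi-capacity idea (entirely hypothetical, and glossing over the filesystem headaches pointed out in the replies below):

    # Toy model of the hypothetical flexi-capacity drive: instead of failing
    # outright when its spares run out, it just reports a smaller capacity.
    class FlexiDrive:
        def __init__(self, total_blocks: int):
            self.total_blocks = total_blocks
            self.bad_blocks = set()

        def mark_bad(self, block: int) -> None:
            self.bad_blocks.add(block)          # grows the locked "BAD_BLOCK" set

        @property
        def usable_blocks(self) -> int:
            return self.total_blocks - len(self.bad_blocks)

    drive = FlexiDrive(total_blocks=1_000_000)
    drive.mark_bad(123_456)
    drive.mark_bad(777_777)
    print(drive.usable_blocks)                  # 999998 -- smaller, but still working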
    • Silicon circuits don't need graceful degradation, as they do not degrade in any meaningful manner once they have left the factory.

      The only situation that results in greatly accelerated degradation is when you overvolt a chip beyond its specifications. Since that only happens when the chip is run out of spec, there is no need in the manufacturer's eyes to create a gracefully degrading system.

      Hard drives, as well, have little use for their internal defect management other than to look pretty to the user, as the magnetic
      • Silicon circuits don't need graceful degradation, as they do not degrade in any meaningful manner once they have left the factory.

        Silicon circuits most certainly do degrade over time, even in normal use. It just so happens that so far this has been "under control". But as technologists keep reducing the feature size, these effects will become much more important.

        Several people in my team work in exactly this area of micro-electronics research by the way: how to optimally compensate for these (and other re

    • You wouldn't want hard drives like that.

      The reason I say this is that it would involve lots of complex handling, probably both in the filesystem and the disk firmware. Either the disk knows about the filesystem, and you need ridiculously complex protocols talking to the disk about what the fs really looks like (because stuff yet to be written is in the cache), or you need to handle "this file used to be in block 3532552, but the disk is now only 3532550 blocks large, so it must have moved to...".

      A

      • You also have a big problem if someone decides to completely fill the disk. Then what does the OS/drive do when it has to "shrink" the drive some more? You might say, "Don't fill it completely up then", but then I would argue that if you have to leave a few gigabytes open at all times for this drive to work - then why bother? I think I'll stick with my fixed capacity drive and rely on SMART to tell when my spare capacity is about to run out.
  • by Anonymous Coward
    Some people are questioning whether this is worth it. Remember that nano-circuits are experimental, and techniques like this are necessary just to make working circuits.

    The 'crossbar architecture' is an experimental architecture that, using carbon nanotubes laid out in a grid with selectively chosen connections, allows you to perform useful functions, such as logic. HP recently (a few months ago) announced the crossbar latch [google.com], which they claim will eventually eliminate the need for transistors.

    Unfortunately t
  • Can we get some naked hippies on this dangerous development, stat?
  • by melted ( 227442 ) on Saturday June 11, 2005 @10:57PM (#12792552) Homepage
    What happens with components that are not "bad" but are "on the verge" and can go bad any minute? Something tells me there will be a lot more of those in a chip where "bad" components are considered acceptable. They're impossible to detect, too, because during QA they'll work perfectly fine.

    I see this tech as a temporary crutch on the way to something more advanced - self-diagnosing and self-healing chips. Now that would be frikkin' cool.
  • When you hamstring your profitability by increasing per-chip costs, altering the basis of the architecture, and adding superfluous components, "yield" no longer has the significance it was invented to convey.

    By the time the learning curve flattens out, it could be cheaper just to throw away bad parts made with the old technology than to add redundancy to parts made with the new one.
