Intel iMac Hardware

34 Design Flaws in 20 Days of Intel Core Duo (356 comments)

Pray_4_Mojo writes "Geek.com is reporting that Intel's errata (bug) documentation shows that the Intel Core Duo chip has 34 known issues found in the 20 days since the launch of the iMac Core Duo (you can read the list), and Intel plans to fix only one of them. While bugs in hardware are nothing new (the P4 has 64 known issues, and at this time Intel does not plan to fix a single one), this marks one of the first times that Intel has released a processor with known bugs, and some of the bugs are of higher severity than in the past. Also alarming is the rate at which the flaws have been found: about 1.7 per day (34 in 20 days) since the launch of the iMac Core Duo."
  • by Transeau ( 869731 ) * on Tuesday January 24, 2006 @12:31PM (#14548964)
    You do realize that there is an 85 page PDF of errors in the AMD64, right?
  • AMD errata (Score:5, Informative)

    by Anonymous Coward on Tuesday January 24, 2006 @12:33PM (#14548992)
    Revision Guide for AMD Athlon™ 64 and AMD Opteron™ Processors [amd.com]. Just for balance. (only two of them are really interesting, #113 is one of them IIRC)
  • by Ninja Programmer ( 145252 ) on Tuesday January 24, 2006 @12:34PM (#14549011) Homepage
    ... While bugs in hardware is nothing new (the P4 has 64 known issues, at this time Intel does not plan to fix a single one) this marks one of the first times that Intel released a processor with known bugs, ...


    Huh? That's clearly wrong. When Intel had its famous FDIV bug, they shipped it knowing that the problem was there (the chips were already manufactured before they noticed it in their internal design validation.) In fact I would highly doubt that any Intel chip (or AMD chip) has shipped without some known bugs in them.

    It's just a question of severity. Most of these bugs tend to be highly marginal in a "real software doesn't push that hard on the CPU" sense.
  • by toupsie ( 88295 ) on Tuesday January 24, 2006 @12:35PM (#14549020) Homepage
    Apple is not the only manufacturer using the Core Duo [notebookreview.com] chip [google.com].
  • by tlhIngan ( 30335 ) <[ten.frow] [ta] [todhsals]> on Tuesday January 24, 2006 @12:36PM (#14549035)
    It's called "errata", and it's common for most processors to be released with pages and pages and pages of errata.

    Of course, what happens is that the alpha/beta silicon ships to select customers without many errata (though internal testing often finds them too, and they ship with those). Then the manufacturer goes back, resolves a few, then the cycle repeats until everyone is happy with the bugs and it's released with a book of errata on them, and workarounds for the severe ones.

    "No fix" errata are common. The most serious of those have workarounds. Fixed errata are for things where there can be no possible software workaround. But there are a large number of varying severity, from cache incoherences and lock failures (you try to lock something, and it either can't be unlocked the usual way, or it doesn't reliably indicate the lock) to bus and spec violations.

    Nothing new here...
  • Image Mirror (Score:3, Informative)

    by XMilkProject ( 935232 ) on Tuesday January 24, 2006 @12:41PM (#14549089) Homepage
    The site is loading pretty slowly, so here's a mirror I set up of the image with the list on it: http://www.xmilk.com/coreduo.gif [xmilk.com]
  • by shawnce ( 146129 ) on Tuesday January 24, 2006 @12:44PM (#14549127) Homepage
    Not sure I understand the point of this new article... all chips have errata. This is like reporting that the sun set again or that slashdotters have no love life.

    For example...

    The MPC7410 family of chips (aka G4) from Freescale (formerly part of Motorola) has 21 errata currently listed: MPC7410CE.pdf [freescale.com]

    The MPC7447 family of chips (aka G4) from Freescale has 36 errata currently listed: MPC7457CE.pdf [freescale.com]

    The PPC 970FX (aka G5) from IBM has 24 errata currently listed: 970fx_errata_dd3.x_v1.6.pdf [ibm.com]

  • AMD Opteron errata (Score:3, Informative)

    by mrm677 ( 456727 ) on Tuesday January 24, 2006 @12:47PM (#14549151)
    The errata for the AMD Opteron is 85 pages long [amd.com]. I once spoke with a chipset designer and he told me that the Opteron errata was especially long with some convoluted workarounds, compared to other CPUs he's worked with.

  • Re:20 days? (Score:5, Informative)

    by Anonymous Coward on Tuesday January 24, 2006 @12:47PM (#14549152)
    And AMD has no bugs in their chips? Here's the Athlon 64 Revision History document off of AMD's own website:

    http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25759.pdf [amd.com]

    There's a lot more listed there than for the Core Duo so far, and quite a few are marked as "Won't be Fixed" and sound scary. Here's an example of a rather nasty-looking ordering bug that results in a system hang:

    Downstream non-posted requests to devices that are dependent on the completion of an upstream non-posted request can cause a deadlock in the presence of transactions resulting in bus locks, as shown in the following two scenarios:

    1. A downstream non-posted read to the LPC bus occurs while an LPC bus DMA is in progress. The legacy LPC DMA blocks downstream traffic until it completes its upstream reads.

    2. A downstream non-posted read is sent to a device that must first send an upstream non-posted read before it can complete the downstream read.

    In both cases, a locked transaction causes the upstream channel to be blocked, causing the deadlock condition.

    Potential Effect on System
    The system fails due to a bus deadlock.
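To make the circular wait in that erratum concrete, here is a minimal Python sketch. It models scenario 2 as a wait-for graph and finds the cycle with a depth-first search. The node names and edges are a simplified reading of the erratum made for illustration, not AMD's description of the hardware. The final edge in particular is an assumption: the locked transaction is taken to be unable to complete while the earlier downstream read is outstanding.

```python
from typing import Dict, List, Optional, Set

# Wait-for graph: an edge "a -> b" means "a cannot finish until b finishes".
# Assumed, simplified model of scenario 2; node names are hypothetical.
SCENARIO: Dict[str, List[str]] = {
    "cpu_downstream_read": ["device"],               # CPU read waits for the device's reply
    "device": ["device_upstream_read"],              # device must first read from memory
    "device_upstream_read": ["upstream_channel"],    # that read needs the upstream channel
    "upstream_channel": ["locked_transaction"],      # channel is blocked by the bus lock
    "locked_transaction": ["cpu_downstream_read"],   # assumed: lock waits on the CPU read
}


def find_cycle(waits_on: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of the wait-for graph (a deadlock), or None."""
    on_path: List[str] = []  # nodes on the current DFS path
    done: Set[str] = set()   # nodes fully explored without finding a cycle

    def visit(node: str) -> Optional[List[str]]:
        if node in on_path:
            # Back edge: everything from node's first visit onward is a cycle.
            return on_path[on_path.index(node):] + [node]
        if node in done:
            return None
        on_path.append(node)
        for nxt in waits_on.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        on_path.pop()
        done.add(node)
        return None

    for start in waits_on:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


print(" -> ".join(find_cycle(SCENARIO)))
```

Removing the bus lock from the model, for example by setting `SCENARIO["upstream_channel"] = []`, makes `find_cycle` return `None`. This matches the erratum's point that the locked transaction is what closes the loop.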
  • by Theovon ( 109752 ) on Tuesday January 24, 2006 @12:48PM (#14549160)
    As an ASIC designer, I have produced my fair share of silicon bugs. Chips are expensive to produce, making bugs expensive to fix. As a result, chip designers (even ones with deep pockets like Intel) do not look at bugs as something to FIX, but rather as something to MASK. I don't mean to hide it from people (although that does happen), but to make it not a bug by working around it.

    Unless the bug is so fatal that you can't work around it, or the bug could potentially cost lives, the primary solution is to work around it. Either you write driver code to avoid the bug, or you find some other cheap solution. Sometimes, it's a simple matter of removing a feature from your marketing literature.

    Intel's typical means to mask processor bugs is microcode. This hurts performance, but they can typically create a workaround that routes everything around the bug. I can't read the article (it's slashdotted), but I'm sure that by saying they won't fix some bugs, they're saying that they won't respin the silicon but rather mask the bug in some other way.

    Listing the bugs (and not fixing them in this version) is an appropriate thing for Intel to do.

    (I'm no Intel fanboy. I think they're bastards. But this is NOT an example of them being bastards.)
  • by flyinwhitey ( 928430 ) on Tuesday January 24, 2006 @12:49PM (#14549176)
    http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25759.pdf [amd.com] And as an aside, it took two seconds (actually 0.08 seconds) to look it up on Google. Maybe try that next time.
  • by freidog ( 706941 ) on Tuesday January 24, 2006 @12:50PM (#14549193)
    Here you go [amd.com]

    I didn't bother to actually count the number of unfixed or no fix planned glitches / bugs in there, so I don't know if it actually validates the 80+ the grandparent claimed, but there are quite a few known bugs in A64 and its HTT bus.

    In fact, any CPU released, even stuff like Power / Itanium / UltraSPARC, is going to have errata like this. Microprocessors are incredibly complex equipment, and being 100% stable and glitch-free under all possible conditions just isn't going to happen. Whoever submitted this story is blowing this entirely out of proportion. The link is already Slashdotted, so I haven't gotten a chance to read what the bugs / glitches are, but I would bet good money a normal user could go through the entire life of their Core Duo Mac and never notice one. These are typically very small glitches / bugs that occur under very specific conditions, and are meant more to make hardware manufacturers aware of them than to warn users that there could be problems with their chips.

    Publishing them is, I think, a good move on Intel's part, but they do run the risk that people won't understand that this is a completely and utterly ordinary and expected thing to happen.

  • Re:No buy (Score:3, Informative)

    by manno ( 848709 ) on Tuesday January 24, 2006 @12:53PM (#14549223)
    And you think that the A64 and P4 are squeaky clean?
  • by gordo3000 ( 785698 ) on Tuesday January 24, 2006 @12:57PM (#14549265)
    To clarify: at least your first link was to a document covering problems across the entire line of Athlon processors. A lot of those problems are specific to one of the 10 different processors that paper covers. I would bet that there aren't that many in any given Athlon.

  • Re:Oh thats it! (Score:2, Informative)

    by HeroreV ( 869368 ) on Tuesday January 24, 2006 @12:58PM (#14549266) Homepage
    7. Cute rainbow-colored apple now inhabited by cute rainbow-colored worm.

    I like #7 and #11 myself :-)

    Apple hasn't used that rainbow-colored apple logo in ages, have they?
  • Re:No buy (Score:5, Informative)

    by mr100percent ( 57156 ) * on Tuesday January 24, 2006 @01:00PM (#14549285) Homepage Journal
    All chips have errata, and they are customarily well documented and published on the vendor's web site. BTW, an erratum can be something as simple as a correction to the datasheet. Most are minor and are dealt with by the compiler. For example, if there's an error in calculations involving a certain register and decimal values, the compiler would just not use that register for those calculations.
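As a toy illustration of "the compiler would just not use that register", here is a minimal Python sketch of a register allocator that never hands out registers on an avoid list. The six-register machine and its register names are hypothetical. Real compilers express such restrictions in their target descriptions, and they also reuse and spill registers, which this sketch does not attempt.

```python
from typing import Dict, List, Set

# Hypothetical 6-register machine; the names are placeholders, not a real ISA.
REGISTERS: List[str] = ["r0", "r1", "r2", "r3", "r4", "r5"]


def allocate(values: List[str], avoid: Set[str]) -> Dict[str, str]:
    """Give each value its own register, never handing out one listed in `avoid`.

    A toy stand-in for a compiler's register allocator: real allocators also
    reuse registers and spill to memory, which this sketch does not attempt.
    """
    usable = [reg for reg in REGISTERS if reg not in avoid]
    if len(values) > len(usable):
        raise RuntimeError("not enough usable registers (a real compiler would spill)")
    return dict(zip(values, usable))


# Pretend an erratum says r1 misbehaves for some operation: just never use it.
print(allocate(["x", "y", "z"], avoid={"r1"}))
```

The workaround costs one register of headroom but no extra instructions, which is why a compiler-level fix is attractive for minor errata.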

    The documented and known errata are not what you should be concerned with. It's the unknown ones that freeze your computer or cause all robots to attack their masters.

    If someone's complaining about this, they should just turn off their computers, because as we ALL know, every operating system (the OS is what runs on chips that have the errata) is also shipped with hundreds, if not thousands, of known bugs. You're not going to find a perfect chip in the real world. How many errata did the G4/G5 have? By comparison, the IBM PowerPC 970FX has 24 errata, none of which is planned for a fix. When you consider that the 970FX is a fairly mature chip, 34 errata on a new chip is hardly newsworthy. As transistors get more and more compact and miniaturized, I'm sure we're bound to see more.
  • by homer_ca ( 144738 ) on Tuesday January 24, 2006 @01:05PM (#14549342)
    "Intel's typical means to mask processor bugs is microcode."

    That's true. Every Intel CPU since the Pentium Pro can update its microcode. Many times, BIOS will contain microcode updates from Intel. Linux also has a microcode update driver [urbanmyth.org].
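To check which microcode revision is actually loaded, here is a minimal Python sketch that parses text in the shape of Linux's /proc/cpuinfo. It assumes a kernel that reports a `microcode` field there, which recent x86 kernels do and older ones may not. The sample text is made up for illustration. On a real machine you would pass in `open("/proc/cpuinfo").read()` instead.

```python
from typing import Dict, List

# Made-up sample in the shape of Linux /proc/cpuinfo; field values are illustrative.
SAMPLE_CPUINFO = """\
processor\t: 0
vendor_id\t: GenuineIntel
cpu family\t: 6
model\t\t: 14
microcode\t: 0x39

processor\t: 1
vendor_id\t: GenuineIntel
cpu family\t: 6
model\t\t: 14
microcode\t: 0x39
"""


def parse_cpuinfo(text: str) -> List[Dict[str, str]]:
    """Split cpuinfo text into one {field: value} dict per logical CPU."""
    cpus: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():  # a blank line ends one CPU's block
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def microcode_revisions(text: str) -> List[int]:
    """Loaded microcode revision per CPU; base 0 accepts 0x-prefixed hex or decimal."""
    return [int(cpu["microcode"], 0) for cpu in parse_cpuinfo(text) if "microcode" in cpu]


print([hex(rev) for rev in microcode_revisions(SAMPLE_CPUINFO)])
```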

    "I'm sure that by saying they won't fix some bugs, they're saying that they won't respin the silicon but rather mask the bug in some other way."

    I'm not sure about that. "Will fix" seems to imply the errata could be fixed in silicon or microcode, while "Will not fix" means it won't get fixed at all.

  • by Anonymous Coward on Tuesday January 24, 2006 @01:06PM (#14549353)
    I think that G5 processors are from the PowerPC 970 line (Wikipedia tells me that the PowerPC 970, PowerPC 970FX and PowerPC 970MP are all G5), so here's the page for the 970 and 970FX [ibm.com] and Errata Notice version 1.6 for design revision levels DD3.0 and DD3.1 [ibm.com], which shows 24 errata, all of them marked as WONTFIX.
  • by masklinn ( 823351 ) <slashdot.org@mCO ... t minus language> on Tuesday January 24, 2006 @01:12PM (#14549397)

    I'm not sure about that. "Will fix" seems to imply the errata could be fixed in silicon or microcode, while "Will not fix" means it won't get fixed at all.

    A workaround isn't considered a fix: WONTFIX is WONTFIX even with published workarounds (including microcode). WONTFIX means that the erratum won't be fixed at the silicon level, which is what errata papers are about.

  • Re:Faster (Score:5, Informative)

    by VitaminB52 ( 550802 ) on Tuesday January 24, 2006 @01:38PM (#14549693) Journal
    It seems likely that given the increasing complexity, the error rate is going to rise proportionally. I mean, how many errors do you expect in a 100,000 transistor chip vs a 100,000,000 transistor chip?

    Given that a very substantial part of the extra chip real estate is being used for L1 and L2 cache, the error rate should increase less than proportionally. If you upgrade the cache size from, say, 8 kB to 1 MB, there is only a relatively small increase in the complexity of the cache controller, and not of the cache itself.
    Add in the new chip design software and the use of hardware libraries for standard chip functionality, and the error rate should increase even more slowly.

  • by wild_berry ( 448019 ) on Tuesday January 24, 2006 @01:43PM (#14549743) Journal
    Your comment is misleading. The document lists only 61 errata and contains their respective details. The initial table of errata (table 5) is only four pages long (it begins on page 13 and ends on page 16) and most likely groups the problems by wafer family; the next two pages reiterate the errata for each brand name of AMD K7/K8 chip; all but one of the remaining pages detail the errata and their suggested workarounds/fixes. The last page is a list of extra resources.

    I don't dispute your comment regarding the experience of a chipset designer.
  • by stevesliva ( 648202 ) on Tuesday January 24, 2006 @02:00PM (#14549878) Journal
    Chip bugs are often due to the intersection of the domains covered by the "chip simulations" you mention. You get static timing analysis, power analysis, logic verification, and transient simulation at various process and applied conditions. But many of the analyses are done without true interlock with the other simulators. And you get layered levels of abstraction, and all sorts of automated tools hooking all the abstracted components together...

    So if you look at the list of errata, you see things like flags not getting set properly after the execution of an instruction. What could cause this? 1.) The design was logically incorrect. 2.) The design was logically correct, but the flag is never properly latched on the correct cycle for all hardware. 3.) The flag doesn't get set for slow hardware. 4.) The flag doesn't get set for hardware that has issues with supply integrity. Etc. etc.
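Cause 1.) is the kind of bug logic verification is meant to catch. The same operation is run through the design and through an independent reference model, and the resulting flags are compared. Here is a minimal Python sketch of such a reference model for a 32-bit x86 ADD. It covers only CF, ZF, SF and OF and leaves out PF and AF. The `observed` values fed to the checker are hypothetical stand-ins for what a simulation of the design would report.

```python
from typing import Dict, Optional, Tuple

MASK32 = 0xFFFFFFFF


def _sign(x: int) -> int:
    """Bit 31 of a 32-bit value."""
    return (x >> 31) & 1


def add32_flags(a: int, b: int) -> Dict[str, int]:
    """Reference model: CF, ZF, SF, OF after a 32-bit x86 ADD of a and b."""
    a &= MASK32
    b &= MASK32
    full = a + b  # exact sum, may need 33 bits
    result = full & MASK32
    return {
        "CF": int(full > MASK32),  # unsigned carry out of bit 31
        "ZF": int(result == 0),
        "SF": _sign(result),
        # signed overflow: operands share a sign that the result does not
        "OF": int(_sign(a) == _sign(b) and _sign(result) != _sign(a)),
    }


def flag_mismatches(a: int, b: int, observed: Dict[str, int]) -> Dict[str, Tuple[int, Optional[int]]]:
    """Map each flag the design got wrong to (expected, observed)."""
    expected = add32_flags(a, b)
    return {f: (expected[f], observed.get(f)) for f in expected if observed.get(f) != expected[f]}


# Hypothetical design output that failed to set OF on a signed overflow.
print(flag_mismatches(0x7FFFFFFF, 1, {"CF": 0, "ZF": 0, "SF": 1, "OF": 0}))
```

A model like this only catches cause 1.). Causes 2.) through 4.) depend on timing and electrical conditions that a purely logical model never sees.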

    One would think that if they screwed up the implementation of a long-lived feature, it wasn't a logic error (likely to be caught by running verification) but an error caused by the analog or physical world intruding upon the digital domain. Some small amount of this may be expected-- oh crap! 1% of chips have an obscure timing issue we can't catch in test-- but if it is a true logic bug, someone screwed up.

  • by SharpFang ( 651121 ) on Tuesday January 24, 2006 @03:08PM (#14550494) Homepage Journal
    Therefore its expected that a chip fabricated on a substrate whose minimum feature sizes are half those of the other chip and whose complexity is double the other chip would have 4x the errata items of the other chip.

    Complexity of the CPU contributes somewhat to the number of bugs (more design work = more bugs), though only when new algorithms are introduced, not when adding "more of the same". A dual-core CPU is NOT supposed to have twice as many bugs as its single-core counterpart, because the two cores are identical and contain the same flaws as the single core; new ones are introduced only by the extra glue logic that makes it "dual". Twice the complexity usually means twice the number of gates, not twice the difficulty of design: stuff like cache memory swallows a major part of the available space, but 64KB of cache is associated with the same number of bugs as 4MB of it. So not x2 for complexity. At most x1.5 or so.

    And these errors are not manufacturing flaws; they are design flaws / software (VHDL) bugs. If I write a program twice as long as the original and save it to a hard drive of double the capacity, am I expected to have four times as many bugs? The new technology has its own share of problems, but those are to be caught before the chip is released from the factory, and a chip that has a technology-related fault is just faulty and should be replaced. It has nothing to do with what appears in errata.

    So - the new CPU can have more bugs than the old one. But not four times as many!
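The two estimates can be put side by side, writing N for the number of errata and taking the factors straight from the comments. The quoted argument multiplies a process factor of 2 by a complexity factor of 2. This reply argues that process problems don't show up as errata at all, so that factor is 1, and that the complexity factor is at most about 1.5.

```latex
\begin{aligned}
\text{quoted argument:}\quad N_{\text{new}} &= \underbrace{2}_{\text{process}} \times \underbrace{2}_{\text{complexity}} \times N_{\text{old}} = 4\,N_{\text{old}} \\
\text{this reply:}\quad N_{\text{new}} &\lesssim \underbrace{1}_{\text{process}} \times \underbrace{1.5}_{\text{complexity}} \times N_{\text{old}} = 1.5\,N_{\text{old}}
\end{aligned}
```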
  • Re:Up front (Score:4, Informative)

    by Radicode ( 898701 ) on Tuesday January 24, 2006 @03:34PM (#14550726)
    And I would add that most "flaws" can be avoided by the compiler. Programmers (except the ones making the compiler) don't have to worry about those. These bugs occur in really rare conditions that can be avoided. CPU design is really complex... if you thought assembler instructions were executing one after the other, you're wrong. Usually, they will execute in mixed order, many at the same time. That's what makes a fast CPU.

    For those still reading books, I suggest "Computer Architecture: A Quantitative Approach" by John L. Hennessy and David A. Patterson.

    Radicode
  • Re:Up front (Score:2, Informative)

    by Massacrifice ( 249974 ) on Tuesday January 24, 2006 @04:08PM (#14550980)
    I thought the P4 was slower than the P3 when it started because of its lower IPC.
  • Re:Up front (Score:2, Informative)

    by Analog Squirrel ( 547794 ) on Tuesday January 24, 2006 @05:04PM (#14551520) Homepage
    Or possibly the one by David A Patterson and John L Hennessy...
  • Re:Thank you (Score:2, Informative)

    by Twanfox ( 185252 ) on Tuesday January 24, 2006 @07:21PM (#14552846)
    I think you have that last sentence backwards, or at least incorrect. AMD chips run at a slower clock speed but do more per clock cycle than the Intel chips do. While Intel chips are pushing 3GHz and faster, AMD chips are not clocked nearly as high, and yet remain competitive in terms of 'work done'.
  • by ciroknight ( 601098 ) on Tuesday January 24, 2006 @08:58PM (#14553397)
    Well then your point is flawed, because as any manufacturer of CPUs will tell you, errors will crop up after chips are taped out and produced. AMD certainly is no stranger to it, and neither is Freescale or IBM. Hell, there are smaller processors used in cellphones and calculators that have errors much worse than anything Intel's ever released, and yet you never hear about those. Why? Because these kinds of errors are trivial to fix in software.

    Secondly, no, these chips are probably revision 8 or 9 internally; they'll typically do a few runs at a time to make sure that yields are where they want them to be, and that mechanically the chip checks out. However, you cannot do intrinsic debugging at this level, because of the simple supply problem: there are not enough chips made at this point to get all of your engineers looking at them. This is why most manufacturers won't catch an error until the first production run is underway, and by then it's far too late to go back to your design drafts, fix a bug, and re-tape the processor. It'd delay the product by 4-6 months: you've got to remake all of your lithography masks and make sure they're all created exactly to spec, you've got to send these masks out to all of your fabs, you've got to then go through recertification and make sure that the chips work (yes, that means you have to make more wafers of bad chips), and then you're still looking at debug time.

    And for what? Your processor has accidentally got a single instruction that's slightly flawed, which can be checked for and fixed in software (if (value == INTEL_DEBUG_VALUE && value != expected_value) { intel_fix(); } ).
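The one-liner is tongue-in-cheek, but a software workaround usually takes this shape: identify the exact CPU, and take the slower, safe path only on affected parts. Here is a minimal Python sketch of that gating. It assumes a hypothetical table of affected vendor/family/model/stepping values. The entries are placeholders, not taken from any real errata sheet, and `fast_path`/`safe_path` are illustrative names.

```python
from typing import Callable, Dict, NamedTuple, Set, Tuple


class CpuId(NamedTuple):
    vendor: str
    family: int
    model: int
    stepping: int


# Hypothetical errata table: (vendor, family, model) -> affected steppings.
# Placeholder values for illustration only, not real errata data.
AFFECTED_STEPPINGS: Dict[Tuple[str, int, int], Set[int]] = {
    ("GenuineIntel", 6, 14): {8},
}


def needs_workaround(cpu: CpuId) -> bool:
    """True if this exact CPU stepping is listed as affected."""
    key = (cpu.vendor, cpu.family, cpu.model)
    return cpu.stepping in AFFECTED_STEPPINGS.get(key, set())


def run(cpu: CpuId, fast_path: Callable[[], str], safe_path: Callable[[], str]) -> str:
    """Take the normal code path unless the CPU needs the (slower) workaround."""
    return safe_path() if needs_workaround(cpu) else fast_path()


print(run(CpuId("GenuineIntel", 6, 14, 8), lambda: "fast", lambda: "workaround"))
print(run(CpuId("GenuineIntel", 6, 14, 12), lambda: "fast", lambda: "workaround"))
```

Keying on the stepping means a later silicon revision that fixes the erratum automatically gets the fast path again, without any change to the code.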

    Lastly, if you need an example of any product shipping flawed, take a look over at the car industry. There are recalls, after recalls, after recalls on parts that are often bad, and require a new bolt to fix something. Think of this as the same thing, only you don't have to take your car into the garage; you are likely to never know, speak with, or hear of the people who are fixing the problems mentioned in this article. These are problems for OS developers, who are working in debug mode, who *might* run into this problem if and only if some crazy absurd bit-pattern is laid out just right in a register when a command is executed (for example).

    So please, before you tell a Computer Engineer how to make a microprocessor, make sure you know what you're talking about. It's better that they catch these problems in the weeks after release so that the OS developers will have time to fix them before their next major version goes out and they actually have to release a patch to deal with it. It's better that they catch them before the next production run, just in case there is an error that warrants fixing (and they've only discovered ONE such error, and they are probably going to wait until Core Duo rev B to fix it). And it's better that they catch them at all, instead of a year down the line when everyone starts to realize their floating-point math is going screwy on their multimillion-dollar simulations.
  • by laird ( 2705 ) <lairdp@@@gmail...com> on Wednesday January 25, 2006 @12:09AM (#14554410) Journal
    "this marks one of the first times that Intel released a processor with known bugs"

    Every chip Intel has ever shipped has had errata. This isn't unique to Intel, of course -- every chip ever shipped has had errata. The only news here is that apparently people have found a lot of bugs in this specific chip fairly quickly. But Mac users are a demanding bunch...

    http://www.amd.com/epd/desiging/tsdocs/2.erratashe/index.html [amd.com] lists AMD's errata sheets.
    http://www.rcollins.org/Errata/ErrataSeries.html [rcollins.org] documents some Intel errata from the late 90's.
    http://mysearch.intel.com/corporate/default.aspx?culture=en-US&q=errata&searchsubmit.x=12&searchsubmit.y=8 [intel.com] searching for Errata on Intel's site returns 6,520 hits (most for errors in documentation). This is to their credit -- everyone makes mistakes, and documenting them benefits everyone.
    http://www.freescale.com/webapp/search/MainSERP.jsp?QueryText=errata&RELEVANCE=false&showAllCategories=false&srch=1&assetLocked=false&pageSize=5&SelectedAsset=Product+Pages& [freescale.com] and Freescale has a ton of errata documentation as well.

    You get the idea.
