Erratum Plagues Quad-Core Opterons, Phenoms
theraindog writes "Errata are not uncommon with new processors, but a problem with the TLB logic in AMD's quad-core Opteron and Phenom processors appears to be quite serious. The erratum is so severe that AMD has issued a 'stop ship' order on all quad-core Opterons. AMD has also blamed this bug for the delay of the 2.4GHz Phenom, despite the fact that the erratum is unrelated to clock speed. A BIOS-based workaround for the issue has been made available to motherboard makers, but it apparently carries a 10-20% performance penalty. What's more disturbing is that AMD knew of the erratum and the potential performance hit associated with fixing it before it launched the Phenom processor. Hardware provided to the press for reviews did not include the fix, conveniently overstating Phenom performance."
What??? (Score:5, Informative)
But dictionary.com is your friend.
Design errors and mistakes in a CPU's hardwired microcode may also be referred to as an erratum. One well publicised example is Intel's "flag" erratum in early Pentium Pro processors. This made the conversion of floating point numbers to integers unreliable due to an exception not being signaled under certain conditions.
Re:What??? (Score:4, Insightful)
The thing is, the CPU is actually broken a bit and AMD has pulled the Barcelona line but is continuing to sell the Phenom(inal Failure) line to customers and, evidently, doesn't plan to 'fix' the problem later (Intel offered replacements for the Pentium floating point bug after they got dinged on it, for example... I know... I had one and replaced it).
So... if you actually get your hands on (or got your hands on) a Phenom, realize you have a broken CPU and the more you load it, the more likely you'll have stability issues.... and AMD isn't (currently) going to fix it.
Re: (Score:2)
Well... I can't remember any for my beloved 6502. Since the specs were five pages long, I can't imagine it having a long list of design flaws. It didn't even have a long list of features
Re:What??? (Score:5, Informative)
They may not have been published, but there are at least three:
1) A memory-indirect jump where the address is stored across a 256-byte boundary will read the second byte of the address from the wrong location.
2) The arithmetic status flags are not valid when performing arithmetic in BCD mode.
3) If a hardware interrupt occurs while the processor is fetching a BRK instruction, the BRK instruction is ignored.
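For anyone who wants to see what item 1 looks like in practice, here is a minimal sketch in Python. It is only a toy model of the indirect-jump address fetch, not an emulator; the function name `read_indirect_target`, the `emulate_bug` switch, and the memory contents are all made up for illustration.

```python
def read_indirect_target(memory, pointer, emulate_bug=True):
    """Return the 16-bit jump target stored at `pointer` (little-endian).

    With emulate_bug=True, a pointer whose low byte is 0xFF wraps within
    the same 256-byte page when fetching the high byte, which is the
    behaviour described in item 1 above.
    """
    lo = memory[pointer]
    if emulate_bug and (pointer & 0x00FF) == 0xFF:
        # High byte is read from the start of the SAME page.
        hi_addr = pointer & 0xFF00
    else:
        # Intended behaviour: high byte is at the next address.
        hi_addr = (pointer + 1) & 0xFFFF
    hi = memory[hi_addr]
    return (hi << 8) | lo


# Hypothetical memory layout: the address 0x1234 stored across a page boundary.
memory = bytearray(0x10000)
memory[0x02FF] = 0x34  # low byte of the intended target
memory[0x0300] = 0x12  # high byte of the intended target (next page)
memory[0x0200] = 0x56  # unrelated byte at the start of the same page

buggy = read_indirect_target(memory, 0x02FF)
fixed = read_indirect_target(memory, 0x02FF, emulate_bug=False)
print(hex(buggy), hex(fixed))
```

The usual workaround was simply never to place an indirect jump vector on the last byte of a page.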
Re: (Score:2)
Re: (Score:2)
I guess I never ran across any of those problems (that, or my distant memories have already faded to pink). Were all 6502s like that, or were there differences between MOS and Rockwell and the later 65c02 processors?
And, AC, yes. The 6502 is very RISC-ish. But, at that time, being RISC was not considered cool enough to be marketed as such, not to say I don't know if the acronym had already been invented.
I loved that page zero thing. Very, very fast. On
Re: (Score:2)
On the other hand, us Z80'ers had single instructions for block data copy (LDIR) and similar instructions for IO IN/OUT (OTIR/OTDR/INDR/INIR).
Re: (Score:3, Informative)
Re: (Score:2)
It's easier not to screw things up when you're building a chip on a silicon process the size of Duplo blocks.
It also should be noted (Score:2)
Bigger ones generally are fixed in microcode, and sometimes even lesser ones. Howeve
Re: (Score:2)
Re:What??? (Score:5, Funny)
Mod me down, call me troll, but please don't claim to be a geek if you can claim to never have heard of erratum or errata. That's as bad as not knowing what a bug is or calling a PC case and its contents a hard drive.
Here's a heartfelt suggestion...read more.
Re: (Score:2)
I'm afraid you're not. You've obviously never read the datasheet of a microprocessor or microcontroller. Turn in your geek license and try again.
Re:What??? (Score:5, Informative)
The conventional terms used for erratum, however, are usually "error" or "bug".
AMD making big mistakes under pressure! (Score:2, Interesting)
Anandtech [anandtech.com] I'm looking at you.
NDA for patch? (Score:5, Interesting)
Good thing it's just a patch, as opposed to a derived work of someone else's GPLed code. I wonder what the FSF guys would say about that. I also wonder: Red Hat, why?
Re: (Score:3, Insightful)
NDA not enforcible (Score:2)
Depends ... (Score:2, Insightful)
Re:NDA not enforcible (Score:5, Informative)
The GPL only applies to redistribution. Private-use changes don't have to be GPL'd.
IANAL,TIJHIUI (I Am Not A Lawyer, This Is Just How I Understand It).
Re: (Score:2)
Re: (Score:2)
But their change isn't under the GPL. If they were distributing a patched kernel, they'd have problems. But distributing the patch itself is okay.
Re: (Score:2)
So I can develop a proprietary mod of the kernel, diff my binary against the standard kernel binary, then distribute my proprietary mods as a binary patch to the kernel? Sounds like quite a loophole.
Re: (Score:2)
Re: (Score:2)
Now, a patched kernel WOULD be a derived work, and that would have to be GPL.
Anybody can apply a patch to anything without any licensing issues, but you might not be able to redistribute the resulting product.
If RH were distributing a patched kernel not under the GPL, that would definitely be a problem.
What? (Score:2)
Re: (Score:2)
There has been a possibly analogous situation with DVD bowdlerization. In the ClearPlay system, you have a special DVD player plus filter files for each movie. When you play a DVD, it looks at the filter file and skips stuff as required. Is the filter file a derivative work? The studios tried to challenge ClearPlay legally, but the suit was interrupted when the Federal government passed a law explicitly allowing thi
Re: (Score:2)
Re: (Score:2)
Re:NDA for patch? (Score:5, Insightful)
There are other possibilities that are more likely. For example, perhaps the patched kernel is doing something like loading microcode into the processor. The kernel code would be GPLed but the microcode would not be.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Unless your microcode contains an HCF op
Re: (Score:2)
This doesn't have to be so bad (Score:3, Funny)
"AMD Outside".
Re: (Score:2, Funny)
Re: (Score:2, Insightful)
AMD can turn this into a PR boon to one-up Intel at the "Green" initiatives. All they have to do is repurpose the uncut wafers of these chips as solar panels and then retile the outside of all their buildings with the panels. This will save money on their energy bills and they can even start a new Ad Campaign:
It will not stop me from buying AMD. The only processor I have ever (of 20+) had that cooked was a P4 2.4GHz HT on an Intel PERL mobo no less! But I have abused two older AMD chips I still have runni
Re: (Score:2)
Bummer (Score:4, Insightful)
Re:Bummer (Score:5, Funny)
Re: (Score:2)
Naw, keep using it.. it's Ok for web browsing.
The Windows EULA will keep you safe if you follow all the requirements.
Read it to understand the statement.
Re: (Score:3, Funny)
Well, AMD doesn't sell used processors, as far as I'm aware, so where else would AMD have problems than in brand new processors? I mean, seriously, if a bug was found today in 1 GHz Durons that required a slowdown to work around, the headline wouldn't be "
Re: (Score:3, Interesting)
Wow, bad times for AMD. They're losing the war against Intel, and now have another setback. A 20% performance penalty is simply unacceptable for any processor. The fact that it is for brand new ones makes it an even bigger slap in the face for consumers.
Not if the processor/mobo combo is 60% of the cost of an Intel heater.
What are we trying to do here, compute pi to 14 million decimal places in 5 minutes or less?
Sooner or later AMD will come back. My experience with Intel is that as soon as they get the lea
Re: (Score:2, Funny)
That's not that many digits... It wouldn't take even 5 minutes on a Core 2 Duo.
Oh please knock it off (Score:3, Insightful)
It is just silly to dredge up old crap and keep using it. It actually weakens any point you try
Re: (Score:2)
Re: (Score:2)
Gee, I just paid 20% more than I should have for my servers? You do know that it isn't chump change for someone running a small business. Seriously, do you understand how silly your statement is?
This is a serious problem for AMD. It's amazing how many hoops some people will jump through for fanboisim.
Re:Bummer - BUT I'M BUYING (Score:2)
Why?
Because I still have the choice of buying a broke-ass AMD processor instead of an Intel. If AMD folds, Intel might just give every employee a new Porsche just for kicks. Because with what they will be able to charge (in a monopoly business), the Porsches would be a rounding error.
Remember that Free Enterprise is the ultimate democracy. Vote with your dollars!
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
20% is generally the difference between the $1000 CPU and the $200 CPU.
Re:Bummer (Score:5, Funny)
Just thinking out loud here: Did you try pushing in the Turbo button?
Re: (Score:2)
STACK OVERFLOW
SYSTEM HALTED.
Stupid Turbo button ruined my * Quest games countless times and unfortunately for me, most were practically unplayable at the default 4.77MHz. My next PC was a 486DX33 on which many of those games were now too fast to be playable.
Re: (Score:2)
Re: (Score:2)
Cue the intel jokes (Score:5, Funny)
Re: (Score:2)
Re:Cue the intel jokes (Score:5, Funny)
--------------
Intel's new motto: "United We Stand, Divided We Fall"
Q: How many Pentium designers does it take to screw in a light bulb?
A: 1.99904274017, but that's close enough for non-technical people.
Q: What do you get when you cross a Pentium PC with a research grant?
A: A mad scientist.
Q: What's another name for the "Intel Inside" sticker they put on Pentiums?
A: The warning label.
Q: What do you call a series of FDIV instructions on a Pentium?
A1: Successive approximations.
A2: A random number generator.
Q: Complete the following word analogy: Add is to Subtract as Multiply is to:
1) Divide
2) Round
3) Random
4) All of the above
Q: What algorithm did Intel use in the Pentium's floating point divider?
A: "Life is like a box of chocolates." (Source: F. Gump of Intel)
Q: Why didn't Intel call the Pentium the 586?
A: Because they added 486 and 100 on the first Pentium and got
585.999983605.
Q: According to Intel, the Pentium conforms to the IEEE standards 754
and 854 for floating point arithmetic. If you fly in aircraft
designed using a Pentium, what is the correct pronunciation of "IEEE"?
A: Aaaaaaaiiiiiiiiieeeeeeeeeeeee!
Q: Did you hear about the new "morning after" pill being developed as a
replacement for RU-486???
A: It's called RU-Pentium. It causes the embryo to not divide correctly.
TOP TEN NEW INTEL SLOGANS FOR THE PENTIUM
9.9999973251 - It's a FLAW, Dammit, not a Bug
8.9999163362 - It's Close Enough, We Say So
7.9999414610 - Nearly 300 Correct Opcodes
6.9999831538 - You Don't Need to Know What's Inside
5.9999835137 - Redefining the PC -- and Mathematics As Well
4.9999999021 - We Fixed It, Really
3.9998245917 - Division Considered Harmful
2.9991523619 - Why Do You Think They Call It *Floating* Point?
1.9999103517 - We're Looking for a Few Good Flaws
0.9999999998 - The Errata Inside
Worth a laugh anyway
"because", not "despite" (Score:5, Insightful)
Why does the summary claim this? I read through both articles, and AMD says this is a hardware issue across both chip models. Since this is a hardware issue, wouldn't it stand to reason that AMD would hold up a related chip because it's a hardware bug across both chip models and not because it's a clock speed issue? I'm not sure where the "despite" comes into play. I didn't see where the article said that AMD is not delaying a different speed Phenom.
Re: (Score:2)
Re:"because", not "despite" (Score:4, Interesting)
Re:"because", not "despite" (Score:4, Informative)
If it's a race condition in hardware, there's a good chance it's clock-sensitive. The bug probably exists in the whole line, sure. It'll manifest more as the clock ticks are closer together, because the margin for error without triggering the reversal of steps is smaller. If it's a matter of the wrong signal sometimes being asserted because the edge of a clock line transition was missed, it's logically going to happen more when the clock cycles are shorter.
A bug being in the whole line regardless of clock frequency and that bug becoming more of an issue at higher clock frequencies are not at all mutually exclusive conditions. The higher frequencies and higher rates of the error may not coincide, but there's nothing in the article to logically say they don't.
The erratum probably does apply to the whole line equally but probably manifests as a percentage of the time in use as some function of the frequency.
For any geek wanting a basic understanding of issues like latching times, gate propagation delays, and other analog electrical signaling issues inside a digital CPU, I recommend the first few chapters of Structured Computer Organization [isbn.nu]. The book builds upon basic designs of computers from using TTLs to designing a CPU, then up by layers through microcode, designing an assembly language, and more. I have an older edition at home which covers up through the 68030 and the 80386 as examples. The newer one covers up through the Pentium II, the UltraSparc, and the Java chips. The book won't make you an electrical engineer by any means, but the discussions of the tricky timing issues within even simple CPUs might be useful here.
As for the clock speed not affecting the percentage loss in efficiency due to the microcode fix... well, yeah. The microcode is the same across the line regardless of the clock speed. If you insert two identical strings of instructions A1 and A2 into an identical pair of microcode stores B1 and B2, the resulting patched microcodes C1 and C2 will likewise be identical. The faster processor will decode and execute the microcode at the same clock speed as before, and so will the slower one. They'll each have the same percentage slowdown relative to their own clock speeds, because they're running the same microcode. We're not talking about two different generations of processors or even two different revisions. It's the same processor design at two clock speeds. One is going to get the same nerfs and buffs for any microcode change proportional to their clock speeds as the other.
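To put rough numbers on that last point, here is a quick back-of-the-envelope sketch in Python. It is a deliberately crude model that treats the workaround as a flat fractional penalty on throughput; the 10% and 20% figures come from the summary, while the 2.2 and 2.3GHz clock speeds and the function names are just assumptions picked for illustration.

```python
def effective_clock_ghz(clock_ghz, penalty):
    """Clock speed scaled by a flat fractional penalty (crude model)."""
    return clock_ghz * (1.0 - penalty)


def relative_loss(clock_ghz, penalty):
    """Fraction of the original speed lost; independent of clock_ghz."""
    return (clock_ghz - effective_clock_ghz(clock_ghz, penalty)) / clock_ghz


# Hypothetical clock speeds (GHz); only 2.4 is named in the summary.
for clock in (2.2, 2.3, 2.4):
    for penalty in (0.10, 0.20):
        eff = effective_clock_ghz(clock, penalty)
        loss = relative_loss(clock, penalty)
        print(f"{clock:.1f} GHz at {penalty:.0%} penalty: "
              f"~{eff:.2f} GHz equivalent, {loss:.0%} lost")
```

The absolute loss grows with the clock speed, but the percentage stays put, which is all the paragraph above is claiming.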
Re: (Score:2)
In any case, I think the word despite is used in the quoted sentence because AMD is releasing flawed Phenoms with clock speeds lower than 2.4GHz. This would mean AMD is lying about the reason why they delayed the 2.4GHz Phenoms, since AMD's actions would establish that this erratum alone is not sufficient reason for them not to ship a particular CPU. So AMD is making a claim despite
Old issue, really (Score:4, Interesting)
Re:Old issue, really (Score:5, Informative)
AMD is in a world of hurt right now. The "true" quad-core line appears to be nothing more than marketing hyperbole since year-old q6600's are faster clock-for-clock than Phenom is. AMD will hopefully get these bugs ironed out... by next February. Even then though, AMD will have chips that are MASSIVELY expensive to make, but that they can't sell for the higher prices Intel is able to command. AMD would be fine if they had an expensive chip they could sell at a premium, or a very cheap to produce chip they could sell for the budget crowd, but right now they have Acura production costs coupled with Kia per-unit revenues: bad times.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Manufacturing is an important part of any high-tech industry. Improved manufacturing leads to lower costs and improved products. In the end, it doesn't really matter to Intel's big accounts and PC enthusiasts how they made their products better, only that they are better on the more relevant measures (price, performance, and efficiency being the big ones).
Re: (Score:3, Informative)
The design team in Israel added the MMX instructions into the last P5 and then worked on the ill-fated Timna design (integrated memory controller with RDRAM interface) while the P6 was ramping. After that they began the low-power d
Re: (Score:3, Interesting)
No, it's not marketing.
You're not seeing the usefulness on the desktop.
HPC is another story - and it's also the place that the plain old Opteron has been holding its own, against the faster, clock per clock, Core 2 microarchitecture.
Having requests go through the FSB (which is a WTF this day and age) kills cache snooping, etc, between cores.
The "true" quad core doesn't have this problem.
Re: (Score:3, Insightful)
AMD actually still rules the absolute low end of the market (and has for years). Semprons ($30+) and old X2s ($60+, new retail box) are dirt cheap, and it's simply not possible to get better performance per dollar [tomshardware.com].
There isn't much a $60 X2 can't do in your average deskt
Re: (Score:2)
You forgot Pinto.
Let's not forget.. (Score:5, Interesting)
that Intel's Core 2 also had a problem with the TLB when first released, although that problem manifested itself as data corruption instead of a lockup. Here are the two [theinquirer.net] articles [theinquirer.net] from The Inquirer about it - the second one especially. And note that this document was released after Intel had shipped the buggy Core 2's.
However, Intel was able to fix it without incurring a large performance loss. It's a shame for AMD that they weren't able to do the same.
Why all the secrecy? (Score:3)
Good thing they bought ATI (Score:5, Funny)
Re: (Score:2, Funny)
I swear, an implementation based on a five year old with a red, green, and blue crayon would probably satisfy a good portion of the GL spec...
</hyperbole>
Why AMD Released Faulty CPUs: Possible Theory (Score:3, Insightful)
The idea was to gain some cash to sustain operations until a faultless (i.e. no major faults) CPU can be released. Those that bought faulty CPUs will get their CPUs replaced as soon as faultless CPUs are completed. In some sense you can look at AMD's action as taking out a long term loan.
A counter argument to my theory can be that AMD would not risk its reputation to take out a "cash loan" in such a manner. However, the risk of losing reputation is justified if we consider another major factor at play: the holidays. It is less likely that AMD would gain the same (or even close to the same) cash flows if they would have released the CPUs after the holidays.
AMD now has some cash and is able to breathe a little bit. When it releases fixed CPUs it will be able to continue where it left off.
Re: (Score:2)
A [expletive] Day in the [expletive] Life (Score:2)
[engineer]: Hmm
[engineer] runs [random load test] 10 more times gets 10 more [errors]
[engineer] calls [manager] "Um, I have something to show you..."
[manager]: [expletive]
[manager] calls [vp]: "We may have discovered an issue..."
[vp]: [expletive expletive expletive]
[vp] calls Hector Ruiz: "Hector, remember when you said the next person that called with bad news would be wearing your guitar around his neck?"
Hector: [?]
[vp]: "Well," [explains]
Hector: [expletive
Perfect Linux CPUs (Score:4, Interesting)
The performance hit is probably 10% when patching the microcode which should mean steep price mark-downs on this generation of CPUs. But it's only a 1% performance hit when patching the (Linux) kernel.
So why doesn't every OEM that sells Linux servers and desktops just buy up all of AMD's supplies of defective chips at a big discount, and pass the savings along? I'd buy a couple.
Re: (Score:2)
Can't wait to see the big price drop on the old stock - and get myself a carton load of quaddies.
Imagine a beowulf cluster of
No. (Score:5, Funny)
Re: (Score:2)
No, but it looks bad (Score:5, Insightful)
They did (Score:5, Informative)
Re: (Score:3, Informative)
Now we learn that the slower parts were affected as well.
Re: (Score:3, Funny)
Re:No, but it looks bad (Score:5, Insightful)
It might not be AMD's doom, but they're really not that many big screwups away.
Re: (Score:2)
No, but AMD seems to be in a pretty delicate state. Their stock is pretty low and they've taken a beating from a newly-competitive Intel. They don't have a big advantage in processor speed anymore, nor power, nor even price. ...
It might not be AMD's doom, but they're really not that many big screwups away.
As far as most US consumers know (the ones that have any idea at all), there are just two companies that make CPUs: AMD and Intel. Mac users might know that Apple used to use IBM CPUs.
AFAIK, the worst thing that could happen to AMD is they get bought up, because I can't imagine any scenario where Intel literally becomes a monopoly in the consumer CPU market.
Re: (Score:2)
There are lots of things that could happen to AMD. If they do too poorly they won't be able to afford the R&D to keep up with Intel and they'll get relegated to the discount market from whence they came, or they could get bought out.
Re: (Score:2, Interesting)
I know this is Slashdot, but most people don't stuff stockings with CPUs.
I'm sure that any kind of computer being bought for Christmas has a CPU which was manufactured a minimum of a month or two ago. I'd actually guess that their processor sales slump somewhat in December and January because of surplus production of assembled computers in the previous months (and because at least some of the workers who buy CPUs and upgrade business computers
Re: (Score:2)
Oh well. At least I can buy a computer now and it won't be outdated for a bit longer.
Au contraire! (Score:2)
Re: (Score:2)
Wait, does this include the Electoral College? So Gore really was supposed to win?
:-D
Re: (Score:2)
This isn't (known to be) a security issue. Basically when the bug gets triggered, the processor just crashes. I guess you could carefully craft input to trigger it as a denial of service attack...
Re: (Score:2)
Yup. That's what we in the business call a "security issue."
Re: (Score:3, Informative)
Re: (Score:2)
I used to sing the same song, but it really doesn't apply any more. The parts of the chip that are genuinely X86 specific are quite small on a modern CPU. Consequently, a change in instruction set would actually have a very slight impact on a modern CPU, except for probably making it slower. If AMD had been using this design to build a souped up MIPS processor, there is absolutely no reason to suspect that they woul
Re: (Score:2)
Who wants to buy an unsafe / defective product? Even if the bug is unlikely to occur, I would not want to purchase a product with a known performance-impacting (fixed) or cr