Intel Reveals Itanium 2 Glitch
NeoChichiri writes "News.com is running an article about glitches in Intel's Itanium 2 chips. Even though the problem doesn't affect all chips, they have still stopped shipments of the new 450 servers until it is resolved. Apparently triggering it takes 'a specific set of operations in a specific sequence with specific data.' Intel says it affects the 900MHz and 1GHz Itanium 2 chips and will not affect the upcoming 1.5GHz Itanium 2 6M chips." Until the next iteration of the chip arrives, though, Oliver Wendell Jones writes, "they recommend working around the problem by underclocking the processor to run at 800 MHz instead of its default 900 MHz or 1 GHz."
Aptly named... (Score:4, Funny)
Re:Aptly named... (Score:2, Funny)
Glitch? (Score:4, Interesting)
Mmmm (Score:4, Funny)
Underclocking..? (Score:3, Funny)
Microcode? (Score:5, Interesting)
On a side note -- who exactly didn't expect something like this? Intel has a history of this sort of thing -- from the early 80486DX not being able to do its math properly, and IBM having to halt shipments of PS/2 machines, to the Pentium F00F bug and others. Buying first-run Intel chips is like playing dice with your business. Give them a few production runs to work out the bugs...
How about others (AMD, Mot, IBM) (Score:5, Interesting)
Of course, when it happens to Intel, EVERYBODY knows about it. My question is, how prevalent is this sort of thing throughout the CPU industry? Anyone know of other "mistakes" by the other major players? It's hard to imagine that only Intel makes these kinds of goofs, esp. with the complexity of today's chips. As an example, wouldn't Mot's failure to scale up the G4 PPC chips be considered an "error"? They just caught it early enough not to ship any chips and say "oh, we're sorry, our G4s won't go as fast as we originally stated, wait another year and a half or so and we'll get it all sorted out". Didn't they also do a similar thing with the 68040?
Re:How about others (AMD, Mot, IBM) (Score:2, Interesting)
I think Moto's problem is that they're too busy making cell phones to worry about PPC.
Re:How about others (AMD, Mot, IBM) (Score:5, Informative)
Re:How about others (AMD, Mot, IBM) (Score:4, Informative)
Running a normal Linux or NetBSD on one of these machines is asking for pain, however.
Re:How about others (AMD, Mot, IBM) (Score:2)
Re:How about others (AMD, Mot, IBM) (Score:2)
What do you know about how much Apple pays Mot? Mot's processor business is just that - their own business. They are responsible for what they decide to charge Apple.
Re:How about others (AMD, Mot, IBM) (Score:2)
Re:How about others (AMD, Mot, IBM) (Score:2, Informative)
Fortunately, that was just a supplier issue, where IBM was giving Sun bad cache RAM. It certainly caused a lot of unhappy customers, but it was a straightforward resolution compared to fixing or patching the CPU itself.
I've read that the UltraSPARC CPUs themselves tend to have very low errata rates -- like a half dozen or so for the UltraSPARC II, compared to dozens for Intel's Pentium chips. This is probably the result of Sun's long development and testing cycles.
Re:How about others (AMD, Mot, IBM) (Score:2)
Intel (32-bit) systems==Cheap
Sun Systems==Reliable
Intel Itanic Systems==?
Re:How about others (AMD, Mot, IBM) (Score:2)
Re:How about others (AMD, Mot, IBM) (Score:2, Informative)
Re:How about others (AMD, Mot, IBM) (Score:2)
Re:How about others (AMD, Mot, IBM) (Score:2)
Very prevalent. A recent /. story spoke of MMU bugs in the 68K series. UltraSPARC CPUs have had cache corruption bugs. I know somebody who was frustrated (for several weeks!) by a register corruption bug in a microcontroller. These bugs are sometimes "fixed" by changing code generators (e.g., compilers) to avoid problematic sequences.
I'm not very familiar w
Re:How about others (AMD, Mot, IBM) (Score:2)
Most chipmakers don't always me
Re:Microcode? (Score:2, Informative)
Best option is not to restrict yourself to certain "runs" but to just see the performance of a run yourself. The aforementioned PPC601 w
Not ppc603s (Score:5, Informative)
That's a very, very big reinterpretation of the facts. ppc603 machines were designed for low cost and low heat. One of the ways to do this was to remove instructions that were not needed -- legacy instructions from before the PPC601 that were never designed to be in the 601. They were not 'critical' and their removal did not cripple anything. ppc603 CPUs ended up working just for the purpose they were designed for: cheaper, less energy-hungry CPUs.
the G3 floating point debacle where excel spreadsheets would show up errors consistently
You made a typo there. "Pentium" is not spelled "G3"
Re:Not ppc603s (Score:2)
Show me where I was "all over them (intel) for shoddy quality".
You seem quite heated up about some little presumptions you made up in your own mind about what I said. Do you often get in this state of arguing with your own misconceptions?
Re:Microcode? (Score:2)
Re:Microcode? (Score:4, Insightful)
I suppose the same argument can be applied to everything in life. Cars, Televisions, DVD players.. you name it. You just need to get a feel for how things age before you invest in them for long term.
Problem is the Hardware (re:Microcode?) (Score:2, Informative)
In terms of reliability, the Itanium II is no worse than the UltraSPARC series of chips. Both Itanium and UltraSPARC face the daunting task of debugging 100+ million transistors. Ensuring that the fabricated chip is bug free is virtually impossible. So, both companies have substant
Re:Problem is the Hardware (re:Microcode?) (Score:5, Insightful)
Please read "Sun suffers UltraSparc II cache crash headache [theregister.co.uk]"
This was a problem with the cache RAM and not the CPU itself. It was traced to a supplier (IBM), who was selling a defective product.
In terms of reliability, the Itanium II is no worse than the UltraSPARC series of chips.
There is no data to back this up. I know you don't have it, and I certainly don't have it. The only people who really have it (Intel and Sun) probably won't give it to us, so this ends here.
However, since so many people pay attention to the flaws in Intel chips, they are likely to have less bugs than other chips.
This is not true. Intel is pressured by time-to-market more than other suppliers, especially with respect to the Pentium line. Sun has obviously decided to delay product launches to work out issues (e.g., the UltraSPARC IIIi), because their customers expect reliability over other concerns. Hardware doesn't really follow the "all bugs are shallow" mantra of the Open Source movement; we mainly have to have faith in the manufacturer's simulation and test labs.
In any event, the performance of the Itanium II is at least 1 order of magnitude greater than the UltraSPARC III and (soon) IV.
Do you even know what "order of magnitude" means? You are claiming that if the UltraSPARC III scores 975 on something, the Itanium II would score 9750. For a given clock, it is true that the Itanium II is faster than the US III, but by a fraction -- not a factor of ten!
Also, the US IV, by definition, will be almost twice as fast as the US III for throughput, because it is two US III chips in one.
You really don't know what the facts are.
Re:Problem is the Hardware (re:Microcode?) (Score:2, Informative)
Also, the US IV, by definition, will be almost twice as fast as the US III for throughput, because it is two US III chips in one.
First, the UltraSPARC IV will be an out-of-order CPU. Any comparison with the in-order UltraSPARC III ends here.
Second, "two chips in one" is misleading. It will be a CMP chip: multiple cores on one die, sharing external interfaces and higher levels of cache.
Thirdly, the performance gain from doubling the number of cores per die (or the number of CPUs in a system) doesn't mean it can provide twice the throughput.
Re:Problem is the Hardware (re:Microcode?) (Score:2)
Okay, dual cores is more accurate than dual chip.
Thirdly, the performance gain of doubling the number of cores per die (or the number of CPUs in a system) doesn't mean it can provide twice the throughput.
For a large number of applications, it can, and the Solaris kernel's fine-grained threading improves the odds greatly. For applications that saturate the processor's external bus, then it is certainly possibl
Re:Performance: Itanium 2 vs. UltraSPARC III (Score:3, Informative)
And it's a far cry from the "order of magnitude" better performance the grandparent post claims.
What's really funny about this post is that normally I am the one bashing Sun's CPUs... *boggle*
Obligatory AMD note: the new SPEC update today shows that a 1.8GHz Opteron SPECint base is 1081.
On a price/performance basi
Re:Microcode? (Score:2)
I believe all of this is related to our greedy decisions to release products prematurely. After all, we're only making these products so we can make money. I doubt a single exec at Intel cares
Re:Microcode? (Score:2)
Re:Microcode? (Score:2)
Re:Ummm, microcode is burned in, dude. (Score:2)
Doesn't affect all chips... (Score:5, Funny)
Re:Doesn't affect all chips... (Score:3, Funny)
In all seriousness... (Score:2, Funny)
Re:In all seriousness... (Score:2)
I've heard almost nothing about AMD's 64bit chip, and I'd still rather buy *it* than Intel's offerings.
Actually, I'd still much rather have a new Alpha system. Excuse me while I drool. I guess Intel can't even compare to the chip they based their own new chip on.
Wrong figure (Score:3, Funny)
No, all 6.666666666666666666666 people.
Re:Try a few hundred.... Check these out... (Score:2, Funny)
Re:Try a few hundred.... Check these out... (Score:2, Funny)
Itaniums are so powerful, all those companies run off just six chips.
Oh... apparently this bug means the bottom 4 companies have to wait until Itanium 3 for 64-bit computing, sorry guys.
Deja Vu (Score:3, Interesting)
This sounds similar to the way they described the floating point divide error in the original Pentium. How long until they start giving odds on the chances of someone seeing the problem in normal use?
Jason
ProfQuotes [profquotes.com]
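For reference, the classic software check for that original FDIV bug can be sketched in a few lines. The magic operands (4195835 and 3145727) are the well-known pair from the 1994 bug reports; this is a Python sketch of the idea, not the x87 assembly the real detection tools used:

```python
# Classic Pentium FDIV self-test: on a flawed chip, 4195835/3145727
# came out wrong around the 5th significant digit, so the residue
# x - (x/y)*y was ~256 instead of ~0.
def fdiv_check(x=4195835.0, y=3145727.0):
    residue = x - (x / y) * y
    return abs(residue) < 1.0  # True on a correct FPU

print(fdiv_check())  # prints True on a correct FPU
```

The same "compute a division whose exact answer is known and compare" trick is presumably what Intel's "simple software test" for the Itanium 2 does, generalized to whatever instruction/data sequence triggers the new bug.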
Re:Deja Vu (Score:5, Funny)
Re:Deja Vu (Score:2)
I seem to recall that they almost lost the company the last time they made a mistake like this with a popular CPU.
Oh, sorry, did you say the Itanium? Well, I suppose they'll probably just send replacements out to both customers then.
Um, what? (Score:2)
good time for an ad (Score:2)
another chip problem. (Score:3, Funny)
Underclock? (Score:4, Interesting)
Why not just buy the lower-clocked CPU's then? Will Intel replace the crap chips when a revision with a fix comes around?
"If the customer feels it's the right solution, we'll exchange processors with ones that aren't affected," she said. Intel has developed a simple software test that can determine whether a chip is affected. Meaning what? Lower-end chips that aren't affected, or a fixed version of the same chip? If it's the same chip, who wouldn't think it is the right solution? The article doesn't indicate whether the problem is actually solved either, only that it seems to be somewhat of an anomaly that doesn't affect all chips.
Not a good day for Intel, and probably another reason why you don't immediately need that "newest on the shelf" CPU, whether for your home machine or a server. Besides, by the time this chip is assuredly fixed, a faster revision will probably be out at a comparable price.
Re:Underclock? (Score:5, Funny)
Your geek membership has been revoked. Hand in your pocket protector at the door. OutOutOut!
Alternative to underclocking (Score:5, Informative)
Obviously, Intel are not going to encourage people to increase the voltage of their processors in order to run them at the default speeds, as this risks thermal damage to the chip if cooling is insufficient or the voltage is pushed too high. It may, however, still represent an option for system administrators who are keen to retain the performance of the chip.
I'm actually pretty impressed (Score:4, Interesting)
Re:I'm actually pretty impressed (Score:4, Insightful)
Re:I'm actually pretty impressed (Score:5, Interesting)
This kind of bug is a little different, though; we're not talking about a stuck gate that only gets tickled during a single ALU operation, or retiring an instruction too early, or bigfooting a register too early, or anything like that. We're talking about clocking issues and fundamental timing issues in Intel's "server grade" platform. There are accepted standards and practices for how aggressive to be, and some vendors can tell you in amazing detail how reliable their chips are, under what conditions, etc. With clocks in particular some vendors can be picky; I've seen hard hitters scope up boxes and refuse to support hardware they sold because it was clocked out of spec (think about the edge of a clock and clock quality: a 1.2 GHz clock isn't enough -- the signal actually has to reach the clock level before it switches back, and the transition takes time). It sounds like Intel is either ignoring those practices, or trying to write their own book, or the IA64 is a bigger disaster than anyone there wants to even hint at. There is a fairly limited class of errors where underclocking the chip fixes the problem, and most of those are related to the chip being aggressively clocked to begin with. It's ironic: on IBM's POWER4 line of processors they added extra cache room for parity (at the expense of potential performance) and made the leads beefier (again at the expense of higher clock speeds), because the platform is a server platform that places reliability at a premium. It sounds like Intel has been making PC chips too long and isn't ready for server-grade chips.
Their party line has been that they will keep working at it until it's ready, that they aren't expecting it to move a lot of chips, etc. Right now they have walked down a road where they have invested billions? (at least hundreds of millions) in an unproven technology. They have crossed the line to the point that there won't be $1500 IA64 products for years and years. They have pitched it as a server-grade platform. And it underachieves in every area and hasn't taken the world by storm nearly as much as they said it would. So bad is it that HP, their blood brother in that mess, has continued the PA-RISC and Alpha lines past the point they claimed when they originally adopted the IA64. The only reason I can imagine for them to clock it that aggressively would be that it's the only way to make it perform remotely like they claimed it would. I'm not going to guess about Intel's dirty laundry, but I'd guess the stakes are a little higher for the IA64 than they look on the surface -- either that or there are some incompetents running the show.
Re:I'm actually pretty impressed (Score:2)
I really meant that all the people DO respect Intel.. ;-)
I wonder if that's what caused this crash (Score:2)
Hmmm...wonder if BMW is using these chips [news.com.au]?
Re:I wonder if that's what caused this crash (Score:3, Informative)
Re:I wonder if that's what caused this crash (Score:2)
Re:I wonder if that's what caused this crash (Score:2)
makes it hard if you need to turn down the radio while you're messing with the A/C, as I understand it.
Also, it means you can't replace the stereo with an aftermarket one,
Re:I wonder if that's what caused this crash (Score:2)
2003-05-12 15:26:10 Man trapped in car after computer fails (articles,news) (rejected)
(Okay, so I'm petty. At least I got it in despite the editors rejecting it.)
Ironic? (Score:5, Interesting)
Well, the chip is more important (Score:2)
It's the main component of a computer. Besides, for software, it's much easier to update (bug fix). If your processor is messed up, it's a lot worse.
Re:Ironic? (Score:2, Insightful)
With hardware like a processor, you'd most likely have to actually replace the part that's broken.
I agree that software companies should be held to a higher standard, but they can get away with it because the bugs are easier to fix.
Re:Ironic? (Score:2)
Re:Ironic? (Score:2)
I guess you haven't heard of microcode patches...
And has AMD had even close to as many bugs as intel?
Yes. Every CPU on the market has bugs. I remember Palomino had a nasty bug with cache coherency that AMD was reluctant to fix. The only difference is that Intel is in a lot more systems than AMD, so it is a lot more noticeable.
Re:Ironic? (Score:2)
I've heard they don't work for a lot of bugs. Is this one of them? They talk about having to provide replacements, so I assumed it wasn't.
Yes. Every CPU on the market has bugs. I remember Palomino had a nasty bug with cache coherency that AMD was reluctant to fix. The only difference is that Intel is in a lot more systems than AMD, so it is a lot more noticeable.
Ok, thx for the info. I guess this is the downside for intel of getting all the press.
Re:Ironic? (Score:3, Insightful)
Everyone expected WinXP to be crap, and they were so relieved that it wasn't as bad as they thought that they forgot to complain about the problems that do exist -- as evidenced by the number of people who say "WinXP is great; compared to Win98 it's very stable and pretty fast, even though I did have to buy a new PC to run it, but that's just progress, isn't it?" when you ask them what they think of it.
Re:Ironic? (Score:2)
Going a bit offtopic, but that is progress, isn't it? XP is a desktop OS that is used by people at home a lot. People want their pretty colours, games and multimedia features. I'm not saying that MS software isn't a little bit bloated, but it is not as bad as linux peopl
Re:Ironic? (Score:5, Insightful)
Re:Ironic? (Score:4, Insightful)
While I do understand your sympathy towards hardware manufacturers, there is one obvious difference between accepting software and hardware bugs. The software bug can be fixed with a patch. The $200 software now works; we can accept that. When the CPU is buggy, the only way that gets corrected is if the manufacturer is willing to replace the CPU. BIG difference.
I agree completely that software products should be held to a higher standard. But we haven't seen integrity in the industry, so all that's left to fix the problem would be to sic the lawyers on them. I don't see that as fixing the problem...
Re:Ironic? (Score:2)
Re:Ironic? (Score:2, Informative)
Re:Ironic? (Score:5, Informative)
The number of states is 2 to the power of the number of bits you were talking about. Even if I take the lowest number ("a couple dozen Kbytes") that you mentioned, that's 24 * 1024 * 8 = 196,608 bits, i.e. 2^196608 states.
Guess what?
That's a HUGE number -- way bigger than the "billions of petabytes" you were saying is impossible to recreate for software testing. It's roughly equivalent to 10^59000 (if that somehow makes things easier for you). Of course, the "couple dozen Kbytes" is a massive underestimation of the total state of a modern CPU (100 million transistors: even spending them all on flip-flops would give 2.5M bits of state, and with 6T SRAM more like 16M bits).
And then you have the nice problem that physics and electrical phenomena play havoc with hardware testing simulations, as opposed to software, which only has to worry about bad boolean logic.
Come talk to me next time you have to worry about alpha-particle hits changing the state of any of your code or when you care about any event with picosecond granularity (which is just about every day in hardware).
Yes, software testing has even more states to worry about, but trust me when I tell you that the hardware problem is plenty big enough to prevent exhaustive testing from being applicable. Hardware testing uses a lot of brute-force regression and detailed test planning to find and remove bugs. Software folks would do well to use such methodologies.
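Running the parent's own lowball figure ("a couple dozen Kbytes") through the arithmetic takes only a couple of lines, and the exponent comes out enormous -- a sketch:

```python
import math

# 24 KB of state = 24 * 1024 * 8 bits; each bit doubles the state count,
# so the state space is 2**bits. Convert to a power of ten for intuition.
bits = 24 * 1024 * 8            # 196,608 bits
digits = bits * math.log10(2)   # number of decimal digits in 2**bits
print(bits)                     # 196608
print(round(digits))            # 59185, i.e. the state space is ~10^59185
```

So even the deliberately tiny 24 KB estimate gives a state space of roughly 10^59185 -- exhaustive enumeration is hopeless long before you reach a real CPU's millions of state bits.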
Re:Ironic? (Score:3, Interesting)
I'd like to add another problem with testing: how do you know the processor is giving the correct answer? Work it out by hand? Test it on another processor that may or may not have the same design flaws?
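One standard answer is a reference model: check the fast hardware result against a slow but exact computation. A minimal Python sketch of the idea, with random operands and exact rational arithmetic standing in for a real architectural simulator:

```python
from fractions import Fraction
import random

def check_division(trials=1000, tol=2**-50):
    """Compare hardware float division against an exact rational
    reference model -- the same idea, writ small, as checking a CPU
    against an independently written architectural simulator."""
    for _ in range(trials):
        a = random.randint(1, 10**6)
        b = random.randint(1, 10**6)
        hw = a / b                # result from the hardware FPU
        ref = Fraction(a, b)      # exact reference value
        # a correctly rounded IEEE double has relative error < 2**-52
        if abs(Fraction(hw) - ref) / ref > tol:
            return False
    return True

print(check_division())  # prints True on a correct FPU
```

The reference model is orders of magnitude slower than the hardware, which is exactly why it can be trusted more: it shares no design (and hence no design flaws) with the unit under test.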
Re:Ironic? (Score:3, Informative)
Also, true exhaustive testing is not just about testing all opcodes by running all of them, it is about testing all opcode
Re:Ironic? (Score:2)
Hey Intel! (Score:5, Funny)
You know...if you're looking for anyone that is.
Bad Joke. (Score:3, Funny)
I don't know but will let you know when it gets there
OMG I don't believe I just wrote that
rus
Pentium FDIV again? (Score:2, Funny)
Possibly timing or power related (Score:5, Informative)
There isn't much detailed information about the exact conditions that bring out the bug, but they do state that it is electrical, that some unspecified combination of instructions and data patterns is needed, and that reducing the clock frequency avoids the problem. I can think of several things that might cause a bug like this. These are just guesses.
One possibility is that there is a slow timing path in the logic that is marginally meeting the 900MHz or 1GHz clock speed. Going to 800 MHz gives the slow path more margin. This is the easy answer.
Another possibility is that they have some part of the chip that has insufficient metal to deliver power to the logic gates. The right combination of activity might cause enough voltage droop to cause logic errors. Slowing the clock reduces the power consumption in CMOS chips.
They might have a crosstalk problem between some signals that could flip bits when the right activity and frequency are combined. Slowing the clock can shift the relative positions of signal transitions.
Eventually more details might surface, but Intel is probably keeping it quiet so that people don't write code to maliciously crash servers.
HAL-9000 on Itanium (Score:5, Funny)
"I'm sorry, Dave. I can't do that
Mwhahaha (Score:5, Funny)
zero tolerance for undetected corruption...? (Score:5, Funny)
so detected data corruption is just fine, then...?
anti-overclocking patent (Score:3, Funny)
Hmm... (Score:3, Funny)
Intel Inside: Get 99.98765374% from your PC!
Instead, it's now:
Intel Inside: Get 99.98765374% from your
Wow this is bad (Score:3, Funny)
Geesh, Give Intel a Break (Score:4, Interesting)
The type of problem Intel is dealing with could very well be in a new class. I have a hunch that it has to do with either unexpected capacitive coupling (possibly related to an in-spec extreme of the process variation) or thermal transients causing timing skew. These types of phenomena are nearly impossible to model, especially if they're tied to a particular set of process deviations. That is why manufacturers do such extensive qualification testing. Unfortunately this testing cannot be done until there are enough units to test (like in the 1000s). This does not happen until the device is ready for production. Technically, this is the pilot phase of development.
One needs to give Intel some credit for learning a lesson from the Pentium fiascos (not just the math error, but also the original (5V) 90MHz burn-up issue). At least they are doing the right thing now. Corporations, like people, sometimes need to learn the hard way. Unfortunately, though people usually retain their lessons, corporations sometimes need to relearn them, especially when being run by greedy BODs (or board members with hidden agendas). AMD has yet to learn this particular lesson. One of these days, they will try to cover up a problem and it's not going to work. They have gotten away with some stuff already because everyone loves to hate Intel (me included -- 68000 and PowerPC for me!)
Unless your familiar with LSI semiconductor manufacturing, you should not be commenting. Because you don't have a clue as to what is going on. The posts I've read so far, remind me of what a class of 10 year olds would right in criticing Joseph Conrads "Heart of Darkness".
Re:Geesh, Give Intel a Break (Score:3, Insightful)
>>AMD has not had this sort of problem resently
Does that mean they're jealous of Intel's problems and resent not having them?
>>The posts I've read so far, remind me of what a class of 10 year olds would right in criticing
Wow. With your mad spelling and grammar skills you ought to know exactly what 10 year olds are capable of.
But seriously though, Intel sells these chips to a completely different ma
Give ME a break (Score:3, Informative)
Additionally, while the Itanium instruction set takes a diff
Re:Give ME a break (Score:3, Insightful)
Re:Give ME a break (Score:2)
Re:Geesh, Give Intel a Break (Score:2)
In that case we need to change the gravitonic phase, reduce the tectronic radiation and then increase the nucleonic flux.
That's why I demand genuine Intel Inside... (Score:2, Funny)
Makes you wonder.... (Score:2)
They recommend working around the problem by underclocking the processor to run at 800 MHz instead of its default 900 MHz or 1 GHz
I just want to see them recommend this AFTER they start incorporating their new patented anti-clock speed changing technology into all of their chips.
Re:Makes you wonder.... (Score:2)
Intel QA (Score:3, Funny)
Glad they got the info out (Score:4, Funny)
Does the bug fry the hardware? (Score:2)
Sans the liquid variants, there doesn't seem to be any such thing as 'adequate' cooling on an AMD T-Bird in Texas during the summer. Sure, the last few AMD processor generations seem relatively bug-free, but what's the point of a 'flawless' processor if it only lasts me a year?
Sigh.....*waits
Re:Does the bug fry the hardware? (Score:2)
Re:could this be it? (Score:2)
Of course.
Re:que? (Score:2)