Intel Blames 13th, 14th Gen CPU Crashes on Software Bug
Intel has finally figured out why its 13th and 14th generation Core desktop CPUs are repeatedly crashing. From a report: In a forum post on Monday, Intel said it traced the problem to faulty software code, which can trigger the CPUs to run at higher voltage levels. Intel examined a number of 13th and 14th gen desktop processors that buyers had returned. "Our analysis of returned processors confirms that the elevated operating voltage is stemming from a microcode algorithm resulting in incorrect voltage requests to the processor," it says. But in some bad news, Intel still needs a few more weeks to test its fix for the problem. "Intel is currently targeting mid-August for patch release to partners following full validation," it says. The company also recently confirmed that the issue doesn't extend to its mobile processors.
Is damage permanent? (Score:4, Insightful)
Running a CPU at unexpectedly high voltage degrades the silicon over time. Since a lot of these CPUs were likely being fed too much voltage for prolonged periods of time, they may be permanently degraded.
I guess Intel is still trying to figure out everything involved in the issue, since this clearly was a difficult one to troubleshoot due to its intermittency. But it's going to be interesting to see how it will choose to make its customers whole, as its reputation has taken quite a beating because of this issue.
And we really need a healthy Intel so that it continues to compete with AMD and both companies continue innovating.
Re:Is damage permanent? (Score:5, Interesting)
Extended warranty at a minimum.
I wonder though, is this really "too much voltage", or is it "we pushed the turbo boost too high to get better benchmark scores, and are going to drop performance a bit"?
Re: (Score:1)
Software problem: Anything that can be covered up by software.
There are a lot of hardware bugs that have to be worked around by software, but since hardware is 'hard' and software is 'easy' it becomes a software problem.
Re: (Score:1)
In some cases yes, it's permanent.
The question is, how do we know if our CPU is damaged or not or if a problem will show up later?
I noticed my 13th gen was unstable from the day I got it back in 2022.
Re: (Score:1)
Re: (Score:2)
You got a bad product and the company that made it isn't going to make it right.
And get this, Intel amended their press release after the press got ahold of it, to admit to the hardware problem, after the news cycle... you are all stupid fucking sheep
Re: (Score:2)
The question is, how do we know if our CPU is damaged or not or if a problem will show up later?
CPU Stress testing. Preferably both before and after the update.
The caveat is that a CPU is a complex beast with 12 billion+ transistors. It's possible that there can be damage, and the chosen stress test fails to thoroughly exercise or fails to expose incorrect operation of the portion which may be damaged.
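A minimal sketch of the before-and-after idea, assuming nothing about any particular stress tool: run a deterministic workload several times and flag any run whose result disagrees with the others. The function names `workload` and `consistency_check` are made up for illustration, and this exercises only a sliver of the CPU, so per the caveat above a clean result does not prove the chip is undamaged.

```python
import hashlib

def workload(seed, rounds=2000):
    """Deterministic hash chain: the same seed must always give the same digest."""
    data = seed.to_bytes(8, "little")
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.hex()

def consistency_check(seeds, repeats=3):
    """Run each workload several times; return the seeds whose results disagree."""
    mismatched = []
    for seed in seeds:
        results = {workload(seed) for _ in range(repeats)}
        if len(results) != 1:  # differing digests mean a computation error occurred
            mismatched.append(seed)
    return mismatched

if __name__ == "__main__":
    bad = consistency_check(range(100))
    print("mismatched seeds:", bad)
```

A real stress test would run far longer and load every core, the vector units, and the memory subsystem; this only shows the "same input, same output" check that such tools build on.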
Re: (Score:2)
And we really need a healthy intel so that it continues to compete with AMD and both companies continue innovating.
Intel hasn't been "healthy" since AMD64 was caused to be released by Intel's lack of health back in the early 2000s. They are a lazy almost-monopoly, like Boeing... and just as healthy. If you are pinning your hopes on innovation from Intel, you are barking up the wrong tree, sir. Monopoly profits are to be preserved; nobody cares about innovation.
You are better off letting Intel finish rotting than trying to save them. They literally can not think of anything other than profit at this point.
Not a design flaw? (Score:2)
Re:Not a design flaw? (Score:4, Informative)
Literally watch it here
https://www.youtube.com/watch?v=QzHcrbT5D_Y&t=0s [youtube.com]
https://www.youtube.com/watch?v=oAE4NWoyMZk [youtube.com]
Statement on how they would rate the processors if Intel didn't come out with a statement.
https://www.youtube.com/watch?v=gTeubeCIwRw [youtube.com]
Re: (Score:2)
Summary: there are holes and oddities in what Intel has stated.
https://www.youtube.com/watch?v=OVdmK1UGzGs [youtube.com]
Orly? (Score:2)
Re: (Score:3)
They are talking about the microcode, but software running in your OS.
Which?
Re: Orly? (Score:4, Informative)
Re: (Score:2)
glidographical error
^ nice! first i've heard this
i get bitten by "of" instead of "if" all the time (and of course many others some of which i can't even tell what i was trying to say when i read it a day or two down the line)
Re: Orly? (Score:1)
Re: (Score:1)
Yeah...I made that word up. Dunno what else to call it given it's a graphical (writing) error but it doesn't involve typing anything given you're using a glide keyboard.
I like "glidographical", but maybe chirographical error would fit too... https://en.wiktionary.org/wiki... [wiktionary.org]
Re: (Score:2)
Re: (Score:2)
Microcode is not software, but firmware that controls the CPU supplied by Intel.
Sure, the board and software running on the computer each have an opportunity to apply microcode updates supplied by Intel in order to address issues, and they do that during system startup, but that doesn't really change the microcode into software. This would be a hardware/firmware issue if the microcode that came with the CPU or one of the microcode updates has this flaw.
Re: (Score:1)
Re: (Score:2)
I assumed they meant on the motherboard microcode.
Nitpick: Microcode does not reside on the motherboard but on the CPU. That said, UEFI updates can be used to push new microcode to the CPU. You can also push new microcode to the CPU directly from software; typically this is done during boot by the OS to prevent it glitching out, though it can be done at runtime too.
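For anyone wanting to see which microcode revision actually got loaded: on x86 Linux, /proc/cpuinfo lists a "microcode" field per logical CPU. A rough sketch of reading it follows. The helper name `microcode_revisions` and the `SAMPLE` text (including the revision value in it) are made up for illustration, and other operating systems expose this differently.

```python
import re

def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revision strings in /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        match = re.match(r"^microcode\s*:\s*(\S+)", line)
        if match:
            revisions.add(match.group(1))
    return revisions

# Hypothetical sample text, used only when /proc/cpuinfo is unavailable.
SAMPLE = """processor\t: 0
model name\t: Example CPU
microcode\t: 0x123
processor\t: 1
model name\t: Example CPU
microcode\t: 0x123
"""

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux, fall back to the made-up sample
    print(sorted(microcode_revisions(text)))
```

Comparing the value before and after a UEFI or OS update is a quick way to confirm that a new revision was applied.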
Re: (Score:1)
These days, part of "overclocking" is relaxing the power limits in the main board's firmware to the point where you could easily fry the CPU. That leaves the over-temperature sensors in the CPU as last safeguard. The other part is an extra capable cooling system, such as water cooling. For record attempts, people like Der 8auer sometimes even use liquid nitrogen.
Nothing Against Intel, but... (Score:2)
Nothing against Intel, but I am glad I recently switched and use an AMD Ryzen 9 7950X CPU. It works great for fully modded Skyrim and Baldur's Gate 3. I am glad we now have choices.
Comment removed (Score:5, Funny)
Re: (Score:2)
and there isn't a 'use more voltage this time' u-op
Sure, it's just a "software problem" (Score:4, Insightful)
If they admitted their designs were poorly engineered they would be subject to at least an 8 figure class action suit in the US which would have no defense. They can't admit fault and avoid liability with something like this.
Given that you can simply swap out a used chip for a brand new identical model chip and it works perfectly fine, the damage is not going to be reversible. They're just gonna nerf the power numbers and leave us with a crappier chip because the ones which don't run at full speed (e.g. cheaper SKUs) do not degrade as quickly. This will result in fewer warranty replacements at the cost of chips that do not perform as advertised. They'll never admit that the slower chips are affected either because then the affected class will expand to the mainstream SKUs.
The benchmarks post-patch will let us know how bad the nerf is. Long term this problem will never go away.
Is this a permanent vulnerability? (Score:1)
If this is resolved with a software patch, then presumably the chips remain forever vulnerable to malicious software deliberately causing the same issue. In other words, potentially malware can now actually fry your processor.
I am not a chip designer or anything even close, but it seems to me that it would be kind of critical to have something like thermal and power limiting to be handled on-chip, in hardware, in a way that absolutely cannot be worked around with software. The hardware shouldn't be capable of operating in a way that destroys itself physically.
Re:Is this a permanent vulnerability? (Score:5, Insightful)
If this is resolved with a software patch, then presumably the chips remain forever vulnerable to malicious software deliberately causing the same issue. In other words, potentially malware can now actually fry your processor.
Most likely the problem is in the Intel microcode [wikipedia.org], not 3rd party code like in Cyberpunk 2077.
I am not a chip designer or anything even close, but it seems to me that it would be kind of critical to have something like thermal and power limiting to be handled on-chip, in hardware, in a way that absolutely cannot be worked around with software. The hardware shouldn't be capable of operating in a way that destroys itself physically.
Again, Intel is most likely talking about microcode.
Re: (Score:2)
And it's too hot to wear trench coats.
Re:Is this a permanent vulnerability? (Score:5, Insightful)
"Not a Defect: Intel Blames 13th, 14th Gen CPU Crashes on Software Bug"
That's PC Mag's headline. Now microcode comprises a vital part of any modern processor. Presumably the relevant microcode is supplied by Intel and ships with the chip - that is, in the chip or the driver.
Either Intel's own microcode is faulty, or it unwisely allows outsiders to reprogram the microcode of its processors.
In both cases Intel is at fault. If the trouble lies in Intel's microcode then it is indeed a defect. Software defects are just as real as hardware defects.
Re: (Score:2)
Not all defects are equal. A bug that can be patched isn't typically considered a defect, and up until now many have thought a hardware component was at fault. Microcode is Intel's responsibility. Bugs are common in every CPU. Intel, AMD, ARM, all publish errata for each chip generation usually listing upwards of 100 bugs in each CPU, many of them resolved silently by microcode updates as part of OS updates.
Re:Is this a permanent vulnerability? (Score:4, Interesting)
Re: Is this a permanent vulnerability? (Score:2)
Up until now? It's clear that hardware IS at fault, and that Intel is trying to hide the problem. They will issue new microcode which tries to do this, and which will degrade performance whether it succeeds or not. This is their fourth guess as to what the problem is!
Re: (Score:1)
It's clear that hardware IS at fault
No it's not clear in the slightest. If the fix is attributed to the microcode bug then it's a firmware fault.
This is their fourth guess as to what the problem is!
And if it's right this time will you eat an all you can eat buffet of humble pie? Probably not, just come up with conspiracy theories from the sidelines. The only thing truly clear here is that nothing can be clear to you because you lack the information others have.
Re: (Score:2)
And if it's right this time will you eat an all you can eat buffet of humble pie?
That depends, will the performance plummet like the usual Intel fixes?
The only thing truly clear here is that nothing can be clear to you because you lack the information others have.
We all have the same information, except whoever inside Intel has been lying to us about whether they have a working fix. So far they've claimed four times that they knew what the problem was, and it wasn't their fault, and they've fixed it; We know they were lying three of those times, and we're waiting to find out for sure about the fourth. But since we know they were lying 75% of the time so far, the safest bet is that they're lying ag
no (Score:1)
Re: (Score:1)
Re: (Score:2)
Microcode loading/update at runtime is a thing : https://www.intel.com/content/... [intel.com]
Re: (Score:2)
In the dark ages it was common for big iron to load microcode from a floppy at every boot. The VAX 11/780 and tons of IBM mainframes for example. Even the newer IBM systems still have microcode loaded from a support element (laptop) at power up.
Bs (Score:2, Insightful)
The claims don't fit the failure patterns. We'll have to wait to see if these fixes actually solve anything. Also this bug should have been detectable with external voltage monitoring.
Re: (Score:2)
The 'fix' will let the processors operate just long enough to go out of warranty before the problem kills them.
Re:Bs (Score:4, Insightful)
Still the sort of thing that you'd think at least engineering samples would have test points for, or some sort of monitoring interface; but potentially a lot more fiddly than the original "it must be those dastardly gamers with their dodgy motherboards" theory of overvoltage.
Re: (Score:2)
Not every desktop Intel CPU has FIVR. Raptor Lake does have some variation of an IVR, but you can still monitor the VRM for power fluctuations that don't line up with voltage/current levels reported by on-package sensors.
Intel always the salesman (Score:2)
Or, Intel might be telling us what we want to hear to buy more time.
Re: (Score:2)
Intel has been fucked for years so this is just a symptom. Intel is the next Motorola. You just can't compete as a vertically integrated fab anymore.
Re: Intel always the salesman (Score:2)
You just can't compete as a vertically integrated fab anymore.
Isn't that what Apple is doing with their M series of processors?
Bullshit (Score:1)
Re: (Score:2)
The decoder breaks up instructions into micro-operations, u-ops. There isn't a "do this one extra hard" u-op.
Re: Bullshit (Score:2)
Do we know Intel's microcode? Is that public information?
When you use vector processing the processor underclocks to compensate for the additional power draw. Maybe that needs to be able to adjust voltage?
Re: (Score:3)
Microcode is more than just rearranging instructions. It controls the entire function of the CPU which includes how power is drawn.
Ridiculously misleading title and story (Score:2)
I get that Slashdot uses the headlines that come from the source of a story, but anyone who thinks about it should realize that just because you can fix a problem by changing things in software, that does NOT necessarily mean that the issue itself is in software.
Intel is addressing the issue in microcode. That doesn't mean it's a "Software Bug". We're techies and should know better than to post this kind of lowest-common-denominator story.
Wow - they are testing the fix (Score:3)
Maybe Intel should invite some people from ClownStrike to show them how to do QA.
Not software (Score:4, Insightful)