Motherboard Makers Apparently To Blame For High-end Intel Core i9 CPU Failures (arstechnica.com)
An anonymous reader shares a report: Earlier this month, we wrote that some of Intel's recent high-end Core i9 and Core i7 processors had been crashing and exhibiting other weird issues in some games and that Intel was investigating the cause. An Intel statement obtained by Igor's Lab suggests that Intel's investigation is wrapping up, and the company is pointing squarely in the direction of enthusiast motherboard makers that are turning up power limits and disabling safeguards to try to wring a little more performance out of the processors.
"While the root cause has not yet been identified, Intel has observed the majority of reports of this issue are from users with unlocked/overclock capable motherboards," the statement reads. "Intel has observed 600/700 Series chipset boards often set BIOS defaults to disable thermal and power delivery safeguards designed to limit processor exposure to sustained periods of high voltage and frequency."
These are the specific settings that Intel believes are causing problems:
Disabling Current Excursion Protection (CEP)
Enabling the IccMax Unlimited bit
Disabling Thermal Velocity Boost (TVB) and/or Enhanced Thermal Velocity Boost (eTVB)
Additional settings which may increase the risk of system instability:
Disabling C-states
Using Windows Ultimate Performance mode
Increasing PL1 and PL2 beyond Intel recommended limits.
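The two lists in Intel's statement can be read as a checklist. Below is a minimal Python sketch of such a check, not an Intel or vendor tool: the `BiosSettings` fields and the `flag_settings` helper are hypothetical names invented for illustration, and the recommended PL1/PL2 values are passed in by the caller because the statement does not give them.

```python
from dataclasses import dataclass


@dataclass
class BiosSettings:
    """Hypothetical snapshot of the settings named in Intel's statement."""
    cep_enabled: bool
    iccmax_unlimited: bool
    tvb_enabled: bool
    etvb_enabled: bool
    c_states_enabled: bool
    windows_ultimate_performance: bool
    pl1_watts: float
    pl2_watts: float


def flag_settings(s, recommended_pl1, recommended_pl2):
    """Return (primary, secondary) lists of flagged settings.

    primary: settings Intel believes are causing problems.
    secondary: settings that may increase the risk of instability.
    The recommended limits are assumptions supplied by the caller.
    """
    primary = []
    if not s.cep_enabled:
        primary.append("CEP disabled")
    if s.iccmax_unlimited:
        primary.append("IccMax Unlimited bit enabled")
    if not (s.tvb_enabled and s.etvb_enabled):
        primary.append("TVB and/or eTVB disabled")

    secondary = []
    if not s.c_states_enabled:
        secondary.append("C-states disabled")
    if s.windows_ultimate_performance:
        secondary.append("Windows Ultimate Performance mode in use")
    if s.pl1_watts > recommended_pl1:
        secondary.append("PL1 above recommended limit")
    if s.pl2_watts > recommended_pl2:
        secondary.append("PL2 above recommended limit")
    return primary, secondary
```

Calling `flag_settings(settings, recommended_pl1=..., recommended_pl2=...)` with the limits published for your own CPU returns the two lists separately, mirroring the split in the statement between likely causes and additional risk factors.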
JayzTwoCents covered this (Score:4, Informative)
They did a video on this issue before the problems started coming out and did a follow-up on it afterward.
Can confirm from personal experience. (Score:2)
I purchased an Alienware M18 with i9-13980HX processor and Nvidia 4090RTX.
After 3 months, it started experiencing frequent kernel panics, reboots, and blue screens.
The issue was eventually solved by replacing the motherboard (C9XMR) and heatsinks.
Re: (Score:3)
The article is about default BIOS configurations that cause the issues. Replacing the motherboard would give you the same default BIOS configurations, so you'd have the same crashes with the new motherboard.
Re: (Score:1)
The article is about default BIOS configurations that cause the issues. Replacing the motherboard would give you the same default BIOS configurations, so you'd have the same crashes with the new motherboard.
Not all motherboards have the same default BIOS. Replacing a motherboard and getting the exact same defaults would mean you used the exact same model of motherboard... There are many manufacturers.
How did that escape you?
Re: (Score:1)
Maybe you should reread the comment GrumpySteen was replying to, I don't think Alienware is going to send out random brand motherboards to fix one of their systems.
Re: (Score:2)
Maybe you should reread the comment GrumpySteen was replying to, I don't think Alienware is going to send out random brand motherboards to fix one of their systems.
Even if the motherboard you receive after an RMA has the same hardware revision (likely it doesn't) it almost certainly won't have the same BIOS unless you updated the BIOS right before sending yours off.
Re: (Score:2)
The article is about default BIOS configurations that cause the issues. Replacing the motherboard would give you the same default BIOS configurations, so you'd have the same crashes with the new motherboard.
It's entirely plausible that the replacement motherboard may have been shipped with an updated BIOS that has more sane defaults. Reading TFA, this appears to be the direction that Intel is pushing the motherboard manufacturers in.
Re: (Score:2)
It's also possible that the replacement motherboard was somewhat luckier in the silicon/manufacturing lottery. If it's being pushed to the limit, a board that was a bit above spec rather than just at spec might work. Presumably the MB manufacturers sort of tested the bios settings -- but they may have done it on a different run/factory/phase of moon board than was actually shipped.
Re: (Score:1)
Congratulations on your terrible purchase.
Re: (Score:2)
For information, how did you reach the conclusion that it was the motherboard, and not the RAM or the processor? I'm just concerned it might happen to me one day and I'll never be able to understand where the problem comes from.
Re: (Score:3)
purchased an Alienware
Well, there's your problem, lady!
Thank You Adam Savage
Re: (Score:2)
The article is about Intel's approved rules and BIOSes bricking CPUs, not boards failing due to inadequate VRMs. If you had the same problem, it would persist even after replacing the board, except that your board is likely e-waste by design with everything soldered on, so you accidentally replaced the whole system.
Re: (Score:2)
Same family of CPUs but different product with different problems. The culprit is Z790 motherboards with CPUs ranging from 13900k-14900ks. Products you don't have with wildly different power draw.
was pretty pleased until the 29th day... (Score:3, Interesting)
I walked out with an AMD Zen 5 series as a replacement.. also ASUS... with a natty OLED screen... love it.... Never looked back. AMD Zen seemed 20% faster, no heat issues, can pin it at 100% for as long as I want, no problem... and battery life is easily 25% longer.
Also, don't run systemd, you'll gain another 5% battery life from that alone. <whispers.... devvv....ooo...uan.....>
There is for certain, no use case for systemd on a laptop/desktop install. NONE.
There is also no use case for systemd on a server, you're just Red Hat's bitch.
Sorry, I meant, IBM's bitch.
Re: (Score:2)
I walked out with an AMD Zen 5 series as a replacement.. also ASUS... with a natty OLED screen... love it.... Never looked back. AMD Zen seemed 20% faster, no heat issues, can pin it at 100% for as long as I want, no problem... and battery life is easily 25% longer.
I would like to congratulate you on your sample size of one purchase yielding good results. Meanwhile people who actually keep up to date with news know that AMD is not immune to fiery CPUs. https://www.pcgamer.com/users-... [pcgamer.com]
Re: (Score:2)
That issue was solved months ago, and it was only people running excess vSoC settings mostly due to EXPO memory kits. We've already covered the matter in comments here at least once.
Re: (Score:2)
whoopsie....try to keep up with the news, eh?
I appreciate your compliment though.
Re: (Score:2)
Interesting. We have been using Raspberry Pis for stuff at work, and tried Devuan. We were mostly interested in power consumption, which is the equivalent of battery life on a laptop. We found that Devuan was a lot worse.
To be fair it could have been the Pi hardware, it could have been Devuan not being optimized for it, but it was considerably worse than RPi OS which uses systemd.
Re: (Score:2)
Back a few years I was wondering why Mint, being glorified Ubuntu, ran so much better than Ubuntu. Turns out Mint was running (by actual count) 1/4th as many processes. Gee, I wonder how that could impact performance...
I didn't much like Devuan until they borrowed the PCLOS desktop and general way of doing things... now it's a lot slicker.
Sure (Score:3, Interesting)
It's the fault of the motherboard makers for using the chipsets exclusively allowed by Intel with a bios explicitly approved by Intel while following the rules drafted by Intel.
The common failure when all your decisions must be approved by some outsider is to stop doing your own oversight. Of course the board makers should have done better, but it's 1000% intel's fault for failing to use their position to actually protect their products and customers.
Re: (Score:1)
That’s a bit like Ford being responsible for you driving on the interstate in first gear at the red line.
Re: (Score:2)
As an early beta-tester (read: customer) of Ford's attempt at making a CVT transmission, please stop giving them ideas.
Re: (Score:1)
You don't. These CPUs aren't outright dying by burning up. What's happening is that they're degrading too fast. So they'll still work, but be stable only at slower clock speeds and power limits.
Re: (Score:2)
That's a great car analogy because it involves a car. Here's a better one.
Ford: Buy our 500hp engines, it will allow you to drive 200mph!
Dodge: We plan to hit 210mph by using Ford's engine. We are going to run it at 50,000rpm and will be saving costs by using no radiator.
Ford: Sounds great, we will tell everyone to buy a Dodge!
Re: (Score:2)
Winning benchmarks.
Re: (Score:2)
It's the fault of the motherboard makers for using the chipsets exclusively allowed by Intel with a bios explicitly approved by Intel while following the rules drafted by Intel.
Nice story, but in this case the issue was specifically *not* following the rules drafted by Intel.
Re: (Score:2)
We know that the boards were compliant because Intel won't sell chipsets to anyone who doesn't follow their rules. Hardware Unboxed was exercising extreme diligence to confirm that Intel hasn't specified which power limits they're ok with. The whole market stack is captured, which unfortunately for intel leaves them with nobody to blame but themselves.
The only way intel could escape liability for this problem would be if some of the board makers were falsifying data to pass the conditions (like VW with th
Re: (Score:2)
The problem is that the default settings are outside of the max set by Intel. They are NOT following the rules
Re: (Score:1)
It's actually more a case of Intel defining the spec poorly. Intel has gone on record to say that their standard settings are just a recommendation, and running the CPU at turbo frequency indefinitely while blasting power at it is still "in spec" because... the spec only addresses overclocking via clock speed, not increasing the power fed to the CPU.
And feeding too much power to the CPU for a long period of time causes rapid and permanent degradation of the silicon. Which is what is happening here.
To make matter worse, Intel is so f
Re: (Score:2)
And because the reviews were all done with the over powered chips, Intel has effectively been false advertising to all end users.
Intel owes everyone a refund.
Re: (Score:2)
The important issue here is who is liable, i.e. who is going to buy you a new CPU.
Have any of the motherboard manufacturers said they will replace dead CPUs?
Re: (Score:2)
That is a terrible way to look at it.
The fact that Intel left room for "tweaking" settings is no excuse for the motherboard manufacturers to set up their BIOS/EFI in such a way as to fry the components.
Your perspective allows no end room for the users. I don't like that even though I don't overclock.
MSI (Score:2)
MSI is recommending users turn off CEP as it can cause CPUs to overheat.
https://www.msi.com/blog/lower... [msi.com]
Intel says CEP should be turned on.
So which is it? My i9-13900KF is plenty fast without overclocking or undervolting, I mainly want it to be as stable as possible.
Re: (Score:2)
MSI was one of the makers pushing infinite power on all their boards and this advice is not current. I'd expect their aberrant results to be an indicator of some other configurations that are quite unreasonable.
Current (Score:2)
This post was from last month, and the latest BIOS, released last week, for my mainboard adds this setting. It's been in a beta BIOS for months.
Re: (Score:2)
If you are not a hardcore tuner you can back off the PL2 value and disable MCE to make sure that PL values are enforced. 180W should be safe.
If you are a tuner, you can tweak voltages and clocks in XTU or use the UEFI directly.
Updated (Score:2)
Thanks for the tips. I disabled "Enhanced Turbo" and capped the maximum power draw from ~4000W (WTF?) to 235W, the max recommended for my CPU.
CEP is disabled by default in my UEFI settings now, so I turned that back on, too. I'll keep an eye on temperatures to see if anything goes screwy.
Re: (Score:2)
Just an FYI, but the way Intel CPUs are supposed to work when MCE (which is probably named Enhanced Turbo in your UEFI menu but without knowing your board, I can't say for sure) is disabled, is that it attempts to boost up to the PL2 value for a specific amount of time, after which point it backs off clocks and volts until power draw reaches the PL1 value. The amount of time it can stay at PL2 is controlled by a time value known as tau. Note that the latest Intel chips (since Alder Lake if I recall correct
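The PL1/PL2/tau behaviour described in this comment can be sketched in a few lines of Python. This is a deliberately simplified model of the description above (allow PL2 for tau seconds, then fall back to PL1), not Intel's actual power-management algorithm; the function names and the wattage and tau values in the usage note are illustrative assumptions only.

```python
def allowed_power(t_seconds, pl1_watts, pl2_watts, tau_seconds):
    """Power ceiling at time t under the simplified model.

    Assumption: the load starts at t = 0 and stays constant, so the
    chip may draw up to PL2 until tau expires, then is held to PL1.
    """
    if t_seconds < 0:
        raise ValueError("t_seconds must be non-negative")
    return pl2_watts if t_seconds < tau_seconds else pl1_watts


def power_trace(duration_s, step_s, pl1_watts, pl2_watts, tau_seconds):
    """Sample the power ceiling every step_s seconds up to duration_s."""
    return [
        (t, allowed_power(t, pl1_watts, pl2_watts, tau_seconds))
        for t in range(0, duration_s + 1, step_s)
    ]
```

For example, `power_trace(120, 60, pl1_watts=125, pl2_watts=235, tau_seconds=56)` gives a ceiling of 235 W at the start and 125 W once tau has elapsed. With the limits effectively removed, as in the board defaults discussed in this thread, PL1 and PL2 are both set so high that this ceiling never applies.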
Re: (Score:2)
Related, not relates. Also I think autocorrect truncated TVB to TV once bleh.
Gamer retail PC market is absurd (Score:2)
silly statement (Score:2)
Intel is not competitive (Score:1)
For some time now they have kept up with AMD only by consuming wildly more than PL1, and going some way over PL2 if cooling allows.
The 14900K can pull 400W(!) peak power with limits switched off, that's insane!
All this, just to stay roughly competitive with AMD, whether on desktop or notebooks.
Just check tests on notebookcheck.net, where they don't just run one loop of the test where Intel might look good, but loop them to see how the performance is maintained. AMD usually keeps the initial scores, because
C-States? (Score:1)