Some Core i7 5960X + X99 Motherboards Mysteriously Burning Up
An anonymous reader writes "Intel's Haswell-E Eight-Core CPU and X99 motherboards just debuted but it looks like there may be some early adoption troubles leading to the new, ultra-expensive X99 motherboards and processors burning up. Phoronix first ran a story about their X99 motherboard having a small flame and smoke when powering up for the first time and then Legit Reviews also ran an article about their motherboard going up in smoke for reasons unknown. The RAM, X99 motherboards, and power supplies were different in these two cases. Manufacturers are now investigating and in at least the case of LR their Core i7-5960X also fried in the process."
HCF (Score:5, Funny)
Seriously don't execute the halt and catch fire instruction.
Re:HCF (Score:5, Funny)
"...there's nothing wrong with the motherboard, except that it's on fire"
Re: (Score:1)
...murray walker isn't dead
Re: (Score:1)
I'm Murray Walker and so's my wife
Re: (Score:2)
He came in my shop once and I gave him a pirate copy of Windows 98.
Re: (Score:2)
Re: HCF (Score:5, Funny)
Maybe they need to include NOSMOKE.SYS in their CONFIG.SYS file.
Re: (Score:3)
Seriously don't execute the halt and catch fire instruction.
I think it can be simply counteracted with a quick Ctrl+Alt+ohfuck...
But it does make one long for the days of the three fingered salute.
Re: (Score:3)
Is that another of the NSA instructions added with RDRAND? Seriously, would not really surprise me, the NSA is after sabotaging anything these days.
I gather however that this is plain incompetence (Dunning-Kruger-Type) with regards to the voltage regulators. Switching voltage regulation is really hard to do right unless you over-engineer seriously. You can get all sorts of bizarre effects, including a puff of smoke.
Re:HCF (Score:5, Insightful)
I gather however that this is plain incompetence (Dunning-Kruger-Type) with regards to the voltage regulators. Switching voltage regulation is really hard to do right unless you over-engineer seriously. You can get all sorts of bizarre effects, including a puff of smoke.
I appreciate the irony of you mentioning the Dunning-Kruger syndrome with your statement. Switching voltage regulation has been around for over 30 years and isn't much of a mystery. Since the early motherboards started reducing voltages from 5v down to 3.3v (and below), every motherboard has had on-board voltage regulation. It's hard to believe that something as fundamental as a switching regulator would suddenly exceed the engineering skill of the motherboard designers.
~~
Re: (Score:2)
Re: (Score:2)
Really, you know nothing about this issue, the irony is entirely imagined on your side.
Switching power regulator design is among the hardest things in power electronics. It has gotten easier, but even well-known designs based on manufacturer application notes (and that is the extent to which most designers have "mastered" anything here) can literally blow up in your face if you get something wrong or the situation is not quite as expected. There is nothing "fundamental" here. You need a specialized design fo
Re: (Score:2)
An engineer for Intel mentioned once that the CPU power regulators now, although low voltage, pass enough amps to do arc welding. Just think about that for a second.
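To put a rough number on that: current is just power divided by voltage. A minimal sketch, assuming a 140 W package power and a 1.0 V rail (both figures are illustrative assumptions, not measurements of the boards in the story):

```python
def rail_current_amps(power_watts: float, rail_volts: float) -> float:
    """Current drawn from a supply rail, from I = P / V."""
    if rail_volts <= 0:
        raise ValueError("rail voltage must be positive")
    return power_watts / rail_volts


# Assumed, illustrative figures: 140 W delivered at 1.0 V.
print(f"{rail_current_amps(140.0, 1.0):.0f} A")
```

At those assumed figures the result is well over 100 A, which is why a small fault in the regulator stage can dump a lot of energy very quickly.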
Re: (Score:2)
"AMD Really Is On Fire This Generation!" Put that on the cover of the mag.
Early adopters (Score:1)
Adopt a fire extinguisher
Re: (Score:3)
Same thing happened to me with AM2 (Score:5, Interesting)
Buck feta (Score:1, Offtopic)
Why in the blazes the inter-Slashdot link leads to the Beta travesty? If I, due to some mental malady, would prefer Beta, I'd set it as my default.
Why does even Dice keep Beta afloat after it failed? I seriously hope the plans to make it the main -- and only -- interface are gone. Oh well, there's still Soylent...
Re: (Score:2, Troll)
Looks like... (Score:4, Informative)
...a failure to contain the magic smoke.
Easy to repair (Score:5, Informative)
All you need is this little kit [sparkfun.com].
NSA (Score:2)
Re: (Score:2)
Finally, it's been around for a while [intrawebnet.com]
Well, maybe not a single chip but the concept anyways.
Re: (Score:2)
Funny as that sounds at first glance, it actually makes a lot of sense. In an ordinary mainboard, the CPU cannot directly influence the voltage regulators, but in these CPUs, it can, and hence self-destruct has become possible. After the NSA transparently sabotaged the RDRAND design (the design is insecure, individual CPUs may or may not be sabotaged, but the design basically makes it impossible to tell), it would not surprise me one bit if they actually had Intel add a self-destruct as well. We really need
Re: (Score:2)
Ah, sorry. These are not the CPUs with integrated voltage regulators. Still possible but harder to add this type of self-destruct.
Not just one mobo (Score:5, Informative)
Re:Not just one mobo (Score:5, Interesting)
Does Intel have a reference design board for this? Also, how close are the VRMs to the chips they're regulating?
I once worked at a company that had a reference board with 3 FPGAs with 3 VRMs near the FPGAs. When designing their own board, the company reduced this to one VRM for all 3 FPGAs and put the VRM on the opposite side of the board. It took nine months to realize that this caused the FPGAs to reset during heavy logic switching because the single VRM + the greater length of the traces meant that the VRM couldn't keep up with the demand.
Re:Not just one mobo (Score:5, Informative)
It took nine months to realize that this caused the FPGAs to reset during heavy logic switching because the single VRM + the greater length of the traces meant that the VRM couldn't keep up with the demand.
FPGAs use synchronous logic, so they pull power in spikes as the logic switches. If it took 9 months to realize there was a problem, you can probably make some small modifications to get it working reliably. Make sure the leads from the VRM are as fat as possible, preferably have it feed into full ground and power layers, and make sure no other traces are splitting those planes. Clock all three FPGAs from the same xtal, and use a delay gate or tune the length of the traces so the signal is skewed enough that the power spikes from each FPGA are not hitting simultaneously. Add plenty of decoupling caps on every power and ground pin. Make sure the caps have leads that are fat and short. It is better to have a physically small cap (0201 or 01005) in close than a bigger one further out. Good luck.
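The advice about short, fat leads follows from V = L * di/dt: any inductance between the cap and the pin turns a fast current step into a voltage dip. A rough sketch, with made-up illustrative values (about 1 nH of lead inductance, a 1 A step in 1 ns), not figures from the board discussed above:

```python
def inductive_dip_volts(inductance_h: float,
                        current_step_a: float,
                        rise_time_s: float) -> float:
    """Voltage dip across a series inductance for a linear current step.

    Uses V = L * (delta_I / delta_t).
    """
    if rise_time_s <= 0:
        raise ValueError("rise time must be positive")
    return inductance_h * current_step_a / rise_time_s


# Hypothetical values: halving the lead inductance halves the dip.
far_cap = inductive_dip_volts(1e-9, 1.0, 1e-9)     # cap further out
near_cap = inductive_dip_volts(0.5e-9, 1.0, 1e-9)  # cap in close
print(f"far: {far_cap:.2f} V, near: {near_cap:.2f} V")
```

The point is only the proportionality: the dip scales with inductance, so a small cap in close can beat a bigger one further away.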
Re:Not just one mobo (Score:5, Insightful)
It's like Penthouse Forum for nerds.
Re: (Score:3)
It's like Penthouse Forum for nerds.
Not quite - It didn't start with "I never believed that this would happen to me..."
Re: (Score:2)
You're quite right about all that you mentioned. My example was from a 2004 timeframe.
I can't recall if the faulty design cut back on bypassing/deglitching as well as the VRMs, but my guess would be yes.
Afterwards the engineer [that caused the problem in the first place] added back all bypassing/deglitching, shorter leads, VRMs, etc. that got elided from the [then] new design. The engineer was definitely in the dog house for this: for creating the problem and taking so long to diagnose it.
Personally, bas
Re: (Score:2)
The problem is not the fix - once you know the problem is power, it's trivial to fix.
The problem is identifying the root cause. Power problems are highly subtle - and usually very intermittent. The FPGAs may crash under heavy load, but it's one of the "phase of moon" bugs because you can feed in the same test patterns that crash it and it'll work the next time around.
And bugs that are impossible to replicate are the hardest ones to fix - especially if it's a new board that requires a new change to the RTL s
Re: (Score:2)
The problem is identifying the root cause. Power problems are highly subtle - and usually very intermittent.
Power problems are also very common. If you have intermittent failures, it should be the first place you look. They are also easy to diagnose: If you have a failure once a week, then remove some decoupling caps. Now it fails every hour or so. Remove a few more caps, and now it fails in minutes or seconds. Once you are sure it is a power problem, it is straightforward to remedy. Add more capacitance. Check your ground and power layers. etc.
Re: (Score:2)
Thanks for what you just said. Dead on and I couldn't agree more.
In our case, we were a small [startup] company, so we didn't have the resources to be second guessing each other. I was doing the device drivers, but I'm also 50% EE. When we found out what had happened, we were struck speechless that the first thing to check [IMO, yours, and the opinion of some of the other engineers] wasn't checked. Sigh.
Re: (Score:2)
The problem is not the fix - once you know the problem is power, it's trivial to fix.
ShanghaiBill is correct. Power is the first thing to check/suspect. In our case, the other engineering team members assumed the lead engineer had checked this--because it is so fundamental. He hadn't. He was almost fired for this.
The problem is identifying the root cause. Power problems are highly subtle - and usually very intermittent. The FPGAs may crash under heavy load, but it's one of the "phase of moon" bugs because you can feed in the same test patterns that crash it and it'll work the next time around.
We had no problem generating test vectors that caused the problem to occur once per hour.
And bugs that are impossible to replicate are the hardest ones to fix - especially if it's a new board that requires a new change to the RTL so you're not exactly sure if it's a hardware or software problem. Or even a compiler problem (since half the issues can easily be caused by bugs in the compiler).
We were quite confident that it was a hardware problem because both boards were 100% compatible software [device driver]-wise. My device drivers also would log all access to the board in r
Re: (Score:2)
That's simple stuff to diagnose... 9 months? Is there an engineer somewhere on staff or are you all just designers?
That question was posed at the time. And not calmly or so politely ;-)
Re: (Score:3)
The real problem would have been inadequate bypassing at the FPGA. From the point of view of high-speed logic, power comes from capacitors, not voltage regulators.
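A back-of-the-envelope way to see this: during a fast load step the regulator has not reacted yet, so the local capacitance has to supply the charge. A minimal sketch using C = I * dt / dV, with purely illustrative numbers (a 1 A step held for 10 ns with 50 mV of allowed droop):

```python
def min_bypass_capacitance_farads(current_step_a: float,
                                  hold_time_s: float,
                                  allowed_droop_v: float) -> float:
    """Capacitance needed to supply a current step for a given time
    while the rail sags by no more than allowed_droop_v (C = I*dt/dV)."""
    if allowed_droop_v <= 0:
        raise ValueError("allowed droop must be positive")
    return current_step_a * hold_time_s / allowed_droop_v


# Hypothetical figures, for illustration only.
c = min_bypass_capacitance_farads(1.0, 10e-9, 0.05)
print(f"{c * 1e6:.2f} uF")
```

This ignores the capacitor's own series resistance and inductance, so treat it as a lower bound on what the bypassing needs to provide.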
Re: (Score:3)
Since nobody reads TFA, Phoronix killed an MSI X99S, and LR lost an Asus X99 Deluxe. It was also different RAM (Corsair vs G.Skill).
However, both reported the burn was near the VRMs (Phoronix also reported a second event near the northbridge). The two mobos might be using identical parts for that, but I was unable to find out for sure.
I've had 7 Asus motherboards burn up in the past 4yrs. 2 actually caught fire. So that's no surprise to me, Asus is on my banned list.
MSI, however, has been nothing but good to me. They don't generally have the fastest or most feature rich boards available, but reliability's been their strong suit over the years.
Re: (Score:1)
Re: (Score:3)
I kind of do my own non-profit business of building computers for everyone I know or am related to. So I've got a small business account with newegg and do about $25k in computers a year. Asus was my board of choice for years, but about 3yrs ago they just went to shit. I've no idea why but suddenly I had massive failures, massive compatibility issues, etc... When a computer I build actually catches fire, that worries me. Asus was decent about the RMAs... which actually worried me more. A MB manufacturer wil
Re:Not just one mobo (Score:4, Funny)
OK, no Asus or Gigabyte. I'm gonna build a new game rig. Which companies should I use? I've had good luck with Asus motherboards, but I only make a new computer every 3-4 years or so.
Personally, I'm surprised every system I assemble doesn't burst into flames, but that's only because I'm not really expert at these things. I hold my breath whenever I have to plug a CPU into a motherboard or slop that silver goop on top of one when I'm attaching a cooler. Once many years ago, I attached a motherboard without putting in those little round standoffs onto the case and it just sort of went "zzzt!" and then smelled like a vacuum cleaner when the belt burns. I took it back to MicroCenter and wept and moaned and they actually gave me a new one. Since then, I make sure to keep a fire extinguisher and a pint of vodka on hand when I build a system. The vodka is to keep my hands from shaking.
I know I should just go with one of the outfits on the internet that assembles gaming PCs, but I'll probably end up doing the next one myself.
Comment removed (Score:5, Insightful)
Re: (Score:2)
Yes, I've learned that the hard way, too. When I built my current system, I spent almost as much money (and time reading reviews) on the power supply as on the video card. It's more than I need but it has a longer warranty than a new car.
Re: (Score:3)
You can't just dig around in the RMA'd parts bin and ship some other broken piece of crap back to me.
Well, they obviously can so. Companies like that need to go bankrupt.
Re: (Score:2)
Or you need better warranty laws: the time they need to replace it doesn't count, each replacement (attempt) increases warranty by 6 months, if they fail 3 times they have to refund the money. That's e.g. (more or less) the rules in Germany.
One company I know will attempt to repair three times. If it still fails they replace, but the replacement part starts with a new warranty. I think that is the way it should be. The warranty should be on the part, so any replacement would effectively reset the effective purchase date.
That may get expensive for some companies, but maybe they should be rethinking their business model?
Re: (Score:1)
I kind of do my own non-profit business of building computers for everyone I know or am related to. So I've got a small business account with newegg and do about $25k in computers a year. Asus was my board of choice for years, but about 3yrs ago they just went to shit. I've no idea why but suddenly I had massive failures, massive compatibility issues, etc... When a computer I build actually catches fire, that worries me. Asus was decent about the RMAs... which actually worried me more. A MB manufacturer will rarely take a return with scorch marks on it unless they know there's an issue. When the RMA boards I got back from them started blowing caps as well, I knew something was terribly wrong.
Also on my banned list: Gigabyte - I had several Gigabyte MB and Gigabyte Video cards. They would not work with each other and Gigabyte claimed it was a capability issue and not their problem, despite having put their names on both the card and the board! This was purely a customer service issue, they should have shipped me a different card to make things right.
Zotac - For 2yrs I shipped the same video card back to them over and over again. They just kept replacing it with defective cards. Some came to me dirty, or with blown components. You can't just dig around in the RMA'd parts bin and ship some other broken piece of crap back to me. I'm currently awaiting about the 4th RMA on that card and my warranty will run out. At least they're paying for the shipping.
Anyways, I'm done building computers for people. Components are just too unreliable now. I don't need to be spending half my life in the UPS shipping office.
I've found MSI make the list too, a lot of others find the same. Seems common across a wide range of experiences of rig builders, and there are no guaranteed reliable manufacturers now IMO. I just use a handful of UK stores with great support who replace no questions asked for free and honour the warranties for products; the 3 main places I use cover everything without specifics for 12 months but do 3 to 5 years on certain products. I always always use PSUs that are reliable and tested by good sources. Takes a week of readin
Re: (Score:3)
> They just kept replacing it with defective cards.
I've seen a few companies do this over the years. They just keep sending defective parts until you give up in frustration or they go out of business.
Re: (Score:2)
That's not my experience. I've always wondered why Asus has been held in such high regard when I've found their stuff to be pretty much crap, dating back to at least the Socket A days. Not just motherboards too, as their video cards are just as flaky and die just as quickly, and don't buy their laptops either unless you need a paperweight. Heck, I'd buy ECS before buying Asus. The quality may not be any better but I'd at least save myself some money.
Re: (Score:2)
Houston ... (Score:5, Funny)
... We've had a main B bus undervolt
Re:Houston ... (Score:4, Funny)
Re: (Score:1)
Dramatic Music Intensifies
Re:Houston ... (Score:4, Insightful)
"The system came up, hung for a very short time and then powered off with a audible click of the Corsair AX860i power supply. If you have ever heard the loud click of the Over Current Protection (OCP) shutting down the PSU you know exactly what click I heard. Now when I press power button on the motherboard the system clicks after being on for a split second. I unplugged all the cables on the power supply and did the built-in self-check and it passed with flying colors. I still swapped out the PSU with a backup Corsair AX860i and the same click was to be heard. and it is doing the same thing (Corsair AX860i). After clearing the CMOS, removing the memory, SSD and video card the system still would not post. At that point in time I switched to a non-digital power supply (Corsair AX1200) and it did the same thing although this time the OCP took a little longer to kick in. There was some audible crackling noises, followed by some smoke near the CPU VRM heatsink. So, the heart shattering smell of burnt electronics filled the room..."
10/10 for investigative journalism but putting more and more juice into something that is continually tripping out the power supply is not going to have a happy ending. Maybe some of the $1,400-worth of motherboard and processor may have been salvageable if he had stopped at the first warning?
If the circuit breaker pops twice on a ring main at home, do you a) replace the circuit breaker with a bigger one, b) hold it in until smoke appears from behind the wall or c) do some serious investigation and/or call an electrician before putting the power back on?
Sample size of two (Score:2)
It's not into customer hands... (Score:1)
It's *REVIEW* Boards. Even assuming the reviewers bought them off the shelves, having two fail spectacularly that were different brands and memory, but the same CPU/Chipset raises some eyebrows.
Assuming the failures were similar and the non-discrete components along the failure paths were not from the same manufacturer, it would sound like either a design flaw in the reference implementation, or a manufacturing defect in either the CPU or chipset.
It's not into customer hands... (Score:1)
The board in the Phoronix article wasn't an engineering sample / review board; the author mentioned buying the board from NewEgg...
Re: (Score:2)
Actually, the chip-set is pretty irrelevant. The damage observed would indicate that the CPU draws a lot more current, maybe in short spikes, than what the voltage regulators can handle. It may also be that the CPU causes instabilities. Switching power regulators can be tricky, and they certainly are at the voltages (very low) and currents (very high) we are talking about here.
Re: (Score:2)
Switching power regulators can be tricky, and they certainly are at the voltages (very low) and currents (very high) we are talking about here.
Citation?
~~
Re: (Score:2)
Experience cannot be gotten from citations. Experience is something you have to acquire yourself. For this case here, even an introductory text will warn you, so stop being lazy.
Just oil (Score:1)
did they take the fan off? (Score:1)
https://www.youtube.com/watch?... [youtube.com]
Voltage regulator? (Score:3)
From the photos and the write-ups, it looks like a voltage regulator is failing. So, maybe a spec in the data sheet is wrong (for reasons from typo to ooops, we didn't compute that rating correctly...) or maybe a parts vendor for that regulator had a bad-batch day. It happens. Years ago I was involved in one of the latter... "Which date codes do you want us to pull from the parts crib again? I think we have about $2 million of the bad ones...." -- at least that time I was on the customer side, which has much less impact on your sleep schedule.
Before transistors... (Score:2)
In the old days, before computers went solid state, smoking on startup was often put down to worn valve-guides.
Damn... I think Americans call them "tubes" -- in which case the joke doesn't work :-(
So much for the smoke test ... (Score:2)
.. looks like consumers are on the bleeding edge.
Re: (Score:2)
I take my hat off to the early adopters, the ones on the bleeding edge of anything new that comes out. But, over the years I've learned that if anyone is going to get hurt with the next new thing, it is the early adopters. Me, I wait a while. But I still thank the early adopters that take the risk the rest of us are too gutless to join in.
Thank you, all.
I'm itching to buy a X99 PC build but waiting for exactly this reason. Anybody happen to have any insights on "normal" timing for revised motherboards (rev A/B/C etc.) -- how long it usually takes after launch of a new platform like this before the first minor/major revisions of the motherboards are out?
Re: (Score:2)
> the rest of us are too smart to join in.
FTFY. :-)
Re: (Score:1)
Re: (Score:2)
That too !
New thermal insulation? (Score:2)
Perhaps (Score:3)
It was a bad motivator?
AMD VRMs have problems too (Score:1)
Anyone who has a 6 or 8 core AMD FX chip will know the troubles with motherboard makers and VRM quality. If you plan to really use those chips then you better have a board with quality VRMs and proper cooling. If you use water cooling, then no airflow is going over the VRM heatsink. If you use a side to side air cooler, the situation is the same.
Overclock.net has had people complain about this very issue for years.
http://www.overclock.net/t/943109/about-vrms-mosfets-motherboard-safety-with-125w-tdp-proce