Software To Diagnose Faulty PC Hardware?
Etylowy writes "Over the years I have repaired my own PC and those belonging to family and friends many, many times. While in most cases it turned out to be restoring a system after malware/the user/Windows made a mess, or simple cases of 'follow the smell of smoke and molten plastic,' there were some nasty ones where the computer mostly works. By 'mostly,' I mean: you can boot it up, it might even work for a while, but will crash way too often to blame it all on Microsoft — what do you do then? Once you strip it of any extra hardware (which, with today's motherboards that have pretty much everything integrated, might not be an option) you are left with the CPU, motherboard, graphics card, RAM and HDD. You can test the HDD, you can run memtest86+ to check the RAM, but how do you go about testing the CPU, motherboard and graphics card trio to find which is to blame? Replacing them one by one isn't really an option. Do you know of any software that would help the way memtest helps with RAM?"
OCCT (Score:5, Interesting)
It will stress your RAM, CPU, and GPU, individually or all at once, with pretty temperature and utilization graphs (Windows only): http://www.ocbase.com/perestroika_en/ [ocbase.com]
Overheat (Score:5, Informative)
That's a marginal idea at best, but a common one.
While the technique of blasting a processing unit to see how it behaves at maximum temperature will sometimes find a faulty unit, many faults are not temperature related, and will not show up on this test. It's fine that you brought it up here, but something that both heats the CPU/GPU and tries to test as many pathways / as much of the instruction set as possible would be far more useful. (cf memtest86+ for RAM)
PSU (Score:5, Informative)
Oh, and don't forget to check the PSU. When it acts up, it will often appear to be a hardware fault somewhere else in the machine. (often RAM, but can be MB, CPU, GPU...)
This certainly doesn't answer the poster's question, but it is related and important.
Re:PSU (Score:5, Informative)
I have seen some of the most ridiculous problems that were PSU related. Serial mouse not working, VGA card outputting in B&W, slow and/or intermittent performance, HD's that constantly reset (and sound like click of death in the process), new memory being blown, known good memory acting like bad memory, CD-R's that can't burn (or finish burning successfully), software modems that couldn't go off hook, AGP cards crashing, PCI cards crashing, VLB SCSI cards not working at all.
The list really just goes on and on and on. Software to diagnose faulty PC hardware? Sorry, no thanks. I have tried all manner of diagnostic and test software over the years. Some worked some of the time (mem tests and HD scanners); the rest were borderline useless pieces of crap. Not only that, but because of faulty PSU's (usually overloaded, or just old, or overheating, etc etc etc) I have seen those same programs misdiagnose just about everything.
Aside from simple sensor reading and verification (of code, built in HW diagnostics, etc) I do not trust 'software based' hardware diagnosis, especially on a PC.
YMMV.
Re:PSU (Score:5, Insightful)
Check supply voltages first.
There's a really fancy test program to do this... it's called a digital multimeter, and it's a piece of hardware with two probes.
You touch one probe to ground, and then use the other to check all the leads going into MB for supply voltage.
For desktops that is.
For servers, the power supplies are generally smart modular units, and you check their voltage outputs in the BIOS screens, or using remote management via BMC: IPMI, iLO, DRAC, or ALOM.
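Once the multimeter readings are written down, checking them against a tolerance band is simple arithmetic. A minimal sketch in Python, assuming the commonly cited ATX tolerances of ±5% on the +3.3 V, +5 V and +12 V rails and ±10% on the -12 V rail; the rail names, the `check_rails` helper, and the sample readings are made up for illustration:

```python
# Nominal rail voltage and allowed fractional deviation.
# Assumption: +/-5% on the positive rails, +/-10% on -12 V (commonly cited ATX figures).
TOLERANCES = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
}

def check_rails(readings):
    """Return a dict mapping each measured rail to True (in spec) or False."""
    results = {}
    for rail, measured in readings.items():
        nominal, tol = TOLERANCES[rail]
        allowed = abs(nominal) * tol  # absolute deviation permitted, in volts
        results[rail] = abs(measured - nominal) <= allowed
    return results

if __name__ == "__main__":
    # Hypothetical readings taken with a DMM, one probe on ground.
    sample = {"+3.3V": 3.28, "+5V": 4.71, "+12V": 12.1, "-12V": -11.4}
    for rail, ok in check_rails(sample).items():
        print(rail, "OK" if ok else "OUT OF SPEC")
```

This is only a steady-state spot check: a reading that passes here says nothing about how the rail behaves under load or during brief dips.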
Re: (Score:2)
Mod parent up. A proper multimeter and a power supply voltage chart is the skeptic's answer to all kinds of hardware voodoo.
Re:PSU (Score:5, Insightful)
While that is a good "Bad or Maybe" test, most PSU problems are transient over- or under-voltage conditions, which a DMM is not going to reveal.
And there are testers that will measure all (or most) of the voltages produced at once - you just plug the ATX cable into the device, and many of them have a pass-through, so you can test the PSU under load. I'd look for one that could flag a transient problem, if it exists.
Mind you, since writing the above I have looked around for one, and have failed! They all are pretty simple devices that do not detect transients, I could find no pass-through devices, and they all test under very anemic loads. All told, I am not impressed by any of them.
Re: (Score:3, Insightful)
Yes... but unless you are doing this professionally, or going way out of the way to build a full-blown test rig and load bank [tomshardware.com] yourself, the gear required to fully test a PSU anywhere near max load is not worth it to the average person. A spot check with a DMM on the bench or in the PC (if the PC is working) is a good tradeoff, and if there is any question, try replacing the PSU.
Versus buying a $100,000 Sunmoon or Chroma tester. Or bench Oscilloscope + DC Load generator + Variable AC output gear (for
Re: (Score:2)
This can also be done very quickly by using an ATX power supply tester like this one [coolmaxusa.com]. It has an LCD screen which shows the voltage for IIRC every connector on your power supply. In use image here: http://www.ocia.net/fullsize.php?filename=32_9.jpg [ocia.net]
Mod up - Everyone buy one of these (Score:2)
If you work on PCs even infrequently this is a must-have tool. Yea, a multimeter is great, but a) you need to know how to use it and b) you can push the probe into the wrong place and make a mess of things.
With hardware it's usually a bad PSU, then bad memory, then bad caps.
Re:Mod up - Everyone buy one of these (Score:5, Funny)
With hardware it's usually a bad PSU, then bad memory, then bad caps.
Then bad karma, then bad mojo.
Re: (Score:2)
Agreed, speaking as a sufferer of now 3 Enermax PSUs with "right, I'm going to die now kthxbai" syndrome. And these are supposedly *good* PSUs. In my case 2 identical systems did this on the same day. Put PSU2 into PC1 and it worked for an extra month - albeit risky - but some motherboards seem to have a greater tolerance of dying PSUs than others.
But the really crap PSUs that you get bundled with a case etc most likely output all kinds of crap and cause random weirdness and crashing.
The unfortunate
Re: (Score:2)
On your sig, English dictionaries have a lot of definitions of free, and as I understand it, none that exactly match free as in free software. That is why people who need to be precise say gratis and libre. You are playing nominalist.
Not all your fault. The current English dictionaries are probably the result of the ongoing long-term cultural deterioration. I would expect that really old English dictionaries have the meaning.
Re: (Score:2)
Nostalgia has ruined your brain.
I do not go to movies, watch TV or do DVD's. I figure I have better things to do. So I am not sure what the current offerings are like.
You admit you don't know what the hell you're talking about, so why do you keep talking?
Re: (Score:2)
Greetings and Salutations...
Thanks to this new fangled thing called "Advertising" it is hard to avoid getting smacked in the face with information about upcoming entertainment offerings. It is also very easy to tell from these ads that the movie being pushed is probably mindless pap that has nothing to offer but tired retreads of ancient plots, and, characters being forced into making the worst possible decisions in order to keep the weak plot moving forwards to th
Re: (Score:2)
I did go on to comment on the prevalence of sequels. This is hard to miss even if you do not go to movies. Anyway, I was just commenting on the irrelevance of the straw men that were set up.
I notice you did not comment on my example, which was in physics. I think you have no future, know it in your heart, and cannot face it. So you attack someone who gives you a partial explanation of why. Or do you believe the media's reassurance on the economy? If you do not, how bad do you think it will get? What
Re: (Score:2)
I did go on to comment on the prevalence of sequels. This is hard to miss even if you do not go to movies.
And?
Without any proof that sequels are worse movies than average, that means nothing. (Now, given, sequels very well might be nearly universally worse than average movies, but you sure as hell didn't show it.)
I notice you did not comment on my example, which was in physics.
That's because I'm not a physicist. Are you?
In my field, web development and usability, there has been AMAZING change in the last 10
Re: (Score:2)
"For instance, there has not been a new fundamental science discovery in sixty years."
Memristors and nanotechnology don't count? What about quantum entanglement having the potential to transmit information 10,000X the speed of light?
Re: (Score:2)
Good question. The photoelectric effect was explained about 1905 by Einstein as involving the quantization of energy. This is where he made his reputation in the field. Skipping a lot of important people, in 1926 we had Schrodinger's equation. Quantum electrodynamics (QED) came along in the 1940's. Quantum entanglement comes out of these theories.
Memristors and nanotechnologies do not count because they are not fundamental. They often reveal new phenomena, but for an explanation people go back to QED
Re: (Score:3, Informative)
Use a can of compressed air to purge out any accumulated dust. Less dust means a cooler box, which may just bring the unit back within whatever temperature (or, by extension, power) tolerance it is pushing the envelope on. Another technique is to wiggle every cable and connector and slotted card, just to make sure nothing has come loose. Check to make sure all the fans are running whilst powered on.
Re: (Score:2)
I also praise the authors of "stress", a handy application for Linux which can perform non-destructive stress testing of hard disks as well as put load on the processor and memory.
For memory I really do like memtester, but I wish it was a bit less verbose. (For a user space app it's not bad)
Re:OCCT (Score:4, Informative)
Did you actually install it? (or are you a typical /. reader?) It has a "GPU" option for stress testing your graphics card if you have the latest DirectX updates installed.
Re: (Score:3, Informative)
many people overclock their GPUs too, so it would make sense that an overclocking stability tool would stress that as well.
Re: (Score:2)
A good point! A tester can tell you what it tested when an error occurred, but that is not always the cause. An error while testing the memory could be caused by the CPU's cache that the data passed through, or the motherboard circuitry. Same thing for the GPU - was it an error in the GPU, or was it an error getting the data to or from the GPU, or a CPU error while analyzing it?
All too often, you just can't tell. Welcome to the world of PC repair. Add a fault that doesn't show up when the system is warm, an
what about PTS? (Score:2, Insightful)
Replace the integrated part (Score:3, Informative)
Re:Replace the integrated part (Score:4, Insightful)
Even when they do, it's usually a sign the rest of the board is on its way out too. A device on the board not functioning can mean a number of things (MB controllers acting up, visible/non-visible corrosion in the board, blown capacitors, etc), so you can be up for a lot of weird behaviour from the board that you can't pin down.
To be honest, relying purely on a test suite to tell you what's broken will lead to disaster. Only through experience do you get the pointers toward what is actually faulty. Add to this that true diagnosis only comes from swapping out parts, and, well, test suites don't look at all like a viable option.
When I am repairing hardware about the only suite I use is memtest86+ and a decent live linux distro. You can usually pick devices that have failed with lspci, however this is not always correct. It all goes back to having test hardware & the knowledge of what certain behaviours in systems are caused by certain faults. After 15 years of working in IT with both hardware & software faults, there's only so much you can do with limited or no test hardware. Most of the time when you're diagnosing hardware faults on the phone it's an educated guess at best, the only time you truly get a decent diagnosis is when you have the machine with you and can swap parts out. Hell, we don't even use the Dell diagnostics at work due to their inability to give decent results on anything other than RAM.
Re: (Score:2)
There is a lot of truth in your post. I think you're mostly right. I also think you might be holding a one-sided argument through much of your post.
Even when they do, it's usually a sign the rest of the board is on its way out too.
It can be. You have to wonder, why did it fail? Was there a surge? Is the PSU dying and stressing things? Was that particular integrated chip part of a bad batch? Did it get an ESD on installation? Has a controller failed? In the last case, you will usually see additional symptoms. Most integrated devices are hooked into the PCI bus as if they were pl
Re: (Score:2)
"In short: It is possible to diagnose a computer entirely from software."
Tell me a piece of software that'll expose a dying capacitor, please?
Quicktech (Score:2)
Eurosoft PC Check (Score:5, Informative)
Here's its web page : http://www.eurosoft-uk.com/pc_check.htm [eurosoft-uk.com]
In any case, I recommend plugging the ATX cable into a power supply tester that presents a non-trivial load as a first step in diagnosing any PC. You'd be surprised in what ways the problems caused by out-of-spec voltages can be manifested.
jdb2
Re: (Score:2, Informative)
I second this. I've had 2 or 3 PCs now that have begun acting very strangely only to discover that the real problem was the power supply. Replace it and the PC acts fine again.
Re: (Score:2, Informative)
Same here. I've consistently had problems with a PC to discover years later that the PSU was defective (it actually blew up). I got a 450W PSU and all the bizarre symptoms have vanished.
Re:Eurosoft PC Check (Score:5, Informative)
Every power supply which I've found failed was visibly broken once you opened it up, and it was always the capacitors. No Exceptions - capacitors had sprayed gunk all over, their Aluminium cans had popped off the bases, etc. Typical electrolytic fluid is white-ish, but once it bakes dry will scorch, and so gradually turn reddish brown. Many capacitors have grooves scored into the tops which form sort of impromptu blow out panels, and often you will see them bulging, with traces of fluid escaping from these grooves where they are actually splitting open, or scorched fluid forming a red-brown powdery residue outlining them. The grooves are usually in either an X (or Plus) or a sort of K shape. The PSUs are often still working (somewhat) at that point, and often, the PSU may be putting out nominally correct voltages when cool but deviating when it heats up. I had one client's PC that made a loud bang twice over a period of about a week, but the PC didn't really start acting funny until the third bang. Opening the PSU revealed three small caps that had blown completely off the board. It had probably kept running with no obvious symptoms through the first two.
Of course, only a trained pro with good tools should ever examine the inside of a power supply while live. But, if you are willing to unplug one and take it out of the PC and let it sit overnight, just to make sure the larger capacitors have fully drained, I recommend examining them. Yes, that voids the warranty if you aren't a pro, but if you were going to junk it and buy a new one anyway, so what? But before you open one, read this:
DON'T EVER OPEN A PLUGGED IN POWER SUPPLY. IF THIS DOESN'T APPLY TO YOU, YOU ALREADY HAVE AN ELECTRICIAN'S LICENCE, AN EE DEGREE, OR SIMILAR. DON'T OPEN A POWER SUPPLY UNLESS YOU KNOW THE LARGE CAPACITORS INSIDE ARE DISCHARGED - THEY CAN MAKE YOUR ARM MUSCLES CONTRACT HARD ENOUGH TO BREAK YOUR BONES. GIVE THEM AT LEAST AN HOUR TO RUN DOWN, THEN USE AN INSULATED TOOL TO CROSS THE PLUG PRONGS BEFORE YOU OPEN THE CASE.
Split caps or scorched ones will confirm you are right in your guess that it's the PSU. While you're at it, if you think the problem is the motherboard, check for capacitor damage there too, as it's not all that uncommon for that to be why a mainboard fails. Cheap electrolytics are probably responsible for more than half of all consumer electronics failures, they are by far the most likely source of intermittent failures, ones that come and go with temperature, or glitches that only partly disable something, and they are detectable.
Re: (Score:3, Informative)
Don't trust the caps with the 'X' pattern. The 'K' pattern is more reliable.
Ask any of the many who had Dell machines from about 2000-2004. And HP/Compaq. And Acer. Not so much IBM/Lenovo. I have no reports for Gateway.
Also affected ASUS, MSI, AOpen, Gigabyte motherboards, pretty much all brands.
For a period of time, there were substandard caps being used, but the maker either faked the testing or used different component parts in production runs than in certification. If you got stung by these, you a
Re: (Score:3, Informative)
YOU SHOULD NEVER USE CAPS LIKE THIS AND NEVER SUGGEST SOMEONE BRIDGE COMPONENTS WITH A SCREW DRIVER.
I'm getting a bit tired of replying to all of the bad advice I see flying around. However, never discharge caps by bridging the connectors (even if the tool is insulated). A large enough power source can cause some serious problems.
The proper way to handle this is to terminate the load into a ground source capable of dissipating the load. Earth ground will suffice, but don't dump a crap ton of current into the
random thoughts (Score:3, Insightful)
self-checking programs like Prime95 can be useful to test the computer more generally (if you've verified with memtest a failure here basically means cpu/chipset at fault).
Other things I've tried before have been (if the motherboard allows) things like significantly underclocking sections of the motherboard/processor; if a specific underclock fixes the problem you've just significantly narrowed down the list of possible failures.
there are similar programs to memtest that will check a GPU's output conforms to what it should, but if you just have random-crashy-badness that can be a pain to diagnose. Sometimes things like just running without graphics drivers for a while can help spot those problems; if the computer no longer crashes you can look a bit further away from the graphics card, as most of its capabilities won't be used.
Just replace it. (Score:2, Informative)
Repairing hardware makes no sense anymore. Just swap in a new machine from the pool, so the user will be happy again, call the manufacturer to send someone onsite to replace the system board, redeploy the image, and put the machine back into the pool.
At home, i usually replace the machine before it has a chance to get old and flaky.
Re: (Score:2)
"Over the years I have repaired my own, family, and friends' PCs many, many times".
I know RTFA is too much to ask on other articles but RTFS's first sentence on askslashdot can't be THAT much ... can it?
Re: (Score:2)
You still don't seem to get it. Friends and family only rarely ask you to fix a machine that's still under warranty. More often you wind up diagnosing / replacing the broken part yourself, and sending them on their way.
(often the 1year warrantied hard drive that gives out at 13 months; people aren't going to replace their computer every year because of that.)
Re: (Score:2)
Yea, and basic Dell Precision Workstation T3400s are ~$650... Even less reason to deal with them out of warranty.
How to test? (Score:3, Insightful)
Well... typically you find the fault by using an application which stresses one of those components far more than any other and then seeing if the failure condition you're observing occurs more often. This is just basic troubleshooting, it's not even specific to computers.
Re: (Score:2)
"For RAM it's fairly easy - 2-3 different data manipulation methods used by memtest and you know if there is an issue."
This isn't entirely accurate. Memtest won't tell you the operating speed of the memory module. It's quite common to have a memory module pass every Memtest86+ test and then go to a hardware-based RAM checker only to find out the speed starts at the proper MHz range then drops by half or more - bad RAM chip.
Memtest will not catch that and sometimes that is the ONLY way to diagnose a faulty R
Preventative Medicine - get a UPS (Score:5, Informative)
Most home computer hardware failures come from "brownouts".
If you notice that your lights dim a little bit when your fridge compressor or AirCon comes on, that is a recipe for a computer failure. Spend $50 get a UPS [amazon.com]
Btw, i noticed that my linksys wifi router was also extremely sensitive to brownouts. It would get funked up and need to be power cycled. Plug it into a UPS , no more wifi problems either.
I learned this the hard way when i moved to an old building in the east village of NYC and had 3 motherboards/cpu fail within a 3 month period.
Re: (Score:2, Insightful)
That's not an Online UPS, so it won't protect against all grid issues. And Online UPS are expensive and noisy.
Re: (Score:2, Interesting)
If you notice that your lights dim a little bit when your fridge compressor or AirCon comes on, that is a recipe for a computer failure.
Why? Doesn't the computer's PSU have enough juice in it to survive a quick dip in voltage? Besides, almost all PSUs are rated ~90-260V, so I always assumed if it dips from 230V, it won't matter.
Occasionally my lights dim but I don't seem to have had problems. I'm still waiting for my decade-old P3 to die so it can be replaced by an Atom board, but the darn thing keeps on ru
Re:Preventative Medicine - get a UPS (Score:4, Informative)
Most home computer hardware failures come from "brownouts".
If you notice that your lights dim a little bit when your fridge compressor or AirCon comes on, that is a recipe for a computer failure. Spend $50 get a UPS [amazon.com]
Btw, i noticed that my linksys wifi router was also extremely sensitive to brownouts. It would get funked up and need to be power cycled. Plug it into a UPS , no more wifi problems either.
I learned this the hard way when i moved to an old building in the east village of NYC and had 3 motherboards/cpu fail within a 3 month period.
What you really need in the case you describe is a good line conditioner. I didn't look at the 'UPS' you mentioned, but many in that price range are not a true UPS and will still allow under-voltage to occur, albeit for a shorter period if you're lucky.
I'm not so sure about that (Score:2)
You have a source to back that up? Because if not, I'm calling shenanigans. That seems real unlikely for a number of reasons:
1) This would be a recipe for lawsuits. After all, this situation of momentary power drops happens ALL the time on all kinds of circuits. If computers weren't able to handle it, that'd be a great way to get sued. With consumer devices you don't get to say "Oh this thing is super sensitive you have to take all kinds of measures to protect it." Your device is expected to deal with common
Re: (Score:2)
I concede that I was incorrect to place the blame on the brownouts specifically. I should have said home PC hardware failures are caused mostly by electrical problems. I mention the brownouts because that is something visible (as opposed to the spikes).
And getting a cheap UPS solved the problem. Specifically, I got an APC unit [apc.com], which was around $50.
If you spend $500 to $5000 on a computer (or other electronics), it is a good investment to protect it with a $50 UPS.
Re: (Score:2)
Interesting...
In my parents' house, turning on the 7kW electric shower briefly dims the lights, do they have a problem with the neutral? The house is in the UK, so the electricity supply to the building is 230V single phase (IANAE).
Re: (Score:2)
Here in The Netherlands these heaters have never been popular, natural gas is half the price, but those that I've seen were always 3-phase.
Because a heating element doesn't really have an increased start current like an electrical motor which would cause a flicker unless you mean a semi-permanent dimming of the lights when you're running your shower.
In the la
Re: (Score:2)
Although the UK allows rather unusual wiring for high Amperage circuits (the reason you have fused plugs) I doubt this heater runs on a 32 Amp fuse.
It is on a separate circuit with a 30A fuse (perhaps it is 6.5kW, I don't remember). I don't think I've ever seen a UK house with a 3-phase supply.
Here in The Netherlands these heaters have never been popular, natural gas is half the price, but those that I've seen were always 3-phase.
Unfortunately, when it comes to British housing there's far too much stuff that's cheap in the short term (like only installing a cold water pipe to the new electric shower), or inefficient because the person paying for the equipment doesn't pay the operating costs (like far too many rented properties without double glazing, high-efficiency appliances, or decent
Re: (Score:2)
Maybe. The loose neutral issue is a real, serious problem, as you've seen. It is something that I troubleshot and fixed myself in my current house, blessedly before anything expensive happened.
Brownouts can also be caused by one or both hot wires being bad (resistive) somewhere along the way. The symptoms are different from a loose neutral connection in that the lights on other circuits don't brighten at the same time as the brownout occurs.
Microscope (Score:4, Informative)
I like the Microscope products... their newest version, Microscope duo, boots off of a USB stick. For machines that don't boot at all they also have a diagnostic card; it's basically a PCI card with an LED readout that gives a series of POST codes that can help diagnose whether it's the board, a card, memory, etc. They can be found at http://www.micro2000.com/ [micro2000.com]
The handiest piece of diagnostic gear I use is actually a simple power supply tester. You would be amazed how many systems that appear to power up are actually suffering from a dead -5 or +5 rail on the power supply. Many tend to think that if the fan's spinning the power supply is OK, but that's often not the case. The best part is they are cheap... around $10 for a basic one.
Re: (Score:2)
A power supply tester is mostly useless. The basic features of any modern motherboard include sensors which display the voltage readings.
A power supply tester simply identifies whether or not an unloaded voltage source is within the 5% variance. A supply would have to be in extremely poor condition not to pass this test (i.e., obviously failed, and identifiable with the same common tools everyone has access to).
In many circumstances I find it necessary to apply load to a power supply in order to quickly identify the fa
Hiren's... (Score:4, Insightful)
Hardware tester (Score:3, Informative)
Re: (Score:2)
Most hardware is 'dumb' and does not have fault latches.
This is a cost that was avoided in order to make cheap motherboards and system components.
Hardware troubleshooting is in no form about trust. It is applying a series of logical steps designed to isolate and repair failures.
SMART for dying hard drives (Score:5, Informative)
http://sourceforge.net/apps/trac/smartmontools/wiki [sourceforge.net] is great for finding out what the drives think about their own health. Things to look out for are spin-retry counts (which lead to that annoying 2-5 seconds freeze), high reallocated sector counts (never never never use chkdsk to attempt to fix a broken hard drive. With the robustness of modern journaling file systems (HFS, extN, NTFS), storage errors are almost always hardware errors. Running chkdsk stresses the drive just as it's failing and usually pushes it over the edge -- and then users complain that you can't recover their data.
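smartmontools' `smartctl -A <device>` prints the drive's attribute table, and the two attributes mentioned above can be picked out of it mechanically. A rough sketch in Python that parses text in that table layout; the sample text is invented for illustration (not output from a real drive), the `parse_raw_values` helper is hypothetical, and the parsing assumes the usual ten-column ATA attribute format with the raw value in the last column:

```python
WATCHED = ("Reallocated_Sector_Ct", "Spin_Retry_Count")

def parse_raw_values(smartctl_output: str) -> dict:
    """Extract raw values of the watched attributes from `smartctl -A` style text."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            found[fields[1]] = int(fields[9])
    return found

# Invented sample in the attribute-table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       87
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35
"""

if __name__ == "__main__":
    for name, raw in parse_raw_values(SAMPLE).items():
        print(name, raw, "(worth watching)" if raw > 0 else "(fine)")
```

What counts as a "high" count is a judgment call; a raw value that keeps climbing between checks is usually the more telling sign.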
Good comment )))..) ) ) ) ) ))))))) (Score:2)
prime 95 (Score:3, Interesting)
Prime 95 is a good test of CPU/RAM, as well as to see if the system remains stable under peak temperature. It's often used to burn in overclocked machines.
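The general idea behind a self-checking stress test is to do heavy arithmetic whose correct answer is known, so that flaky hardware shows up as a wrong result rather than only as a crash. A toy sketch in Python of that idea; it is not how Prime95 works internally, and the workload, function names, and round counts are illustrative assumptions:

```python
import hashlib

def workload(seed: int, rounds: int = 20000) -> str:
    """Deterministic CPU-bound work: repeatedly hash a small buffer."""
    data = seed.to_bytes(8, "little")
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.hex()

def self_check(passes: int = 5, seed: int = 42) -> bool:
    """Repeat the same workload; any mismatch against the first run suggests a fault."""
    reference = workload(seed)
    for i in range(passes):
        if workload(seed) != reference:
            print(f"pass {i}: MISMATCH, result differs from reference")
            return False
    return True

if __name__ == "__main__":
    print("stable" if self_check() else "unstable")
```

A single-threaded loop like this will not heat a CPU the way a real torture test does, and because the reference is computed on the same machine it can only catch intermittent errors, not a fault that gives the same wrong answer every time.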
Swap the damn hardware (Score:4, Informative)
There is no way to tell, with software, whether your PSU, CPU, or motherboard is to blame, in the overwhelming majority of cases.
It's just idiotic to say "Replacing them one by one isn't really an option". In fact, that's by far the best option. I don't run memtest for a week to find out I have bad RAM, I take 30 seconds to swap it, and find out, for certain, in no time. PSUs are equally easy to swap, AND are the more likely component to fail, so that's the best place to start.
If you don't know whether it's CPU or the MoBo, buy a new motherboard... Vastly more likely to be the cause, and pretty damn cheap just as soon as they're no longer brand new. Of course CPUs fail, but it's likely to be obvious from a visual inspection if they've been installed wrong, or otherwise abused.
Re: (Score:2)
I agree. When I build a new system I first:
memtest86+
cpu test with something like prime95
CPU+GPU test with prime95 and then another 3D game running in the background.
If it survives that last test, then it's good. I've found overheating of my system to be the main cause of crashes. I've actually had to underclock my RAM to get it stable. If something does fail, I swap that component or add more fans and try again.
Re: (Score:2)
Unfortunately you left out one major component in this troubleshooting scenario.
Before applying any troubleshooting steps you must first create a verifiable test condition to reproduce the problem.
If the problem cannot be reliably reproduced it will be difficult to isolate the fault with physical isolation, reduction or replacement of specific components.
Before beginning on such an endeavor strive to create a scenario in which the problem can be reproduced quickly.
Waiting a week for the fault to reproduce c
prime95 (Score:2)
never heard of prime95?
it's been used for years to check stability in rigs by overclocking and gaming enthusiasts.
They even have various "levels" of FFT tests: limit the torture test to sizes that fit within the CPU cache and it tests the CPU; go larger and it also tests the RAM, PSU, etc.
Prime95 [mersenne.org]
Serious answer: don't bother: upgrade. (Score:2)
I've done a significant amount of PC construction and reconstruction: approximately 60 from-scratch builds in 20 years. One thing that that has taught me is: do not bother to try to diagnose motherboard or CPU faults: just replace them, end of story.
Even integrated motherboards can be had for £40, and CPUs for £25. You can get dual-core 1.6GHz Atom integrated-everything-including-CPU motherboards for £90.
For the amount of time and effort spent unscrewing components and testing combinatio
Re: (Score:2)
Other than that: if you cannot find any evidence of firmware upgrades to potentially fix an unreliable machine - throw out the power supply, the motherboard and the CPU, without hesitation (or get them replaced under warranty).
If you've only ruled it down to one of those 3, how will you get the companies to replace those parts under warranty?
Practical System Stressing... (Score:4, Funny)
I stress my Linux boxes by telling them that if they develop a fault I'll re-image them with Vista.
Not a single one has dared to fail on me yet.
Re: (Score:2)
Take all your consumer electronics to the movies once a year. Set them on the couch, give them a bowl of popcorn buttered with WD40, and let them watch "The Brave Little Toaster". (Popcorn is optional).
empirical testing: Compile the Linux kernel (Score:2)
gcc is an incredibly good test application. it's horrendously cpu-intensive, and it is designed to eat whatever physical memory is available. compiling c++ applications is particularly memory-intensive, but the best test of both disk and memory has to be simply to compile the linux kernel.
if you have multiple cores, you can use "make -j {number of cores + 1}" and this will test all of the CPUs, as well. if you particularly want to stress things, make that "make -j {number of cores * 2}" instead.
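The job counts above can be worked out automatically. Here is a minimal sketch in Python, assuming os.cpu_count() reports the number of cores you want to load. The helper names and the kernel tree path are hypothetical, and looping the build until something breaks is my own addition, not something described above.

```python
import os
import subprocess

def make_jobs(cores, heavy=False):
    """Job count from the rule of thumb: cores + 1, or cores * 2 for extra stress."""
    return cores * 2 if heavy else cores + 1

def make_command(cores=None, heavy=False):
    """Build the make invocation as an argument list."""
    cores = cores or os.cpu_count() or 1
    return ["make", f"-j{make_jobs(cores, heavy)}"]

def run_build(tree, heavy=False):
    """Run one build in `tree` (a hypothetical kernel source path); return the exit code."""
    return subprocess.run(make_command(heavy=heavy), cwd=tree).returncode

if __name__ == "__main__":
    print(" ".join(make_command()))
```

Calling run_build("/path/to/linux") in a loop, with a "make clean" between rounds, and stopping on the first nonzero exit code gives a crude soak test. A compile that fails at a different place on each run, on unchanged sources, is the interesting symptom.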
Re: (Score:2, Funny)
Not a perfect solution (Score:2)
A more productive diag
What separates a PC from a real computer... (Score:2)
Your average PC hardware has utterly no way to "test" it. You can sort of test RAM - to the point of identifying there is a failure somewhere in the memory. OK, if you have four DIMMs what does that mean? Well, it means you have a RAM problem somewhere.
Motherboard? Not really any sort of testing possible. There are some "pretend" diagnostic tools that will try to tell you if something fails, but what exactly does that mean? Nothing. If you have an ATAPI DVD drive and a SATA hard drive I assure you tha
Rule #1 of Diagnosing Hardware (Score:2)
1. Check the software
2. It's probably the software
3. Really, it's going to be the software
87. OK, now you should run some diagnostics
Really. The bottom line is that computers and their parts (especially non-moving ones like processors and RAM), once they're burned in and assuming you don't try to run them overclocked for twenty years without rotating them out, are pretty reliable. I can't count more than a couple instances of hardware failure post burn-in across about fifteen different home machines
Well (Score:2)
Toast and Pi and various other CPU stability test programs will let you test the CPU.
Go into system configuration in Windows and turn off auto-reboot, so that if the machine blue screens, you can see what the error code is. Sometimes that will let you isolate it to graphics or the motherboard.
Ultimately, the way to find out IS to replace the components one by one. If you have several machines, or spares from an older machine, you should swap each component and run the machine until either you get a cras
Power supply (Score:3, Interesting)
You didn't mention the power supply.
In my experience, a "crashy machine" is almost always down to the PSU. Out of the dozens of "crashy machines" I've had to fix, only one was due to bad memory. The rest were *all* down to faulty power supplies, and all of those were due to capacitors that had failed.
I have an oscilloscope so I can easily test for ripple without needing to open up the power supply and look for the obvious signs (bulging capacitors, maybe ones that have leaked). We've had dozens of machines at work with supplies that have gone bad this way. Bad capacitors have been a real problem in recent years. Four years ago, it wasn't just in power supplies either - we had to return 70 machines to Hewlett-Packard under warranty after the capacitors on the motherboard began failing after 3 months of use. We've not seen anything on that scale on motherboards since, but we still have frequent problems with power supplies failing from "capacitor plague".
A machine of mine was actually killed by a sudden power supply failure - the PSU let the magic smoke out with a loud "bang", and there was the sound of stuff ricocheting around the computer's case. That sound turned out to be bits of exploding chips on the motherboard. The only thing that survived that incident was the CD-ROM drive - all other components were destroyed.
It's a loaded question (Score:2)
What's the best software to change a tire on your car and find the leak?
Software can check quite a few things, but for the most part during a short time interval, digital hardware is either working or it isn't. So software performance tests may not be very good at revealing something marginal.
Beyond a few software tests and ruling some things out by substitution, it generally takes someone with some hardware troubleshooting skills, and some test equipment.
Of course test equipment starts with your senses.
Strange crashes Win 7 (Score:2)
burnintest (Score:2)
http://www.passmark.com/products/bit.htm [passmark.com]
burnintest. have used it for years. works fine. some systems which would run fine for days and then crash were driving us crazy. this software found memory, video and cpu problems. the free version of the version i bought only ran for 15 minutes. might be enough to find your problem. windows only though so that might be a problem.
You don't, you swap out hardware (Score:3, Informative)
Swapping out is the ONLY way.
I have systems with intermittent (heat activated) dry joints on a mobo, partly duff RAM, and partly duff (rebranded at higher clock) CPU. ONLY swapping out will find it.
HTH etc
Pop in a live cd of some other OS and give it a go (Score:2)
Did you check the logs? (Score:2)
Most crashes in windows are either hardware related or shitty drivers. Windows these days is resilient to crashing applications, but crappy
Use your eyes. (Score:2)
RAM is easy to test using basic troubleshooting techniques: Remove some of it, see if the problem recurs. Replace some of it with good spares, see if the problem recurs. Etc, so on. memtest86 also does a decent job of finding bad modules if left to run long enough, but since it runs in isolation from the rest of the computer it will not detect certain corner cases of bad RAM.
Power supplies are similarly easy: Swap it out for a known good supply, and see if the problem recurs.
I've never had a CPU fail,
UBCD (Score:2)
I've found the UBCD -- Ultimate Boot CD to be quite useful.
http://www.ultimatebootcd.com/ [ultimatebootcd.com]
It does come in handy, includes many of the necessary tools to determine HDD end of life etc.
It certainly isn't perfect, but I am amazed nobody has mentioned it yet in the discussion. Obviously real tools are on my bench, but when the poster specifically asked for software....this is the easiest and most broad spectrum solution.
Testing software (Score:2)
I recently diagnosed two desktop machines. One ended up having a bad stick of memory, with the original symptoms being a corrupted copy of Windows XP that wouldn't boot. The other a bad hard drive, the symptoms being it would hang during use randomly and even during boot.
I used Prime95 and Memtest86+ to detect the bad stick of memory. Prime95 quickly came up with an error during the stress test, and Memtest86+ also immediately came up with errors. In the past I have seen subtle errors with Memtest86+ t
Just get hardware testers (Score:2)
Half of your RAM issues won't be able to be diagnosed with any piece of software. No RAM checking software will keep tabs on the operating speed of the RAM. Ditto with a CPU tester, there's hardware and socket adapters to help you plug in CPUs and test them with hardware.
My time spent in the hardware repair/replacement service has taught me that most software diagnostics just fall short. One place I worked for used a combo of Prime95 and some custom stress-testing software - almost every machine would pass
Not so simple. (Score:2)
There are some fairly straightforward applications that several readers have mentioned.
However, relying on software to determine a fault when no fault indicators are built into your motherboard is inherently flawed logic.
The vast majority of systems today are quite dumb and have no reporting. Even on more expensive systems this reporting is still not the most reliable method of troubleshooting hardware.
That is not to say that software cannot be helpful in the troubleshooting process. It can be immensely use
QuickTech or QT (Score:3, Interesting)
My shop uses it, works pretty well. A full scan can take up to 6 or 8 hours (we set up hardware diags before leaving for the night, and in the morning on a 24-channel KVM), but it is THOROUGH. VRAM, RAM, HDD, CPU, everything is tested and thoroughly. First step should be testing the PSU, then running QT.
Only a couple tools needed. (Score:3, Informative)
Re: (Score:2, Informative)
Furmark http://www.ozone3d.net/benchmarks/fur/ [ozone3d.net]
is better suited for stressing your GPU, and it's also free.
Re: (Score:3, Interesting)
It is a good start, but no more than that. Those tests are certainly not comprehensive (and should be). On the plus side, they often have your specific hardware in mind, and might possibly catch something that other tools won't. (doesn't happen often, but sometimes...)
SMART is also not the end-all of hard drive indicators. A drive can pass SMART, and still be on the way out. I've found (for those familiar with Linux) that a dd from the hd to /dev/null will often spit out errors on a drive that's getting
Re:I wish you had asked this question 2 weeks ago. (Score:2)
It's your power mains... get a good UPS with a line conditioner.
Re: (Score:2)
You may like to know what's broken, but that's pointless as you need to change both the CPU and the motherboard, and I'll explain myself.
Only pointless if you don't plan to get a replacement under warranty.