Reliability of Computer Memory?
olddoc writes "In the days of 512MB systems, I remember reading about cosmic rays causing memory errors and how errors become more frequent with more RAM. Now, home PCs are stuffed with 6GB or 8GB and no one uses ECC memory in them. Recently I had consistent BSODs with Vista64 on a PC with 4GB; I tried memtest86 and it always failed within hours. Yet when I ran 64-bit Ubuntu at 100% load and using all memory, it ran fine for days. I have two questions: 1) Do people trust a memtest86 error to mean a bad memory module or motherboard or CPU? 2) When I check my email on my desktop 16GB PC next year, should I be running ECC memory?"
Cost benefit analysis (Score:2, Insightful)
Is ECC memory worth the money in a machine you use to check your E-mail? Can't you just reboot and/or replace the memory if errors occur?
I could see it happening when the cost of ECC memory is no higher than normal memory and using ECC memory has little or no impact on performance. Until then, I don't expect to start seeing it in desktop machines.
If you want ECC memory on your desktop, feel free to build your own machine with a motherboard that supports ECC memory. Some high end desktops do support ECC memory already.
The truth (Score:5, Insightful)
My first computer was an 80286 with 1 MB of RAM. That RAM was all parity memory. Cheaper than ECC, but still good enough to positively identify a genuine bit flip with great accuracy. My 80386SX had parity RAM, and so did my 486DX4 120. I ran a computer shop for some years, so I went through at least a dozen machines ranging from the 386 era through the Pentium II era, at which point I sold the shop and settled on an AMD K6-2 450. And right about the time that the Pentium was giving way to the Pentium II, non-parity memory started to take hold.
What protection did parity memory provide, anyway? Not much, really. It would detect with 99.99...% accuracy when a memory bit had flipped, but gave no indication of which one. The result was that if parity failed, you'd see a generic "MEMORY FAILURE" message and the system would instantly lock up.
I saw this message perhaps three times - it didn't really help much. I had other problems, but when I've had problems with memory, it's usually been due to mismatched sticks, or sticks that are strangely incompatible with a specific motherboard, etc., none of which caused a parity error. So, if it matters, spend the money and get ECC RAM to eliminate the small risk of parity error. If it doesn't, don't bother, at least not now.
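A minimal Python sketch of the parity scheme described above, assuming even parity over a single byte. The function names are made up for illustration and do not come from any real memory controller:

```python
def parity_bit(byte):
    """Even-parity bit for an 8-bit value: 1 if the number of set bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def store(byte):
    """Model a parity-protected byte as a (data, parity) pair."""
    return byte & 0xFF, parity_bit(byte)


def check(data, parity):
    """Return True if the stored parity bit still matches the data."""
    return parity_bit(data) == parity


data, parity = store(0b10110010)
one_flip = data ^ 0b00000100    # a single bit flips
two_flips = data ^ 0b00010100   # two bits flip in the same byte

print(check(data, parity))       # True: intact byte passes
print(check(one_flip, parity))   # False: flip detected, but not which bit
print(check(two_flips, parity))  # True: a double flip slips through
```

The check can only say that something in the byte changed, which is why the machine could do nothing better than halt.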
Note: having more memory increases your error rate assuming a constant rate of error (per megabyte) in the memory. However, if the error rate drops as technology advances, adding more memory does not necessarily result in a higher system error rate. And based on what I've seen, this most definitely seems to be the case.
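The arithmetic behind that note, as a small Python sketch. The per-megabyte rates below are invented purely for illustration; they are not measured values:

```python
def system_error_rate(megabytes, errors_per_mb_per_year):
    """Expected errors per year for the whole system, assuming independent cells."""
    return megabytes * errors_per_mb_per_year


OLD_RATE = 1e-3            # hypothetical per-MB rate for older RAM
NEW_RATE = OLD_RATE / 32   # hypothetical improvement in newer RAM

old_box = system_error_rate(512, OLD_RATE)            # 512 MB of old RAM
big_box_same_ram = system_error_rate(8192, OLD_RATE)  # 8 GB at the old rate
big_box_better_ram = system_error_rate(8192, NEW_RATE)

# More memory at a constant per-MB rate means more errors overall...
print(big_box_same_ram > old_box)
# ...but if the per-MB rate falls faster than capacity grows, the total drops.
print(big_box_better_ram < old_box)
```

With these made-up numbers capacity grows 16x while the per-MB rate falls 32x, so the system-wide rate halves.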
Remember this blog article about the end of RAID 5 in 2009? [zdnet.com] Come on... are you really going to think that Western Digital is going to be OK with near 100% failure of their drives in a RAID 5 array? They'll do whatever it takes to keep it working because they have to - if the error rate became anywhere near that high, their good name would be trashed because some other company (Seagate, Hitachi, etc) would do the research and pwn3rz the marketplace.
Re:Joking aside... (Score:5, Insightful)
As for ECC in memory... The problem is that ECC carries a heavy performance hit on write. If you only want to write 1 byte, you still have to read in the whole QWord, change the byte, and write it back to get the ECC to recalculate correctly. It is because of that performance hit that ECC was deprecated. The problem goes away to a large extent if your cache is write-back rather than write-through; though there will be still a significant number of cases where you have to write a set of bytes that has not yet been read into cache and does not comprise a whole ECC word.
AFAIK, on modern computer systems all memory is always written in chunks larger than a byte. I seriously doubt there's any system out there that can perform single-bit writes either in the instruction set, or physically down the bus. ECC is most certainly not "deprecated" -- all standard server memory is always ECC, I've certainly never seen anything else in practice from any major vendor.
The real issue is that ECC costs a little bit more than standard memory, including additional traces and logic in the motherboard and memory controller. The differential cost of the memory is some fixed percentage (it needs extra storage for the check bits), but the additional cost in the motherboard is some tiny fixed $ amount. Apparently for most desktop motherboard and memory controllers that few $ extra is far too much, so consumers don't really have a choice. Even if you want to pay the premium for ECC memory, you can't plug it into your desktop, because virtually none of them support it. This results in a situation where the "next step up" is a server class system, which is usually at least 2x the cost of the equivalent speed desktop part for reasons unrelated to the memory controller. Also, because no desktop manufacturers are buying ECC memory in bulk, it's a "rare" part, so instead of, say, 20% more expensive, it's 150% more expensive.
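The partial-write sequence the quoted post describes can be sketched like this. It is a toy model only, assuming an 8-byte ECC word and a plain XOR checksum standing in for a real ECC code; `EccWord` and `check_bits` are hypothetical names:

```python
WORD_BYTES = 8  # assumed ECC word size (one 64-bit quadword)


def check_bits(word):
    """Toy stand-in for an ECC code: XOR of all bytes in the word."""
    code = 0
    for b in word:
        code ^= b
    return code


class EccWord:
    """One protected word: 8 data bytes plus check bits covering all of them."""

    def __init__(self, data):
        if len(data) != WORD_BYTES:
            raise ValueError("data must be exactly one word")
        self.data = bytearray(data)
        self.code = check_bits(self.data)

    def write_byte(self, offset, value):
        """Changing one byte still costs a read of the whole word."""
        word = bytearray(self.data)          # 1. read the full word
        if check_bits(word) != self.code:    # 2. verify it before trusting it
            raise ValueError("stored word fails its check")
        word[offset] = value & 0xFF          # 3. modify the one byte
        self.data = word                     # 4. write the word back...
        self.code = check_bits(word)         # ...with freshly computed check bits

    def is_consistent(self):
        return check_bits(self.data) == self.code


w = EccWord(bytes(range(8)))
w.write_byte(3, 0xAB)
print(w.is_consistent())   # True: check bits were recomputed for the whole word
```

A write that covers the entire word can skip step 1, which is the case a write-back cache tends to produce.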
I've asked around for ECC motherboards before, and the answer I got was: "ECC memory is too expensive for end-users, it's an 'enterprise' part, that's why we don't support it." - Of course, it's an expensive 'enterprise' part BECAUSE the desktop manufacturers don't support it. If they did, it'd be only 20% more expensive. This is the kind of circular marketing logic that makes my brain hurt.
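For a sense of the "extra storage for the check bits" mentioned above, here is a short Python sketch. It assumes a Hamming-style single-error-correct, double-error-detect (SECDED) code, which is a common choice but not something the post specifies; the function name is made up:

```python
def secded_check_bits(data_bits):
    """Check bits needed by a Hamming SECDED code protecting data_bits of data."""
    r = 0
    # Single-error correction needs 2**r >= data_bits + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double-error detection.
    return r + 1


for width in (8, 32, 64, 128):
    extra = secded_check_bits(width)
    print(width, extra, f"{100 * extra / width:.1f}%")
# A 64-bit word needs 8 check bits, i.e. 12.5% extra storage.
```

That 64+8 layout matches the 72-bit-wide ECC modules, so the raw storage overhead is modest compared with the price gap described in the post.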
Re:Paranoia? (Score:4, Insightful)
several hundred thousand orders of magnitude
We've crossed beyond the realm of the astronomical and into something else entirely. Surely you meant several orders of magnitude, aka, hundreds of thousands of times? Let's keep things on this side of the googol.
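For scale, a quick check in Python, using 300,000 as an assumed stand-in for "hundreds of thousands":

```python
import math

factor = 300_000                 # assumed example of "hundreds of thousands of times"
orders = math.log10(factor)      # how many orders of magnitude that factor is
print(round(orders, 1))          # about 5.5

claimed_orders = 300_000         # "several hundred thousand orders of magnitude"
googol_orders = 100              # a googol is 10**100
print(claimed_orders / googol_orders)  # the claimed exponent is thousands of times a googol's
```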
Use ECC in anything you care about (Score:2, Insightful)
Really, it's not that much more expensive. Search Newegg for unbuffered ECC if you are using a desktop-class system that can't handle registered RAM.
You wouldn't put data you care about on a hard drive without RAID, would you?
OK (Score:3, Insightful)
Yes. I do, anyway; I've never had it report a false positive, and it's always been one of the three (and even if it was cosmic rays, it wouldn't consistently come up bad, then, would it?). Then again, it could also mean that you are using RAM that requires a higher voltage than your motherboard is giving it. If it's brand-name RAM, you should look up the model number and see what voltage the RAM requires. Things like Crucial Ballistix and Corsair Dominator usually require around 2.1 V.
Depends. If you're doing really important stuff then sure. ECC memory is quite a boon in that case. If you're just using your desktop for word processing and web browsing, it's a waste of money.
Re:Surprise? (Score:3, Insightful)
Re:The truth (Score:5, Insightful)
Yes. A higher energy particle hits something in the RAM, and alpha/beta particles scatter from the impact point... which is inside the memory cell.
That's why higher energy radiation is dangerous. It doesn't cause the damage itself, the products of the collision do. Radiation shrapnel, if you will.
Re:Joking aside... (Score:3, Insightful)
Phil
Re:Surprise? (Score:1, Insightful)
On Vista, other people can "wreck your installation" too.
Re:Surprise? (Score:0, Insightful)
Then I guess I am unlucky. Every Windows XP and Vista install I have seen has been horrendously buggy, with processes like explorer.exe and iexplore.exe crashing at least once a day. Windows in all of its glory won't even let me install IE 8 to correct the problems I am having with IE 7, as the IE 8 installer needs IE to actually work. Now what kind of messed-up situation is that? If Firefox or Safari were to get corrupted I could always uninstall the browser, or at the very least upgrade over it. Nope, not with IE.
Maybe I just expect it to work the same way every day like my Macs at home. That is too much for Windows though. Now since it is a work machine I have to send it to corporate headquarters so they can do the reinstall, leaving us without one for a week.
Re:Surprise? (Score:3, Insightful)
Re:Surprise? (Score:5, Insightful)
I find that a Windows machine, from Windows 2000 on up, remains quite stable and usable as long as you take care not to install too many programs or immature software and junkware. The trouble with Windows is the culture. It seems everything wants to install and run a background process or a quick-launcher or a taskbar icon. It seems many don't care about loading old DLLs over newer ones. There is a lot of software misbehavior in Windows-world. (To be fair, there is software misbehavior in MacOS and Linux as well, but I see it far less often.) But Windows by itself is typically just fine.
Since the problem is Windows culture and not Windows itself, one has to educate oneself in order to avoid the pitfalls that people tend to associate with Windows itself.
Re:Surprise? (Score:5, Insightful)
Agreed. People who will sit and tell me with a straight face that Vista, in their experience, is unstable are either very unlucky, or liars. Windows stopped being generally unstable years ago. Get with the times.
I'm not convinced. I have a fairly old desktop at work I keep for Outlook use only. After a few days Outlook's toolbar becomes unresponsive, and whenever I shut it down it stalls and requires a power-off. Task Manager doesn't say I'm using that much memory (still got cached files in physical RAM).
I don't use Windows much, so I'm not used to the tricks that keep it running, whereas I probably use such tricks subconsciously to keep my Linux workstation and laptop running.
I wonder if Windows' continued increase in stability is, at least partly, people subconsciously learning how to adapt to it.
Re:Error response (Score:3, Insightful)
Anyone know, why PC133 memory would have an issue on a bus overclocked from 100MHz to 133? It should be able to handle it just fine, so I'd like to think :-/
It's probably not the RAM as such; the 440BX on the P2B is only officially rated for 100MHz. Overclocking the chipset can have any number of side-effects.
Re:Surprise? (Score:2, Insightful)
You must be unlucky or the cause.
Re:Surprise? (Score:3, Insightful)
A guy at work got his laptop with Vista on it. Explorer would hang often (Explorer, not IE), and if he tried to arrange his second monitor to the left of his laptop screen, the system would BSoD. (pretty funny, he had his monitor on the left, due to physical desk constraints, but he had to move his mouse off of the right side of his laptop screen where it would appear on the left of his second monitor...). We updated all the latest drivers from HP but to no avail.
Since putting Vista SP1 on though it has been fine - all those problems went away.
I have never seen another Vista machine do that though, so obviously something got broken during the install. If it was Linux I would have been able to fix it myself, but with Vista all we could do was wait for the magic hotfix or SP that might fix the problem.
Anyway, just because you haven't come across an unstable Vista install doesn't mean they don't exist. (or you're a troll and I just got sucked in horribly :)
Re:Surprise? (Score:5, Insightful)
To all the posters who think the parent is a bad mechanic I will tell you my anecdote: I have never had a harddrive fail. Never. Not on a fresh computer and not on a decade old one.
Either I have magic hands, harddrives don't fail that often or /.ers can't handle harddrives.
Or people can beat the odds. Sometimes you win in a casino.
Re:Surprise? (Score:5, Insightful)
People who will sit and tell me with a straight face that Vista, in their experience, is stable are either very lucky, or Microsoft shills.
See? I can say the opposite, and provide just as much evidence. Do I get modded to 5 as well? Where are your statistics on the stability of Vista? It worked well for you, therefore it works well for everyone else?
I worked for a company that bought a laptop of every brand, so that when the higher-ups went into meetings with Dell, HP, Apple, etc. they had laptops that weren't made by a competitor. They have had problems like laptops not starting up the first time due to incompatible software. That was as recent as 6 months ago. My mother-in-law bought a machine that has plenty of Vista-related problems (audio cutting out, USB devices not working, random crashes in Explorer) on new mid-range hardware that came with Vista. But I have a neighbor who found it fixed lots of problems with gaming under XP.
There are plenty of issues. Vista's problems weren't just made up because you didn't experience them.
Everybody's experience is different. Quit making blanket statements based on nothing.
Re:Surprise? (Score:3, Insightful)
It's slower than XP in any case and requires more memory.
Not true. It uses more memory than XP, but it doesn't require it. In exactly the same way that Linux uses more memory than XP, but doesn't require it (it's used for system cache if you bother to check). If you actually install the 64-bit version, you'll see where MS's development budget has been spent (the 32-bit version of Vista feels a bit like Win ME in comparison). In every test I've done, 64-bit Vista has crapped all over XP from quite a big height.
The problem is I don't consider decent hardware to be something an IT'er would buy
Dual core machine + 2GB RAM + integrated ATI/Nvidia/Intel X4500 GPU is more than adequate. These are pretty basic machine specs by anyone's standard, and tbh you'd be hard pressed to find a brand new machine for sale with lower specs than that.
The worst machine I've installed Vista on was an old 1.6GHz Athlon XP. It was more than happy playing Blu-ray discs, and didn't perform any worse than XP (though I did add an ATI 3650 AGP card to help out with the Blu-ray decoding). That's what, a 5-year-old machine and a £50 upgraded graphics card... Not exactly high-end spec, I must say.
My experience dictates it... (Score:4, Insightful)
If you go with non-ECC, I would suggest running memtest86+. If you get errors, swap the memory. If swapping the memory still doesn't take care of it, swap motherboards! I recently had a memory problem in one of my customers' racks, and running memtest86+ got nothing until I had it running on my bench for over a week. There may be some problems with memtest86+... I even had another bit error that memtest86+ did not find, but a Linux command-line memory tester found a problem almost immediately.
The problem here is that different testing/usage patterns result in different probabilities of finding potentially bad words, e.g. words that may only be bad if you read from them a hundred cycles consecutively. But, if you do see a failure in memtest86+ or the CLI tester, you got yourself a serious problem. The point to take from this is that if you don't see errors, that doesn't mean you don't have errors!
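A toy Python illustration of why the test pattern matters. The memory model and its single stuck-at-0 bit are entirely hypothetical (a simpler fault than the read-dependent one mentioned above), and this is not how memtest86+ itself is implemented:

```python
class FlakyMemory:
    """Simulated RAM with one made-up fault: bit 3 of cell 5 is stuck at 0."""

    def __init__(self, size):
        self.cells = bytearray(size)

    def write(self, addr, value):
        if addr == 5:
            value &= 0xF7          # the stuck bit can never hold a 1
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        return self.cells[addr]


def pattern_test(mem, size, pattern):
    """Fill memory with one pattern, read it back, return the failing addresses."""
    for addr in range(size):
        mem.write(addr, pattern)
    return [addr for addr in range(size) if mem.read(addr) != pattern]


mem = FlakyMemory(16)
print(pattern_test(mem, 16, 0x00))  # []  : all zeros never exercises the stuck bit
print(pattern_test(mem, 16, 0x55))  # []  : 01010101 leaves bit 3 clear, so it passes too
print(pattern_test(mem, 16, 0xAA))  # [5] : 10101010 sets bit 3 and exposes the fault
```

Two of the three passes come back clean on memory that is definitely bad, which is the point: a clean run only covers the patterns that were actually tried.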
Having said this, I still don't think memory errors among PCs are that common. We have more RAM on machines these days, but at the same time, the manufacturing processes have become better. My personal conviction is that although more words in memory means more chances for a word error, the RAM itself has become so much more "solid" that the net increase in errors is negligible. Now, if you do dumb things with your computer like running it without a case or not giving it ventilation (learned this the hard way) or overclocking it, you *WILL* still run into problems. But if you design a system with quality and integrity, you typically shouldn't have these issues with memory!
One last thing to point out: there is quality hardware, and there is cheap hardware. My PC-Chips motherboard ran for three months and two days, and I didn't have a problem. Two days out of warranty. Now, take my MSI motherboard. It sets the timing for all memory modules to the values of a single module. This resulted in stable single-module operation, but got flaky with all four modules. I finally moved to ECC before I figured out that I had to manually set the correct timings. This board is an ultra board, but apparently it does not support generic (Micron, Corsair, etc!! - tried 'em all) memory modules. People on the Newegg reviews board have memory issues with this board as well that they could not fix with a BIOS update, and it appears that sometimes a design just is bad! Even the "good" manufacturers do not spend a lot of effort to fix issues in some cases.
My words of advice: Do your homework. Read through the reviews. AND DON'T BUY HARDWARE AS SOON AS IT COMES OUT!
Re:Surprise? (Score:3, Insightful)
Looks like he posted his opinion based on his experience, and you posted your opinion based on your experience. So you should quit making blanket statements based on nothing too.
Neither of you posted statistics. Where are yours?
Re:Surprise? (Score:5, Insightful)
Dude! Take a chill pill. This is not FUD. The gp is just relating his experience, and here's a shock, YMMV! So just sit back and have another beer.
BTW, I've also had major hassles with Windows - mostly related to viruses. As it happens this forced me to switch 100% to Linux and I'm happy here, but not everyone who switches is. Personally I like the bandwidth I save from not constantly downloading AV updates, and the speed increase from not running AV. But hey, where you are, computing power and bandwidth are probably cheap. Again, YMMV.
Re:Surprise? (Score:4, Insightful)
And some of us actually expect an OS with a certification logo program to send lawyer letters to Marvell telling them to recall that driver. Sheesh, get with the program, badly written, certified drivers make Microsoft look bad, deservedly.
Re:Surprise? (Score:2, Insightful)
Same experience I've had with Linux; meanwhile I can't remember the last time I saw Windows crash. Perhaps we shouldn't generalise from anecdotes.
Re:Surprise? (Score:2, Insightful)
You're confusing having problems with Vista and being unable to see [economist.com] that Vista has major problems. Seems like you are more likely the pr0n freak in this case ;-)
And for the record, you can stop adding the disclaimer "if I didn't know better", since the fact that you use Vista is prima facie evidence that you don't know better.
Re:Surprise? (Score:5, Insightful)
Re:Surprise? (Score:3, Insightful)
It's slower than XP in any case and requires more memory.
Not true. It uses more memory than XP, but it doesn't require it. In exactly the same way that Linux uses more memory than XP, but doesn't require it (it's used for system cache if you bother to check).
Umm, yes, it is true: many benchmarks were done of XP SP3 vs Vista SP1, and XP SP3 is definitely faster than Vista SP1, and it definitely _requires_ less memory. I can run an XP machine with 512MB of RAM, and it will be OK. Not great, but OK. Put Vista on the exact same machine (or even on a more modern, faster machine, but still with only 512MB of RAM), and it will be a total dog. Vista really needs a bare minimum of 1GB of RAM to be usable, whereas XP will run acceptably on 512MB... you could probably get away with 320MB if you don't run any memory-intensive applications.
If you actually install the 64-bit version, you'll see where MS's development budget has been spent (the 32-bit version of Vista feels a bit like Win ME in comparison). In every test I've done, 64-bit Vista has crapped all over XP from quite a big height.
The problem is I don't consider decent hardware to be something an IT'er would buy
Dual core machine + 2GB RAM + integrated ATI/Nvidia/Intel X4500 GPU is more than adequate. These are pretty basic machine specs by anyone's standard, and tbh you'd be hard pressed to find a brand new machine for sale with lower specs than that.
Yeah, that should be enough to run Vista, and many new machines are spec'd like that, but businesses need to use a uniform platform across all machines... so are they supposed to throw away all their old machines and buy new ones just so they can use Vista? No, they'll wait until they have replaced all their machines with ones that can run Vista through the same update schedule as they usually use, and then they'll use Windows 7, since it is supposed to be faster/leaner than Vista anyway.
The worst machine I've installed Vista on was an old 1.6GHz Athlon XP. It was more than happy playing Blu-ray discs, and didn't perform any worse than XP (though I did add an ATI 3650 AGP card to help out with the Blu-ray decoding). That's what, a 5-year-old machine and a £50 upgraded graphics card... Not exactly high-end spec, I must say.
Actually, for consumer hardware, an ATI 3650 is rather high-spec. The most common integrated graphics chips are about 5-10% as powerful as a 3650.
All that said, I have nothing against Vista for home users, but in the business world, it just doesn't add up (unless you replace your hardware on a 2-year cycle).
Re:Surprise? (Score:5, Insightful)
You are right and you are wrong. Yes, it's true that Vista, XP or even Windows 2k are rock solid, but only as long as you don't add third-party hardware drivers of dubious quality. Unfortunately many hardware vendors don't spend as much effort as they should to develop good drivers. Just using the drivers that come with Windows leaves you with a rather small set of supported hardware, so people install whatever drivers come with the hardware they buy, and as a result they get a BSOD if they are unlucky, and then they blame Microsoft.
Re:Mod Parent Up (Score:5, Insightful)
If you look at the username it's not him at all, it's someone with ID 1344097 pretending to be him. Still, what he says is sensible, and what's wrong with this piece? If it doesn't interest you, why are you reading the comments?
Re:Surprise? (Score:1, Insightful)
It doesn't fly in the face of shit. It shows that one of the few people who had an anecdote about Vista reliability calls a ridiculously inadequate length of up time proof that Vista is reliable. If what the OP says is true, he is the rare exception that got the best Vista has to offer. That "best" is woefully inadequate. BTW - You say his point is still valid. What point was that? Was it that in very rare (atypical) cases one can experience reliability from Vista that is only an order of magnitude worse than the typical Linux system? If so, I concede that you are correct, and his point is valid. In rare cases Vista sucks more [peluchetux.com.ar] by only one order of magnitude rather than the typical two+ orders ;-)
Missing the point, sorry.
Re:Surprise? (Score:2, Insightful)
I'm curious why you need your OS to be on non-stop for more than 60 days. Even for servers I don't think having a brief downtime every couple months would be a serious issue for the vast majority of users. Besides, who runs servers on Vista anyway? At least use Windows Server.
P.S. I don't run Vista and I never will, but I just don't understand what you would need years of up-time for on a home PC. I power down my PC most nights (unless I'm downloading/uploading something), just because there's no sense wasting electricity.
Re:Surprise? (Score:5, Insightful)
The OS running on the cheapest hardware with the most clueless user base has the highest failure rate? You don't say!
Re:Surprise? (Score:5, Insightful)
And for those who will go to the security well here, we call it a trade-off. For many systems uptime is more important. It generally isn't a very big risk to run an older Linux kernel, though it is more risky than updating. In a world of blind men, the one-eyed man is king. We can sacrifice a modicum of security, exchanging our plate mail for chain mail, and still feel confident because we are surrounded by weaponless peasants.
Re:Surprise? (Score:5, Insightful)
Or could it be that they have a queue full of machines waiting for reinstalls, etc? No. It couldn't be that, since we all know that the thousands of people saying they have had major problems are liars, and we have as evidence a few people who claim that they haven't had major problems, or don't know that they have problems ;-)
Re:Quit trolling (Score:3, Insightful)
I do get it, but some people (like me) automatically skip past sigs, and the guy created his frickin name as KDawson's name plus his userID, what about that don't you get?
Re:Surprise? (Score:3, Insightful)
Holy Shit! 61 Days! And you had to reboot for updates, so who could complain about that??!!! Oh I don't know, how about the Linux users who use an OS that has uptimes measured in years. What's that you say? How incompetent are they if they don't update for years? You see that is the thing. They did apply updates daily! Linux uses some kind of voodoo magic to allow updates without downtime! (it's scary to think about, I know). Now go back to your stickball game kid. The adults have some real computing to do. You are the epitome of the idea that people who use Windows simply don't know any better.
So; you ran daily updates on your system and had uptime measured in years? How did you manage to patch/update your kernel? Did you apply those patches/updates without rebooting? How?
From your post, you sound like the epitome of an arrogant Linux user who throws half-truths around while looking down your nose at everyone who isn't just like you. When you use that tone, do you expect people to actually listen to you or are you just trolling for an argument? Sheesh.
Re:Surprise? (Score:5, Insightful)
I worked for a company that bought a laptop of every brand, so that when the higher-ups went into meetings with Dell, HP, Apple, etc. they had laptops that weren't made by a competitor. They have had problems like laptops not starting up the first time due to incompatible software. That was as recent as 6 months ago. My mother-in-law bought a machine that has plenty of Vista-related problems (audio cutting out, USB devices not working, random crashes in Explorer) on new mid-range hardware that came with Vista. But I have a neighbor who found it fixed lots of problems with gaming under XP.
On the other hand, my Linux server freezes up and needs to be reset (sometimes even reboot -f doesn't work) every few days due to a kernel bug, probably some unfortunate interaction with the hardware or BIOS. (I'm using no third-party drivers, only stock Ubuntu 8.04.) And hey, in the ext4 discussions that popped up recently, it emerged that some people had their Linux box freeze every time they quit their game of World of Goo. Just yesterday I had to kill X via SSH on my desktop because the GUI became totally unresponsive, and even the magic SysRq keys didn't seem to work. Computers screw up sometimes.
What's definitely true is that Windows 9x was drastically less stable than any Unix. Nobody could use it and claim otherwise with a straight face. Blue screens were a regular experience for everyone, and even Bill Gates once blue-screened Windows during a freaking tech demo.
This is just not true of NT. I don't know if it's quite as stable as Linux, but reasonably stable, sure. Nowhere near the hell of 9x. I used XP for several years and now Linux for about two years, and in my experience, they're comparable in stability. The only unexpected reboots I had on a regular basis in XP were Windows Update forcing a reboot without permission. Of course there were some random screwups, as with Linux. And of course some configurations showed particularly nasty behavior, as with Linux (see above). But they weren't common.
Of course, you're right that none of us have statistics on any of this, but we all have a pretty decent amount of personal experience. Add together enough personal experience and you get something approaching reality, with any luck.
Re:Surprise? (Score:1, Insightful)
Except he said both cases were possible, and the grandparent said only his case was the correct one.
Re:Surprise? (Score:2, Insightful)
Looks like he posted his opinion based on his experience, and you posted your opinion based on your experience. So you should quit making blanket statements based on nothing too.
Neither of you posted statistics. Where are yours?
I think you misunderstood his posting. He is saying, "Look, if I use your loose methods of argument, I can 'prove' the opposite! Watch!"
Re:Surprise? (Score:3, Insightful)
Even for servers I don't think having a brief downtime every couple months would be a serious issue for the vast majority of users.
My servers run non-stop because my users are non-stop.
No, they're not chained to their desks 24/7, but the user base is large enough that at any given moment, it's crunch time for *someone*.
In order to find those rare times when bringing a server down won't cause someone to miss a critical deadline, downtime has to be planned months in advance. To do that I have to be able to rely on my systems being stable for months or years on end.
As the other poster said, rebooting just to interrupt the degradation process is not an option for me. If a system degrades on its own, I have to find out why and fix it.
Keep in mind, though, that this is an OS-independent issue. Some of the applications I support only run on certain OSes, so I have to build stable Linux systems, stable Windows systems, etc. Nobody cares but me that that's harder to do with some OSes than with others.
Re:Surprise? (Score:3, Insightful)
Re:New Microsoft ad slogan (Score:4, Insightful)
This is truly a sign that Windows has caught up with Linux: It used to be only Linux users saying that, but now Windows users are, too!