Everything You Know About Disks Is Wrong
modapi writes "Google's wasn't the best storage paper at FAST '07. Another, more provocative paper looking at real-world results from 100,000 disk drives got the 'Best Paper' award. Bianca Schroeder, of CMU's Parallel Data Lab, submitted Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you? The paper crushes a number of (what we now know to be) myths about disks such as vendor MTBF validity, 'consumer' vs. 'enterprise' drive reliability (spoiler: no difference), and RAID 5 assumptions. StorageMojo has a good summary of the paper's key points."
MTBF (Score:5, Interesting)
Suppose a tire manufacturer drove their tires around the block, and then observed that not one of the four tires had gone bald. Could they then claim an enormous MTBF? Of course not, but that is no less absurd than the testing being reported by hard drive manufacturers.
Re:MTBF (Score:5, Informative)
Re:MTBF (Score:5, Insightful)
It does. But it also says -- repeatedly -- that the data is disk replacement data, NOT disk failure data. i.e. it's data on the number of problems that the user tech thought might be fixed by replacing the disk, not by the number of disks that actually failed. One might wonder if, for example, the response to a system failing while it was being set up or in early lifetime might not be to put the whole damn thing into a box and ship it back to the vendor rather than dink around trying to figure out what is wrong. That won't be recorded as a disk failure.
The study is fine -- really it is. But, table 3 ought to give pause. It's quite clear that different data sets show quite different diagnostic patterns. We've got one set of data that says that power supplies, for example, are hardly ever replaced and a second set that says that they are the most frequently replaced item. There MAY be good reasons for this. But it could also be an indication that the technicians are incompetent, that the record keeping is erratic, or (and I'd seriously consider this one) that only certain kinds of failures are being recorded.
Finally, I think someone really ought to mention that there is no way that a disk manufacturer is actually going to measure MTBFs of 100000 hours prior to printing up the data sheets. The problem is that there are only around 750 hours in a month. And you need a reasonable number of failures (many quality guys would say at least 4) in order to get a reasonably valid MTBF. In order to actually measure a six digit MTBF, the manufacturer would have to run maybe 500 units for a month. My guess is that isn't going to happen. If they have the production line producing 500 units, they are going to ship them. Manufacturer MTBF data are surely based on data from a handful of engineering and preproduction units plus a bunch of wild guesses.
My guess, and it is just a guess, is that manufacturer MTBFs for disks are probably pretty much the MTBF goal in the drive specifications established before the design actually started.
Incidentally, based on some experience with other sorts of high tech gadgetry, if the engineering/preproduction units do fail during test, a failure analysis will be done, and steps will be taken to fix the problem. Problem's fixed. OK, we shouldn't count those failures since they won't happen any more. That's called "censoring failure data". Begin to get an idea why disk MTBFs might be pretty much pure fiction?
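The test-size arithmetic in the comment above can be sketched in a few lines. This is only a back-of-the-envelope illustration: the helper name `units_needed` is made up here, the 750 hours per month and the four-failure minimum are the commenter's figures, and it assumes failures arrive at a constant rate, which the paper itself calls into question.

```python
import math

def units_needed(target_mtbf_hours, min_failures=4, test_hours=750):
    """Drives to run for test_hours in order to expect min_failures failures.

    Assumes a constant failure rate, so expected failures equal total
    unit-hours divided by the MTBF. (Hypothetical helper, not any
    vendor's actual qualification procedure.)
    """
    unit_hours = target_mtbf_hours * min_failures
    return math.ceil(unit_hours / test_hours)

print(units_needed(100_000))    # one month of testing, 100,000-hour MTBF
print(units_needed(1_000_000))  # same test against a 1,000,000-hour claim
```

For a 100,000-hour MTBF this comes out at 534 units, close to the "maybe 500" guessed above; a 1,000,000-hour claim needs ten times as many drives on the bench for the same month.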
Re:Infant Mortality and stuff (Score:5, Insightful)
Uh, but wasn't this data accumulated via testing actual drives? That's... kinda how science works--by replacing anecdotal evidence with scientifically-gathered data. That's basically condemning science in favor of anecdotes--and the medical fields can tell you how well _that_ works.
Re: (Score:3, Interesting)
I don't deploy "enterprise" drives, they're overpriced, and the few I did install years ago proved to be less reliable than "consumer" drives. My real world experience is that the "consumer" drives are generally reliable, I just plan on a 2-3 year replacement schedule.
I can't disagree with RAID being fallible depending on what takes out the drive, though.
Re:Infant Mortality and stuff (Score:5, Insightful)
Uh the paper is based on _real_world_ stats (which part of "empirical evidence" + "she looked at 100,000 drives" don't you understand?).
Your assumptions = theory. Paper = real world.
And that's why the paper was voted "Best Paper", because it seems lots of people had similar assumptions and this paper is very useful to at least get some people to revisit those assumptions.
It might still be proven wrong by a bigger/better study, or it could turn out that it was flawed in some way. But I'll give them the benefit of the doubt - more than I'll trust the MTTF/MTBF figures from drive manufacturers.
Re:Infant Mortality and stuff (Score:5, Informative)
This study looks pretty realistic to me, in fact it's better data than the Google paper's because they are looking at different usage scenarios. The study also jibes with vendors' warranty periods -- right around the 3 year mark (end of warranty) failures start going up.
I take issue with your "real world vs. theory" argument about workstation disks versus server disks as well, only because I have my own numbers. Based on numbers that my company gathers for its 50,000 workstations, the disk failure rate is around 1.9% annually. (Still a lot of disks.) There are exceptions -- those numbers are driven upward by one deployment of workstations from a vendor that had a 22% failure rate. (The PCs were replaced by the vendor.) Server disks are in the same ballpark - slightly less than 2%.
Vendors provide more evidence of that fact. Many servers are being shipped with SATA disks, often the same as what you'll find in workstations. If SATA was less reliable, that would increase the vendor's support costs and they wouldn't ship them.
You're totally right about RAID-5... it can be a dangerous thing for an inept admin. Bad disks often come in batches, and bad controllers can ruin your day. A redundant array of bad data isn't very helpful ;)
Re:Infant Mortality and stuff (Score:5, Insightful)
RAID controllers come in two flavors. Ones that are very well supported, so you will always find a similar or compatible one if that controller fails; the downside of this type is that it is very expensive. The other type is the cheap ones, you know, the ones for under $100 which may not exist in 2 years when yours fails, leaving your RAID array useless, and the on-board SATA RAID chipsets that change at least yearly as well. Good luck with those. They do work, but I'd bet you will have more problems with the RAID setup itself than with the actual drives the data is on.
I know, KISS is not in typical
OSS Software RAID, too. (Score:5, Insightful)
I agree though, that for most people, some sort of "userland RAID" where the disks are just mounted as regular volumes to the filesystem, and then you just write the data twice, is probably the best bet. There's no format problems, and you'll always be able to pull a drive out, stick it in another machine, and get at your data.
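A minimal sketch of the "userland RAID" idea described above: copy each file onto two (or more) ordinary mounted volumes and verify each copy. The function names and the mount points in the usage note are hypothetical, and this is an illustration of the idea rather than a replacement for a real backup tool.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def mirrored_copy(source, volumes):
    """Copy source onto every volume and verify each copy by checksum.

    `volumes` are ordinary mounted directories; each copy is a plain
    file, so any one disk can be read on its own in another machine.
    """
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for volume in volumes:
        target = Path(volume) / source.name
        shutil.copy2(source, target)  # copies data plus timestamps
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch on {target}")
        copies.append(target)
    return copies
```

Usage would look something like mirrored_copy("photos.tar", ["/mnt/disk1", "/mnt/disk2"]), with the paths swapped for wherever the two drives are actually mounted.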
Re:MTBF? RTFA. (Score:5, Informative)
Well, the article actually says that drives don't have a spike of failures at the beginning. It also says failure rates increase with time. So you're right that MTBF shouldn't be taken for a single drive, since the failure rate at 5 years is going to be much higher than at one.
The other thing that the article claims is that the stated MTBF is simply just wrong. It mentioned a stated MTBF of 1,000,000 hours, and an observed MTBF of 300,000 hours. That's pretty bad. It's also quite interesting that the "enterprise" level drives aren't any better than the consumer level drives.
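To put the two MTBF figures above on a more familiar scale, here is a rough conversion to a per-year failure probability. It assumes a constant failure rate (an exponential lifetime model), which is exactly the simplification this thread says does not hold for real drives, so treat the output as a scale comparison only. The function name is made up for the example.

```python
import math

HOURS_PER_YEAR = 8760  # 24 * 365

def annual_failure_probability(mtbf_hours):
    """Chance that one drive fails within a year, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

for label, mtbf in [("stated", 1_000_000), ("observed", 300_000)]:
    rate = annual_failure_probability(mtbf)
    print(f"{label}: MTBF {mtbf:,} h -> {rate:.2%} per year")
```

Under that assumption the stated figure works out to a bit under 1% of drives per year and the observed one to a bit under 3%.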
and Google contradicts. (Score:5, Interesting)
Hmm, the Google paper says they do, from 3-6 months (Figure 2).
Which leaves us with confirmation that 50% of all studies are wrong.
Actually, mostly it DOESN'T contradict (Score:5, Insightful)
More importantly, they don't contradict each other in respect to the rest of the curve. With or without that spike, the curve just doesn't look like the bathtub fairy tale that drive makers try to bullshit us with. You're led into a false sense of security that, basically, if a drive didn't fail within the first couple of months, then it'll be at a (nearly) constant and very small probability to fail for the whole next 5 years, and only then it starts rising again. Basically that if you upgrade your drives every 4 years, whatever didn't fail within 2-3 months, heck, it's very unlikely to fail. And the curve just doesn't look that way. The probability to fail rises continuously, and (again whether that spike actually exists or not) after as little as 1 year you're above the starting height of the "bathtub" already.
In retrospect, I don't even know when and why the "bathtub" myth even started. The bathtub distribution was originally for stuff like electronic components, without moving parts. For something with mechanical wear and tear like a hard drive, who the heck came up with the idea that the same curve must apply? Shouldn't it have been common sense all along that it linearly gets more wear and tear?
Both papers also tell us that the manufacturers' MTBF numbers are, basically, pure bullshit. They're some impressive number put there for the benefit of the marketing department, not because someone at Seagate/Maxtor/whatever actually believes that number.
In retrospect, again, we should have had an alarm signal when the manufacturers lowered their warranty from 3 to 1 year. If indeed there was (1) the MTBF they claim, and more importantly (2) the bathtub curve they claim, the reduction wouldn't have even made too much of a difference. I mean, most failing drives would have failed within a couple of months, followed by barely a trickle of defective drives for the next 5 years straight. Why bother doing the bad-for-marketing thing of lowering the warranty in that scenario? Or did they already know that they lie?
And finally, a very important point is that (again, bullshit marketing claims be damned) there is no difference in reliability between cheap SATA and expensive SCSI and FC. There is this assumption permeating the whole society that if something is expensive, it _must_ automatically be better and more durable than the cheap stuff. That if you buy a big plasma TV, it's automatically better and lasts longer than an el-cheapo CRT. (Yeah, right. Plasma is actually known for its decay over time.) A whole edifice of consumerism, conspicuous consumption, and SFV (Stupid Fashion Victim) syndrome is based on that bullshit excuse to spend more than you need to spend. "Yeah, but it'll be better and last longer!" Yeah, right.
I've actually met people who wouldn't even _consider_ putting an ATA drive in any kind of server. "What, you're going to put your enterprise data on ATA drives???" (Said with a perplexed look, as if I had proposed flushing it to
Re:MTBF (Score:4, Interesting)
With the right model, it is possible to extrapolate life expectancy from a short trial. It is just that the manufacturers have no incentive to tell the truth, so they don't. Vendors never tell the truth unless some standardized measurement is imposed on them.
Re: (Score:3, Informative)
Re: (Score:3, Insightful)
Re:MTBF (Score:4, Informative)
Re:MTBF (Score:4, Informative)
Now, you can have all sorts of distributions that you draw that mean from, but a mean is a mean.
Re: (Score:3, Insightful)
Let's say you have five units with an MTTF of 5000 hours, and we put a new one into service every 500 hours.
It'll look something like this:
0-5000
500-5500
1000-6000
1500-6500
2000-7000
Now, each drive failed after five thousand hours. This is the mean time to failure. In other words, each drive had, on average, 5000 hours on it when it failed.
Next, let's calculate MTBF. There were 5 failures, with a total of 7000 hours of operation. This would res
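The staggered schedule in this comment can be replayed directly. This sketch only uses the inputs the comment gives (five drives, 5000 hours each, one new drive every 500 hours); whether elapsed time divided by failures is the right definition of MTBF is the commenter's framing, not something the code settles.

```python
# Each tuple is (hour put into service, hour of failure), from the list above.
drives = [(0, 5000), (500, 5500), (1000, 6000), (1500, 6500), (2000, 7000)]

# Mean time to failure: average hours on a drive when it failed.
lifetimes = [end - start for start, end in drives]
mttf = sum(lifetimes) / len(lifetimes)

# Wall-clock span of the whole exercise, divided by the number of failures.
elapsed = max(end for _, end in drives) - min(start for start, _ in drives)
elapsed_per_failure = elapsed / len(drives)

print(mttf, elapsed_per_failure)
```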
Re:MTBF (Score:4, Insightful)
Don't forget the M in MTBF. It's mean [wikipedia.org] (statistically speaking...). That means (!) that some might fail now, some later, but on average they last a while. Manipulate that information and you might get 1,000,000 hrs MTBF, but you have to account for, and not forget about, the worst-case scenario (that's what a failure is), which might be that the next drive is going to fail *now*. That is why RAID5 isn't as good as it might seem looking at the average statistics.
Backup, backup, backup has always been my motto (and that's just personal data). Interesting that Google thinks this is the way to go also (i.e. 3 copies of all data).
moving parts (Score:5, Funny)
Re: (Score:2, Interesting)
Re:moving parts (Score:5, Informative)
They do have a limited read/write lifetime for each sector, BUT the controllers automatically distribute data over the least-used sectors (since there's no performance penalty to non-linear storage), and you wind up getting the maximum possible lifetime from well-built solid-state drives (assuming no other failures).
So in practice, the lifetime of modern solid state will be better than spinning disks as long as you aren't reading and writing every sector of the disk on a daily basis.
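The wear-leveling idea described above (always send the next write to the least-used sector) can be shown with a toy model. Real flash controllers are far more involved (mapping tables, spare blocks, moving static data around), so the class below is an illustrative assumption, not how any actual controller firmware works.

```python
import heapq

class ToyWearLeveler:
    """Toy model: always write to the block with the fewest erase cycles."""

    def __init__(self, num_blocks):
        # Heap of (erase_count, block_id); the least-worn block is on top.
        self._heap = [(0, block) for block in range(num_blocks)]
        heapq.heapify(self._heap)

    def write(self):
        """Pick the least-worn block, record one more erase, return its id."""
        count, block = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (count + 1, block))
        return block

    def wear(self):
        """Return {block_id: erase_count} for every block."""
        return {block: count for count, block in self._heap}

leveler = ToyWearLeveler(num_blocks=4)
for _ in range(10):
    leveler.write()
print(leveler.wear())
```

However many writes are issued, no block in this model ever ends up more than one erase cycle ahead of any other, which is the property that stretches out the lifetime of the whole device.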
Re: (Score:2, Informative)
Re:moving parts (Score:4, Interesting)
Let's face it, there is no reliable storage media; the only way to be safe is multiple copies.
Re: (Score:3, Insightful)
The point you didn't get was that even solid state disks can fail without warning, so you need a backup anyways.
You only need a single counterexample to disprove a theory.
Re: (Score:2)
Re:moving parts (Score:5, Funny)
-C
Re:moving parts (Score:4, Funny)
Every single solid state drive will fail too... (Score:3, Informative)
If something has an MTBF of 1 million hours (that's 114 years or so), then you'll be a long time dead before it fails.
At this stage, the only reasonable non-volatile solid state alternative is NAND flash which costs approx 2 cents per MByte ($20/Gbyte) and dropping. NAND flash has far slower transfer speeds than HDD, but is far smaller, uses less power and is mechanically robust. NAND flash
Re: (Score:3, Informative)
Re: (Score:2)
I'm going to live forever!
Re: (Score:2)
Re:moving parts (Score:5, Informative)
Re: (Score:3, Informative)
Re: (Score:2)
Re: (Score:3, Funny)
i'll tell you (Score:3, Interesting)
It means I should be storing my important, important data on a service like S3. [amazon.com]
Re: (Score:2)
Re: (Score:2)
So that Department of Redundancy Department really does something after all!
Re: (Score:3, Funny)
"Everything You Know About Disks Is Wrong" (Score:3, Funny)
Re:"Everything You Know About Disks Is Wrong" (Score:4, Funny)
Amazing! (Score:3, Insightful)
Re: (Score:2)
Re: (Score:3, Informative)
afaict Linux software raid is actually pretty good nowadays at least as long as you stick to the basic raid levels
beware of the very common fake hardware (e.g. really software but with some bios and driver magic to make the array bootable and generally behave like hardware raid from the user's point of view) controllers. These often have far worse performance tha
Re: (Score:3, Informative)
That is, unless you go for software RAID, which will put a hit on your processor.
This myth needs to die. No remotely modern processor takes a meaningful performance hit from the processing overhead of RAID.
However, I think if you're going to make the investment to go with RAID 5, then buying a proper hardware controller won't add a significant amount to the cost of your set up.
Decent RAID5-capable controllers are hundreds of dollars. Software RAID is free and - in most cases - faster, more flexible a
Re: (Score:3, Interesting)
Since we are talking about IO-bound operations, does that matter? I mean, CPU is hardly ever the bottleneck these days, the hard-drive quite often is. So even if soft-RAID puts more load on the CPU, does it cause any slowdown? Especially if it makes IO faster?
Re: (Score:3, Informative)
Uh sorta. Depends on the raid type. Striped will be faster, mirrored will be about as fast, raid 5 is gonna be the slowest, even in hardware.
Compared to a single disk, RAID5 is still going to be faster (except perhaps for the odd corner-case here and there).
Also, in many cases, software RAID5 is faster than hardware RAID5.
Re: (Score:2)
Personally I backup all my data to a server running raid 1 (hard drives are relatively cheap and ra
Re: (Score:2)
Re: (Score:2)
Seriously. The only dead drives I've ever seen are either IBM Deathstars (known by that name so completely that I don't know what the actual brand name is... 'disk star' perhaps?) or Western Digital drives. I generally buy Seagate or Hitachi drives, and I've never had a failure. Usually I run out of space and have to upgrade before the drives die. IBM drives other than the Deathstars seem to do ok as well.
Re: (Score:2)
I lost two RAID 5 setups to those because they failed faster than we could replace them. (1 spare and several days for shipping.) Out of the 7 we had in two servers, 5 of them failed, so the nickname is not undeserved in my opinion.
Re: (Score:2, Insightful)
I you build your own se
Re: (Score:2)
Re: (Score:2)
First guess? Your system has a dirty power supply. (Unless you have a high-quality PSU and have a line-noise-filtering UPS, this is entirely possible.)
This article has told me one thing: it's time to get a RAID setup. I've been looking at RAID 5, but two things still trouble me, the price and the performance hit. Does anyone have any information on just how much a performance hit I might experience if I have to access
Re: (Score:2)
Re: (Score:3, Interesting)
I'm sure there are people around with even older, still-working-fine gear. A while back, I saw some DEC disk packs for the early removable-platter hard drives selling on eBay, as pulls-from-wor
infant mortality (Score:5, Insightful)
the large shops like these studies are looking at get the drives in bulk directly from the manufacturer, the rest of us who have to go through several middle-men before we get our drives have more of a chance that something happened to them before we received them.
David Lang
Re: (Score:3, Insightful)
Comment removed (Score:4, Insightful)
Re: (Score:2)
Re: (Score:2)
You just posed the one question to which I'd actually have liked to know the answer... Turn it on and off as needed (minimize runtime), or leave it on all the time if you'll use it at least a few times per day (minimize power cycling).
I know that counts as something of a religious issue among geeks, but I'd still have liked a good solid answer on it... It even has implications for whether or not we should let our non-laptops
Re: (Score:3, Informative)
- I never turn off the PC.
- The case has no cover.
Re: (Score:3, Interesting)
Right now my everyday HDs number thus:
6.4GB W.D. -- new in 1998, has always run 24/7. No SMART but probably has upward of 70,0
Re:Desktop vs Server usage. (Score:5, Interesting)
Most enterprise-level operations that rely on their data replace drives before they fail. In fact the replacement rate was increased to every 2 years not for failure prevention but for capacity increases.
Re:Desktop vs Server usage. (Score:5, Informative)
You worked at an unusual place!
I'm a Tech Support Engineer for a large storage system manufacturer and I can tell you that NONE of our customers replace disks before they fail unless our OS detects a "predictive failure" for the disk. Our customers are some of the biggest names in business from all over the planet.
Re: (Score:3, Interesting)
Re: (Score:2)
The concept is bizarre to me. I haven't shut my desktop off on a daily basis in probably 15 years (or about as long as I've been running Linux as my desktop).
This has nothing to do with the OS though. I don't power cycle any of my important electronics more than needed because I do believe it stresses them. My (PC) computers have always run 24/7 unless there is an electrical storm passing over or I don't have power.
The last time I power cycled on a daily basis w
Re: (Score:3, Informative)
1) Electricity consumption
2) Power cuts (unless you have a UPS and software for a clean shutdown installed, what happens if there is a power cut while you are away?).
3) Power fluctuations (my power supply blew dramatically after one a few months ago) and lightning.
4) Heat (in a hot climate)
Re: (Score:2)
Cyrus IMAP (Score:3, Interesting)
For best-of-breed open source IMAP, that means Cyrus IMAP replication.
Re: (Score:2)
Eventually - but we're not turning our automated replication checks off just yet... [irbs.net]
Amazing the little corner cases that can go in and corrupt your data for you.
This paper and the Google paper are complementary (Score:5, Informative)
The Google paper shows that relatively high temperatures and high usage rates don't affect disk life.
The current paper shows that interface (SCSI, FC vs ATA) had no effect either. The Google paper shows
a significant infant mortality that the CMU paper didn't, and the Google paper shows some years of flat
reliability where the current paper shows decreasing reliability from year one.
They both show that the failure rate is far higher than the manufacturers specify, which shouldn't come
as a surprise to anybody with a few hundred disks.
I'm particularly pleased to see a stake driven through the heart of "SCSI disks are more reliable."
Manufacturers have been pushing that principle for years, saying that "oh, we bin-out the SCSI disks
after testing" or some other horseshit, but it's not true and it's never been true. The disks are
sometimes faster, but they're not "better".
Thad
Re:This paper and the Google paper are complementa (Score:2)
We need a better file system... (Score:2)
Further, these results validate the Google File System's central redundancy concept: forget RAID, just replicate the data three times. If I'm an IT architect, the idea that I can spend less money and get higher reliability from simple cluster storage file replication should be very attractive.
Someone needs to hurry up and write a good cross platform clustering file system solution. Something that encourages a company to buy bigger, better value HD's for their desktops so they can be used as redundant storage.
100,000 Disk Drives? (Score:2)
Re: (Score:2)
Yay (Score:2)
Software RAID FTW!!
In all seriousness, in truly critical storage you save your stuff under a RAID1. RAID5 is simply too unreliable for the task (not to mention that those controllers aren't exactly cheap).
So save yourself trouble, money, and grief, and just use logical volume management to replicate drives.
So SSD's are not only faster, but more reliable? (Score:4, Interesting)
Would an analysis tell us that SSDs are not only faster but more reliable and if so by how much?
forget RAID? (Score:3, Informative)
The fact that another drive in an array is more likely to fail if one has already failed makes a lot of sense, but the conclusion to forget RAIDs doesn't. Arrays are normally composed of the same drive model, even the same manufacturing batch, and are in the same operating environment. If something is "wrong" with any of these three variables, and it causes a drive to fail, it's common sense the other drives have a good chance at following. I've seen real-world examples of this.
In my real-world situations, the RAID still did its job, the drive was replaced, and nothing was lost, despite subsequent failure of other drives in the array. Sure you can get similar reliability at a lower price by replicating data, but I think that's always been understood as the case. Furthermore, as someone else in the forum mentioned, enterprise-class RAIDs are often used primarily for performance reasons. A modern hardware RAID controller (with a dedicated processor and RAM) can create storage performance unattainable outside of a RAID.
Schroeder's disk... (Score:2, Funny)
How much does handling matter? (Score:5, Interesting)
The Google paper talks a bit about certain drive "vintages" being problematic, but I wonder if they buy drives in large lots, and perhaps some lots might have been handled roughly during shipping. If they could trace back each hard drive to the original order, perhaps they could look to see if there's a correlation between failure and shipping lot.
-R
Re:How much does handling matter? (Score:4, Informative)
here you go
http://hardware.slashdot.org/article.pl?sid=07/02
Lemon or not (Score:2)
Personally I have only ever had one drive go on me (a quantum scirroco) in 10 years. For myself, and most home users, that's a great track record. On the other hand, I have had friends and relatives whose drives just up and quit. New ones, old ones, many brands. As long as you buy a major brand, they seem to be more or less equa
all this is moot (Score:3, Insightful)
Sure, some of them end up being replaced under warranty, but a lot of them don't, and so Maxtor/IBM/Hitachi make another buck off your sorry ass. There isn't a sane server admin that doesn't keep a set of spares in his desk drawer, because it's not a question of "if" it dies but WHEN. Hell, most decently-geared techies have a whole box of hard drives, pre-mounted in hotswap bays ready to rock. And if it weren't for the fact that I was just laid off a month ago, I'd be buying a couple spare SATA drives myself, I just have a funny feeling something's going to go tits up in my media server. I haven't had any warnings or hiccups, but I just know the Seagate devil's planning his move, waiting for 2 drives to start straying so he can kill my Raid-5 nice and fast. Hard drives are little more than Murphy's Law in a box.
Exponential with time (Score:3, Informative)
Waaaah! They cry, when I tell them there is no hope for the family photos, barring a media reclamation service == $$$
I tell everyone: "Assume your hard drive will fail at any moment, starting now! What is on your hard drive that you would be upset if you never saw it again?"
Hard drives on consumer PCs (Score:2)
> hard drive that you would be upset if you never saw it again?"
True enough, I use a similar warning. Mine is, "Don't leave anything on your hard drive you care about. If you manage to make it a year without reloading Windows the drive can crap out with no warning. Burn anything you can't download again to a CD/DVD."
Personally I don't have to worry about Windows and I have a RAID5 at home.... but I stil
Nothing I knew about hard drives was mentioned (Score:3, Insightful)
The electronics on the hard drive rank as major players in heat generation in the boxen.
Heat kills transistorized components.
"Hard Drive Data Recovery" companies often have nothing more sophisticated than a hard drive buying program, and very competent techs soldering and unsoldering drive electronics. They buy a few each of most available hard drives, as the drives appear on the market. When a customer sends them a hard drive for "recovery", the techs find a matching drive in inventory, disconnect the electronics, and replace the electronics in the drive. The percentage of drive failures due to mechanical failure is very low.
When I bought a desktop computer for an unsophisticated family member, I also purchased and installed a drive cooler - a special fan that blows directly on the drive electronics.
I was very concerned about MTBF. I just assumed that the manufacturer's information was totally irrelevant to my situation - a hard drive in a corner of the tower, covered with dust, and no air circulation.
I occasionally pick up used equipment from family and friends. Usually, it is broken. Often, it is the hard drive. What is amazing is not that they failed, but that they lasted so long with a 1.5 inch coating of insulating dust.
I suspect this would also explain the rising failure rate with time. Nobody seems to clean the darned things. They just sit and run 24/7/365, until they fail.
Re: (Score:2)
disk spin-up is most responsible for failure ? (Score:3, Interesting)
Now if that's really true, wouldn't it be quite simple for the manufacturers to simply spin up the disk more slowly by putting in very simple and reliable motor control circuitry?
Does anyone have any real evidence, i.e. not anecdotal, that this is really true?
Re:Dr. Schroeder is pretty hot, too! (Score:5, Funny)
Re: (Score:3, Funny)
You call that failure?!? I'd call it success.
Re: (Score:2, Insightful)
June 2006 Microsoft Research, Mountain View, CA. Host: Chandu Thekkath. "Understanding failure at scale".
Its okay man.. She will understand..
Re: (Score:2, Offtopic)
MTBF, in this case, means Mean Time Between Farkings. So yeah, three seconds is an astoundingly short refractive period. :)
Human MTBF (Score:5, Funny)
Of course if we count relatively minor failures (like forgetting to take out the trash or pick up dirty underwear), then MTBF is approx 27 minutes!
That's wrong (Score:3, Informative)
It helps, and distributing the data more helps more. Someone concerned about multi-drive failures can, for example, use a 3-way RAID 1 array, or a RAID 6 array (which can tolerate the loss of any 2 drives).
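For reference, the worst-case fault tolerance of the layouts mentioned above can be written down as a small lookup. The function name and the level strings are just labels chosen for this sketch; the counts are the standard guarantees for each RAID level and say nothing about how likely correlated failures are in practice.

```python
def tolerated_failures(level, num_drives):
    """Worst-case number of drive losses an array survives without data loss."""
    if level == "raid0":
        return 0               # striping only, no redundancy
    if level == "raid1":
        return num_drives - 1  # every drive holds a full copy
    if level == "raid5":
        return 1               # single parity
    if level == "raid6":
        return 2               # dual parity
    raise ValueError(f"unknown level: {level}")

for level, n in [("raid5", 4), ("raid1", 3), ("raid6", 4)]:
    print(level, n, "drives ->", tolerated_failures(level, n))
```

A 3-way RAID 1 mirror and a RAID 6 array both survive any two drive losses, which is the point being made against a plain RAID 5.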
Re: (Score:2, Insightful)
Re: (Score:2)
Re: (Score:3, Informative)
There are three real dangers with raid
The first is that arrays are typically built out of identical drives, usually drives from the same batch and then all the drives are run for the same time periods. This means that if there is a design or manufacturing fault that causes a failure peak at a certain number of operational hou
Re: (Score:3, Informative)
It's true that you should never buy anything for the illusion of reliability, but the article does not claim RAID is not a good way to get reliability.
First, let's look at the common mistake when people think about RAID: "If the probability of a drive failure is X, then the probability of two drives in a RAID volume failing is X*X, whic