Everything You Know About Disks Is Wrong
modapi writes "Google's wasn't the best storage paper at FAST '07. Another, more provocative paper looking at real-world results from 100,000 disk drives got the 'Best Paper' award. Bianca Schroeder, of CMU's Parallel Data Lab, submitted Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you? The paper crushes a number of (what we now know to be) myths about disks such as vendor MTBF validity, 'consumer' vs. 'enterprise' drive reliability (spoiler: no difference), and RAID 5 assumptions. StorageMojo has a good summary of the paper's key points."
Re:MTBF (Score:5, Informative)
Re:MTBF? RTFA. (Score:5, Informative)
Well, the article actually says that drives don't have a spike of failures at the beginning. It also says failure rates increase with time. So you're right that MTBF shouldn't be taken for a single drive, since the failure rate at 5 years is going to be much higher than at one.
The other thing the article claims is that the stated MTBF is simply wrong. It mentions a stated MTBF of 1,000,000 hours and an observed MTBF of 300,000 hours. That's pretty bad. It's also quite interesting that the "enterprise" level drives aren't any better than the consumer level drives.
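For anyone who wants to turn those MTBF figures into something intuitive: under the usual constant-failure-rate (exponential) model, an MTBF converts to an annual failure rate like this. A back-of-the-envelope sketch; the exponential model is an assumption for illustration, not something the paper endorses:

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760

def annual_failure_rate(mtbf_hours):
    # AFR under a constant-hazard (exponential) failure model
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

# Vendor-stated MTBF vs. the observed MTBF cited above
print(f"1,000,000 h MTBF -> AFR {annual_failure_rate(1_000_000):.2%}")  # ~0.87%
print(f"  300,000 h MTBF -> AFR {annual_failure_rate(300_000):.2%}")    # ~2.88%
```

So "1,000,000 hours" is really a claim that under 1% of drives fail per year, while the observed numbers put it closer to 3%.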
Every single solid state drive will fail too... (Score:3, Informative)
If something has an MTBF of 1 million hours (that's 114 years or so), then you'll be a long time dead before it fails.
At this stage, the only reasonable non-volatile solid state alternative is NAND flash which costs approx 2 cents per MByte ($20/Gbyte) and dropping. NAND flash has far slower transfer speeds than HDD, but is far smaller, uses less power and is mechanically robust. NAND flash typically has a lifetime of 100k erasure cycles and needs special file systems to get robustness and long life.
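To get a rough feel for what 100k erase cycles means in practice, assuming ideal wear leveling (real controllers fall short of this, and the capacity and write-rate numbers below are made up for illustration):

```python
def flash_lifetime_years(capacity_gb, erase_cycles, writes_gb_per_day):
    # Total erase budget divided by daily write volume; assumes ideal wear leveling
    total_write_budget_gb = capacity_gb * erase_cycles
    return total_write_budget_gb / writes_gb_per_day / 365

# e.g. a hypothetical 32 GB device rated for 100k cycles, written 20 GB/day
print(round(flash_lifetime_years(32, 100_000, 20)))  # 438 (years)
```

In other words, with decent wear leveling the erase-cycle limit is rarely the first thing to kill a flash device.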
This paper and the Google paper are complementary (Score:5, Informative)
The Google paper shows that relatively high temperatures and high usage rates don't affect disk life. The current paper shows that interface (SCSI, FC vs. ATA) had no effect either. The Google paper shows significant infant mortality that the CMU paper didn't, and the Google paper shows some years of flat reliability where the current paper shows decreasing reliability from year one.
They both show that the failure rate is far higher than the manufacturers specify, which shouldn't come as a surprise to anybody with a few hundred disks.
I'm particularly pleased to see a stake driven through the heart of "SCSI disks are more reliable." Manufacturers have been pushing that principle for years, saying that "oh, we bin-out the SCSI disks after testing" or some other horseshit, but it's not true and it's never been true. The disks are sometimes faster, but they're not "better".
Thad
Re:moving parts (Score:5, Informative)
They do have a limited read/write lifetime for each sector, BUT the controllers automatically distribute data over the least-used sectors (since there's no performance penalty to non-linear storage), and you wind up getting the maximum possible lifetime from well-built solid-state drives (assuming no other failures).
So in practice, the lifetime of modern solid state will be better than spinning disks as long as you aren't reading and writing every sector of the disk on a daily basis.
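The "distribute data over the least-used sectors" idea the parent describes is basically a priority queue keyed on erase count. A toy sketch of the concept (real flash controllers also handle bad-block remapping, static data migration, and so on):

```python
import heapq

class WearLeveler:
    # Toy wear leveler: always write to the least-erased block
    def __init__(self, n_blocks):
        self.heap = [(0, b) for b in range(n_blocks)]  # (erase_count, block_id)
        heapq.heapify(self.heap)

    def write(self):
        count, block = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (count + 1, block))
        return block

wl = WearLeveler(4)
blocks = [wl.write() for _ in range(8)]
counts = {b: blocks.count(b) for b in range(4)}
print(counts)  # {0: 2, 1: 2, 2: 2, 3: 2} - wear is spread evenly
```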
That's wrong (Score:3, Informative)
It helps, and distributing the data more helps more. Someone concerned about multi-drive failures can, for example, use a 3-way RAID 1 array, or a RAID 6 array (which can tolerate the loss of any 2 drives).
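To put numbers on that, here's the textbook calculation, with the big caveat that it assumes independent failures, which is exactly what the article says you don't get. The 3% per-drive figure is an assumption for illustration:

```python
from math import comb

def p_array_loss(n, tolerated, p):
    # P(more than `tolerated` of n drives fail), assuming independent failures -
    # optimistic, since the paper shows drive failures are correlated
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(tolerated + 1, n + 1))

p = 0.03  # assumed per-drive failure probability over the window
print(f"RAID 5, 5 drives (1 tolerated):       {p_array_loss(5, 1, p):.2e}")
print(f"RAID 6, 6 drives (2 tolerated):       {p_array_loss(6, 2, p):.2e}")
print(f"3-way RAID 1, 3 drives (2 tolerated): {p_array_loss(3, 2, p):.2e}")
```

Correlated failures inflate all of these figures, but the ordering (more tolerated failures = better) survives.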
forget RAID? (Score:3, Informative)
The fact that another drive in an array is more likely to fail if one has already failed makes a lot of sense, but the conclusion to forget RAIDs doesn't. Arrays are normally composed of the same drive model, even the same manufacturing batch, and are in the same operating environment. If something is "wrong" with any of these three variables, and it causes a drive to fail, it's common sense the other drives have a good chance at following. I've seen real-world examples of this.
In my real-world situations, the RAID still did its job, the drive was replaced, and nothing was lost, despite subsequent failure of other drives in the array. Sure, you can get similar reliability at a lower price by replicating data, but I think that's always been understood as the case. Furthermore, as someone else in the forum mentioned, enterprise-class RAIDs are often used primarily for performance reasons. A modern hardware RAID controller (with a dedicated processor and RAM) can create storage performance unattainable outside of a RAID.
Re:moving parts (Score:5, Informative)
Re:moving parts (Score:2, Informative)
Exponential with time (Score:3, Informative)
"Waaaah!" they cry, when I tell them there is no hope for the family photos, barring a media reclamation service == $$$
I tell everyone: "Assume your hard drive will fail at any moment, starting now! What is on your hard drive that you would be upset if you never saw it again?"
Re:Desktop vs Server usage. (Score:3, Informative)
- I never turn off the PC.
- The case has no cover.
Re:MTBF (Score:3, Informative)
Re:MTBF (Score:4, Informative)
Re:Desktop vs Server usage. (Score:5, Informative)
You worked at an unusual place!
I'm a Tech Support Engineer for a large storage system manufacturer and I can tell you that NONE of our customers replace disks before they fail unless our OS detects a "predictive failure" for the disk. Our customers are some of the biggest names in business from all over the planet.
Re:Amazing! (Score:1, Informative)
This is wrong! SOFTWARE raid is faster. Why? Consider:
- The CPUs one buys are usually the latest and greatest.
- A 1.6GHz Athlon XP can process raid5 data at >3GB/s. This is significantly greater than your bus speed.
- If you're waiting on a disk read, chances are your CPU isn't doing much anyway. (That said, you need to do very little to process a disk read. It's the disk writes that require checksumming.)
- A raid controller adds an extra step into the disk->cpu latency
- A raid card microprocessor is spec'ed at whatever rate is needed to max a bus, or, often, significantly less. This means that any processing needed will incur a higher latency than if the data were processed by the CPU.
Roughly, for the hardware solution, all advantages are:
- Data can be considered flushed once it reaches the raid card, not the disk, due to battery-backed RAM (only matters for ACID databases, for systems not on UPS, without redundant power supplies)
- Batch systems may see reduced CPU use. This highly depends on the device driver being well written.
- Bus usage will be divided by 3 for small (sub ((n-1)/2)*block size, where n is the number of disks in the raid) writes, due to not having to do a read and write to update the parity.
You'll note that all of these advantages are on writes! Also, the last advantage is less important than it may seem. Very few small random IO write bound loads exist. (eg. databases will try to rearrange data to make large linear writes, requiring a bus usage of n/(n-1) in the software case)
To reiterate, usually the issue with data access isn't bandwidth, but latency. A hardware solution will not decrease this, except under specialised loads.
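For anyone wondering what the "processing" actually is: RAID 5 parity is just byte-wise XOR, which is why a modern CPU chews through it so fast. A minimal sketch, including the small-write read-modify-write update the bus-usage point above is about:

```python
from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR of equal-length blocks - the RAID 5 parity operation
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks of a 4-drive array
parity = xor_blocks(data)

# Lose drive 1; rebuild its block from the survivors plus parity
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True

# Small write: new_parity = old_parity XOR old_data XOR new_data
# (one read of old data, one of old parity, then two writes)
new_block = b"DDDD"
new_parity = xor_blocks([parity, data[1], new_block])
print(new_parity == xor_blocks([data[0], new_block, data[2]]))  # True
```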
Re:Every single solid state drive will fail too... (Score:3, Informative)
Re:MTBF (Score:4, Informative)
Now, you can have all sorts of distributions that you draw that mean from, but a mean is a mean.
Re:Infant Mortality and stuff (Score:5, Informative)
This study looks pretty realistic to me; in fact, it's better data than the Google paper's because they are looking at different usage scenarios. The study also jibes with vendors' warranty periods -- right around the 3 year mark (end of warranty) failures start going up.
I take issue with your "real world vs. theory" argument versus workstation disks and server disks as well, only because I have my own numbers. Based on numbers that my company gathers for its 50,000 workstations, the disk failure rate is around 1.9% annually. (Still a lot of disks.) There are exceptions -- those numbers are driven upward by one deployment of workstations from a vendor that had a 22% failure rate. (The PCs were replaced by the vendor.) Server disks are in the same ballpark -- slightly less than 2%.
Vendors provide more evidence of that fact. Many servers are being shipped with SATA disks, often the same as what you'll find in workstations. If SATA was less reliable, that would increase the vendor's support costs and they wouldn't ship them.
You're totally right about RAID-5... it can be a dangerous thing for an inept admin. Bad disks often come in batches, and bad controllers can ruin your day. A redundant array of bad data isn't very helpful ;)
Re:How much does handling matter? (Score:4, Informative)
here you go
http://hardware.slashdot.org/article.pl?sid=07/02
Re:Desktop vs Server usage. (Score:3, Informative)
1) Electricity consumption
2) Power cuts (unless you have a UPS and software for a clean shutdown installed, what happens if there is a power cut while you are away?).
3) Power fluctuations (my power supply blew dramatically after one a few months ago) and lightning.
4) Heat (in a hot climate)
Re:Amazing! (Score:3, Informative)
afaict Linux software raid is actually pretty good nowadays at least as long as you stick to the basic raid levels
Beware of the very common fake hardware RAID controllers (really software, but with some BIOS and driver magic to make the array bootable and generally behave like hardware RAID from the user's point of view). These often have far worse performance than Linux software RAID, and many of them only support Windows.
Re:Infant Mortality and stuff (Score:2, Informative)
Re:That's wrong (Score:3, Informative)
There are three real dangers with raid
The first is that arrays are typically built out of identical drives, usually drives from the same batch and then all the drives are run for the same time periods. This means that if there is a design or manufacturing fault that causes a failure peak at a certain number of operational hours there is a good chance that more than one drive in your array will fail at about the same time.
The second is that the drives in an array are typically in one machine, running off one power supply (or one pair of redundant power supplies) and connected to one controller. This means that faults with other hardware in the machine can destroy multiple hard drives at once.
The third is failure of the controller. In many cases the controller stores information on how the data is laid out in its own non-volatile memory (some better controllers do store it on the disks themselves). While this doesn't destroy the actual data, it can easily put it beyond the ability of non-experts to reassemble the array in a way that gets the data back (and if they make a mistake, they can easily destroy the data they were trying to recover). There is also the problem that getting a suitable replacement controller may be difficult.
Re:Amazing! (Score:3, Informative)
That is, unless you go for software RAID, which will put a hit on your processor.
This myth needs to die. No remotely modern processor takes a meaningful performance hit from the processing overhead of RAID.
However, I think if you're going to make the investment to go with RAID 5, then buying a proper hardware controller won't add a significant amount to the cost of your set up.
Decent RAID5-capable controllers are hundreds of dollars. Software RAID is free and - in most cases - faster, more flexible and more reliable.
Re:Amazing! (Score:3, Informative)
Uh sorta. Depends on the raid type. Striped will be faster, mirrored will be about as fast, raid 5 is gonna be the slowest, even in hardware.
Compared to a single disk, RAID5 is still going to be faster (except perhaps for the odd corner-case here and there).
Also, in many cases, software RAID5 is faster than hardware RAID5.
Re:No "infant mortality" effect? (Score:3, Informative)
It's true that you should never buy anything for the illusion of reliability, but the article does not claim RAID is not a good way to get reliability.
First, let's look at the common mistake when people think about RAID: "If the probability of a drive failure is X, then the probability of two drives in a RAID volume failing is X*X, which is much smaller". That's nonsense, as the article demonstrates - the probability is only X*X if the events are independent, which they are clearly not.
But the idea was nonsense even before that. The statement is taking the wrong attitude to the problem - it is considering the probability of data loss at *one point in time*. That's not actually what you care about - if your server dies on Tuesday, it is no comfort to you that it did not die on Monday. Here is a more sensible way to look at what is going on (ignoring backups for the moment):
Every drive is going to fail, typically within the first ten years of its life. So if you have a non-RAID system, the probability of data loss is 100% - certain. Really. Without RAID, sooner or later, you are going to lose that volume. What RAID gives you is a moderate chance of getting through the inevitable drive failures without losing the volume, and that's a chance that you never had at all without RAID. Different configurations can modify how large that chance is, but the essential feature of RAID is that you get the chance.
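The parent's point about failure over time can be made concrete. Even with a flat (and assumed) 3% annual failure rate - and the paper shows rates actually climb with age, so this understates it - the cumulative odds pile up:

```python
def p_failed_by(years, annual_rate):
    # P(a single drive has failed within `years`), constant annual rate assumed
    return 1 - (1 - annual_rate) ** years

afr = 0.03  # assumed annual failure rate, for illustration
for years in (1, 5, 10):
    print(f"{years:2d} yr: P(drive has failed) = {p_failed_by(years, afr):.0%}")
# 1 yr: 3%, 5 yr: 14%, 10 yr: 26%
```

Stretch the horizon far enough (or use the paper's rising rates) and that probability heads toward 1, which is the "certain without RAID" point.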
So what do backups get you? It's basically the same thing, except that you've got to rebuild the server. So if you just have backups and no RAID, it is a certainty that sooner or later your server is going to have significant amounts of downtime while it's being rebuilt from the backup. If downtime bothers you, you need RAID, period. Exactly what kind of RAID depends on what chance you want to take (standard risk management calculation), but there's just no contest between "certain failure" and "chance of avoiding failure" - even a 10% chance of surviving a disk failure is infinitely better than no chance (and the actual figure should be much better than that).
Lastly, what happens if you have RAID and no backups? It should be apparent that you get the same scenario as RAID with backups, only with a higher chance of failure. So there's no fundamental reason not to do that - line up the figures along with RAID+backup solutions in your risk management analysis, and pick the cheapest option for the level of risk you (or your insurance company) are willing to accept.
The impact of this study is a nice improvement in the accuracy of that analysis. Neither more nor less. If you're running large servers, this would be a good time to pull out those numbers and take another look at them (if you don't have those numbers on file, this study is not for you).
Re:Actually, mostly it DOESN'T contradict (Score:2, Informative)
This is known as the Veblen Effect [wikipedia.org] based on work by Thorstein Veblen [wikipedia.org].
Re:moving parts (Score:3, Informative)