Why RAID 5 Stops Working In 2009
Lally Singh recommends a ZDNet piece predicting the imminent demise of RAID 5, noting that increasing storage and non-decreasing probability of disk failure will collide in a year or so. This reader adds, "Apparently, RAID 6 isn't far behind. I'll keep the ZFS plug short. Go ZFS. There, that was it." "Disk drive capacities double every 18-24 months. We have 1 TB drives now, and in 2009 we'll have 2 TB drives. With a 7-drive RAID 5 disk failure, you'll have 6 remaining 2 TB drives. As the RAID controller is busily reading through those 6 disks to reconstruct the data from the failed drive, it is almost certain it will see an [unrecoverable read error]. So the read fails ... The message 'we can't read this RAID volume' travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected — you thought! — data is gone. Oh, you didn't back it up to tape? Bummer!"
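The summary's arithmetic can be sanity-checked in a few lines of Python. The drive sizes and the 1-in-10^14 URE rate are the summary's assumptions; the Poisson approximation is mine:

```python
import math

# Scenario from the summary: a 7-drive RAID 5 loses one disk, so the
# 6 surviving 2 TB drives must be read in full to rebuild. Consumer
# SATA drives are spec'd at one unrecoverable read error per 1e14 bits.
ure_rate = 1e-14              # probability of a URE per bit read
bits_to_read = 6 * 2e12 * 8   # 6 drives x 2 TB x 8 bits/byte = 9.6e13 bits

# Expected number of UREs encountered during the rebuild
expected_errors = ure_rate * bits_to_read

# Probability of at least one URE (Poisson approximation)
p_rebuild_fails = 1 - math.exp(-expected_errors)

print(f"expected UREs during rebuild: {expected_errors:.2f}")   # 0.96
print(f"P(rebuild hits a URE): {p_rebuild_fails:.0%}")          # 62%
```

So at the published spec, a full rebuild of this array is more likely than not to hit at least one unreadable sector.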
Backup (Score:2, Informative)
Just double-up on everything (Score:4, Informative)
If you have one RAID5 box, just build another one that replicates it. Use that for your "hot backup". Then back that up to tape, if you must.
Storage is so cheap these days (especially if you don't need super-fast speeds and can use regular SATA drives), that you might as well just go crazy with mirroring/replicating all your drives all over the place for fault-tolerance and disaster-recovery.
You're missing the point. (Score:2, Informative)
And the topic is also flawed - RAID 5 doesn't have any self-destruct mechanism.
Re:Carefully protected? (Score:4, Informative)
you know the other solution is to not use RAID5 with these big drives, or to go to RAID1, or to actually back up the data you want to save to DVD and accept a disk failure will cost you the rest.
Now, while 1TB onto DVDs seems like quite a chore (and I'll admit it's not trivial), some level of data staging can help out immensely, as well as incrementally backing up files, not trying to actually get a full drive snapshot.
Say you backup like this:
my pictures as of 21oct2008
my documents (except pictures and videos) as of 22 oct2008
etc.
while you will still lose data in a disk failure, your loss can be mitigated, especially if you only try to back up what is important. With digital cameras, I would argue that home movies and pictures are the two biggest data consumers that people couldn't back up to a single DVD and would be genuinely distressed to lose.
-nB
Re:Dont worry too much (Score:5, Informative)
The real issue is one that anyone who has ever had to recover a multi-drive array can tell you instantly: if one drive fails, and the other drive was bought at the same time, and has had a nearly identical usage pattern, the odds of the other drive failing are well above average.
I once had a single drive fail in a 24 disk array. The disks were arranged, RAID 5, in groups of 3, glued together by Veritas (from back before it got bought by crappy symantec). By the time the smoke cleared we had replaced 19 out of 24 drives. They had all been bought at the same time, and as they thrashed rebuilding their failed buddies, they started dying themselves. The remaining 5 drives we replaced anyway, just because.
That's a worst case, but multiple failures are far from uncommon, and very few people correctly cycle in new drives periodically to reduce the chance of a mass failure.
Re:7 2TB Disks in RAID 5???????? (Score:2, Informative)
RAID is about avoiding PRODUCTION downtime. (Score:3, Informative)
Spell it out for everyone.
RAID won't save your data if there is a fire.
Or if you delete a file.
Or if two drives fail.
Or a thousand other scenarios.
All RAID (except RAID 0) does is keep the system from going down when a single drive fails. That gives everyone in the office time to finish up their important work and log out for the day so you can swap the drive. Or, if you're brave, swap the drive during regular work hours.
For the home user (not working on huge graphic files) RAID 1 (mirroring) should be sufficient. As long as it is paired with another EXTERNAL hard drive that you copy your important information to. And leave with your brother or something. I'm talking family photos and such. Your tax information should be small enough to fit on a USB drive.
If your computer completely failed TODAY what would be the really irreplaceable files on it?
Back those up. Then store them with a friend or someone in your family.
There, problem solved.
Re:Testable assertion (Score:4, Informative)
1 in 10^14 bit is not what I observe (Score:5, Informative)
My observed error rate with about 4TB of storage is much, much lower. I ran a full surface scan every 15 days for two years and did not see a single read error. (The hardware has since been decommissioned and replaced by 5 RAID6 arrays with 4TB each.)
So, I read roughly 100 times 4TB. That is 400TB = 3.2 * 10^15 bits with 0 errors. That does not take into account normal reads from the disks, which should be substantially more.
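The poster's numbers say something strong about the spec rate. If the drives really erred at 1 in 10^14 bits, observing zero errors in 3.2 * 10^15 bits would be essentially impossible. A quick check (the bit count and spec rate are the poster's figures):

```python
import math

bits_read = 3.2e15   # ~400 TB of scrub reads, per the post
spec_rate = 1e-14    # per-bit URE rate from a typical SATA datasheet

expected = bits_read * spec_rate   # expected errors at the spec rate
p_zero = math.exp(-expected)       # Poisson probability of zero errors

print(f"expected errors at spec rate: {expected:.0f}")   # 32
print(f"P(observing zero errors): {p_zero:.1e}")         # ~1.3e-14
```

Either these particular drives are far better than spec, or the datasheet number is a very conservative worst-case bound.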
Re:Carefully protected? (Score:3, Informative)
Re:RAID != Backup (Score:5, Informative)
But that's growing from a previous capacity to a larger capacity.
Using mdadm to fake a failure by removing and re-adding a single drive, the recovery time was generally 4-5 hours.
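That 4-5 hour figure is consistent with simple throughput math. Assuming (hypothetical numbers, not from the post) a 1 TB member disk and ~60 MB/s sustained sequential rebuild speed:

```python
# Back-of-envelope rebuild time for one member disk.
capacity_bytes = 1e12      # assumed 1 TB member disk
throughput = 60e6          # assumed ~60 MB/s sustained rebuild rate

hours = capacity_bytes / throughput / 3600
print(f"~{hours:.1f} hours")   # ~4.6 hours
```

Since capacity grows faster than sustained transfer rate, this window only gets longer with each drive generation.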
Re:Carefully protected? (Score:5, Informative)
I've got a mainframe circa 1984 that's been using the same type of drive since 1989. Last year we pulled all the year-end financial numbers off the yearly backups dating back to that point. Zero failed tapes.
Consumer-grade CDs and DVDs use a photosensitive dye to record information. It can degrade in anywhere from 2 to 5 years... longer if you keep it in a cool, dark place, but not 20 years.
Re:RAID doesn't protect against your worst enemy (Score:4, Informative)
Redundancy... You keep using that word. I do not think it means what you think it means.
RAID 0, pseudo-ironically, is not redundant at all. RAID 1, often called mirroring, is the level that actually is redundant.
RAID6 is far better. (Score:3, Informative)
Not only are there two parity drives, but the operating system can perform automatic scanning of the drives to ensure that all data and parity disks are correct and silently correct any errors that occur on only one disk. It only takes a few days to scan 12 TB, and if this is done often enough the probability of two failed disks plus a previously undetected unrecoverable error on a third disk is quite a bit lower than the failure rate for RAID5. RAID5 volumes can be automatically scanned, but if corruption is detected there's no way to know which of the disks was actually incorrect, barring an actual message from the hard disk. Silent corruption is a much bigger enemy of RAID5 than RAID6.
I don't know why the article focuses on RAID5; RAID1 or RAID10 will have exactly the same issues at a slightly lower frequency than RAID5, but more frequently than RAID6.
Ultimately, the solution is simply more redundancy, or more reliable hardware. RAID with 3 parity disks is not much slower than RAID6, and dedicated hardware or increasing CPU speed will take care of that faster than drive speeds increase.
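The single-parity mechanics behind this are easy to see in miniature. Here is a toy sketch of RAID5-style XOR parity (byte strings stand in for disks; this is an illustration, not how a real controller lays out stripes). It also shows the point above: XOR can rebuild a *known-missing* disk, but if one disk is silently wrong, the parity mismatch alone cannot tell you which one:

```python
from functools import reduce

# Toy RAID5 stripe: data blocks plus one XOR parity block.
# Any single lost block is the XOR of all the surviving blocks.
def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]     # three data "disks"
parity = xor_blocks(data)              # the parity "disk"

# Disk 1 dies; rebuild it from the survivors plus parity:
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print("disk 1 rebuilt:", rebuilt)      # b'BBBB'
```

RAID6 adds a second, independently computed syndrome (Reed-Solomon rather than plain XOR), which is what lets a scrub both detect *and localize* a single silently corrupt disk.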
Re:Can I tell you where to insert your plug? (Score:5, Informative)
Wow. I love your FUD. If you're going to lie, at least make it seem truthful.
Lacking in file system utilities (yes, fsck IS necessary even on healthy filesystems, especially on desktops and portables)
Why no fsck? [opensolaris.org] And if you really feel the need to do something:
License-incompatible with anything worth running it on, other than Solaris itself... which is NOT worth running (see #1 above)
What you mean to say is "Some Operating Systems whose merits can be debated are license incompatible with the license of ZFS." FreeBSD can implement ZFS. Why can't Linux? Because of its license, not that of ZFS.
Re:Can I tell you where to insert your plug? (Score:3, Informative)
You DID see my previous reply, right?
Yes, I did. It quotes an explanation that you can only fix errors in a redundant configuration. Considering that the whole basis for this discussion is RAID-5, I think that's a reasonable assumption. And wanting a ZFS fsck to correct a corrupted superblock is kind of silly, since that superblock, like all ZFS metadata, is already written in multiple places. Also, you can tell ZFS to do a manual scrub (as I showed), which has the advantage of running while the array is online, so you can cron-script it and still keep the array available.
I'm not going to argue license points. The fact is that ZFS is under an open source license and so is Linux. Sun had every right to use their own license.
Re:Scrub your arrays (Score:3, Informative)
Re:Carefully protected? (Score:4, Informative)
"Safe" production data ...with nightly replays/screenshots ...
Exactly. You make backups, no matter what. Anyone that relies on RAID for backups will get what they deserve, sooner than later.
RAID and SANs are for uptime (reliability) and/or performance. SANs with snapshots and RAID with backups are for data recovery.
Re:RAID != Backup (Score:1, Informative)
Luckily nothing fried, but just letting you know freak problems do occur, and you could end up with a flooded/incinerated server room.
Re:Carefully protected? (Score:5, Informative)
Re:Don't panic! (Score:5, Informative)
How reliable RAID5 is depends on the number of disks: the more disks you have, the greater the likelihood that one of them will fail in any given period of time. So obviously if you have a RAID 0 of lots of disks, there is a much better chance that the RAID will fail than that any particular disk will fail.
So the purpose of RAID5 is not so much to make it orders of magnitude more reliable than a single disk, but rather to mitigate the increased risk that would come from a RAID0. So you'd have to calculate, for the number of disks and the failure rate of any particular drive, the chances of having 2 drives fail at the same time (given a certain response time to a drive failure). If you have enough drives and a slow enough response to disk failures, it's at least theoretically possible (I haven't done the math) that a single drive is safer.
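Doing that math for a small example (the 3% annual per-disk failure rate and 7-disk array are assumptions for illustration; failures are treated as independent, and the RAID5 case is the pessimistic "no rebuild ever happens" bound):

```python
# Annual failure probabilities for one disk, a 7-disk RAID0, and a
# 7-disk RAID5, assuming each disk independently has a 3%/year
# chance of dying. Hypothetical numbers, for illustration only.
p = 0.03   # per-disk annual failure probability (assumed)
n = 7      # disks in the array

p_single = p
p_raid0 = 1 - (1 - p) ** n                # any single failure kills RAID0

# Worst-case RAID5: array is lost if 2+ disks fail within the year
# and the first failure is never repaired (no hot spare, no rebuild).
p_raid5 = 1 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)

print(f"single disk:            {p_single:.1%}")   # 3.0%
print(f"RAID0 of 7:             {p_raid0:.1%}")    # ~19.2%
print(f"RAID5 of 7, no rebuild: {p_raid5:.1%}")    # ~1.7%
```

Even with no rebuild at all, RAID5 beats the single disk here; the article's point is that UREs during the rebuild erode exactly this margin as drives get bigger.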
Re:Carefully protected? (Score:3, Informative)
Re:RAID doesn't protect against your worst enemy (Score:5, Informative)
Well, Windows does. Taking a snapshot of NTFS, even on a heavily used 1TB+ file server, takes only a few seconds, and under normal operation the file system is still fast.
NTFS is actually a pretty good file system. It probably helps that it inherits ideas from HPFS, which Microsoft developed with IBM for OS/2.
Re:RAID doesn't protect against your worst enemy (Score:4, Informative)
"This time?"
Ah, I see you've never read "Song of Songs"
Re:Ok, I'll take the ZFS bait (Score:3, Informative)
Isn't ZFS a filesystem? Why would I care about what filesystem I am using when I am trying to protect my data from disk failures?
Because it's a file system, volume management, and redundancy all rolled into one combined with native NFS and SMB sharing, iSCSI support, etc. etc.
Re:Carefully protected? (Score:5, Informative)
I have CD-Rs dating back to 1994 or 1995 that are just fine -- and they're off-brand media too. "Good" media was $12 to $20 per CD then, and "cheap" media was $7.00 per CD.
I have DVD-Rs dating back to 2002 or 2003 -- again, just fine.
While it's good to be cautious, some in here are crying wolf regarding optical media.
Re:Don't panic! (Score:2, Informative)
You seem to misunderstand the article. They are saying that if you need 12 TB of storage, RAID 5 is not reliable. You would be better off with a single 12 TB disk, if such a thing existed.
With 7 brand new disks, you have ~20% chance of seeing a disk failure each year.
SATA drives are commonly specified with an unrecoverable read error (URE) rate of 1 in 10^14 bits. Which means that once every 100,000,000,000,000 bits, the disk will very politely tell you that, so sorry, but it really, truly can't read that sector back to you.
So now you can't rebuild your array. And there is a 20% chance of this happening every year. If you had a single disk, your chance of total disk failure averages about 3% a year. In this case you are better off having one disk and making good backups. Or perhaps a mirror, or even a 3-way mirror, if the system is smart enough to read the data off another disk when one returns a URE.
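Putting the two numbers from this post together (the ~20%/year array figure and the 1-in-10^14 URE rate are the post's assumptions; the 2 TB drive size is from the summary):

```python
import math

# Chance per year that a 7-drive array loses a disk AND the rebuild
# of the remaining 6 x 2 TB then hits an unrecoverable read error.
p_drive_fails = 0.20            # ~20%/year for one of 7 drives (from post)
bits = 6 * 2e12 * 8             # data read during rebuild, in bits

p_ure = 1 - math.exp(-1e-14 * bits)          # Poisson approximation
p_data_loss_per_year = p_drive_fails * p_ure

print(f"P(URE during rebuild):  {p_ure:.0%}")                 # 62%
print(f"P(array loss per year): {p_data_loss_per_year:.0%}")  # 12%
```

A ~12%/year loss rate is indeed worse than the ~3%/year quoted for a single well-backed-up disk, which is the poster's point.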
Re:Carefully protected? (Score:4, Informative)
Re:RAID doesn't protect against your worst enemy (Score:3, Informative)
That doesn't work for me. Try
hell, if you want to lose data, you've gotta at LEAST use dd. rm just removes file handles; all your data is fine, you just can't access it. run
(or whatever disk you want to lose) and then see how many data recovery places will turn you away. the level of data recovery available to the public is pretty crappy; there's a guy offering a reasonably big prize to any data recovery company (or anyone at all, I guess) who can recover data from a disk he zeroed with dd, and he hasn't had any takers yet. I wish I could find the link
Read scrubbing is the key (Score:3, Informative)
The only solution is to regularly read everything:
The chance of avoiding double errors in the form of unreadable sectors during a rebuild roughly doubles each time you halve the time between full reads of all sectors on a drive. (This holds down to about weekly full reads.)
This is because a full read will allow each drive in the array to discover sectors that are becoming iffy (soft/recoverable read errors) and then remap them.
See lwn.net [lwn.net] for a discussion and links to some good papers.
Terje
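That scaling can be illustrated with a toy model (entirely my assumption, not from the papers above): latent bad sectors appear on a drive as a Poisson process and are cleared by each full scrub, so the risk at rebuild time grows with the scrub interval.

```python
import math

# Toy model: latent unreadable sectors appear at rate lam per drive-week
# and a full-surface scrub detects and remaps them. A rebuild started at
# a random time sees, on average, half a scrub interval of accumulation.
def p_clean_rebuild(lam_per_week, interval_weeks):
    """Probability a rebuild finds no latent bad sector."""
    return math.exp(-lam_per_week * interval_weeks / 2)

lam = 0.05  # hypothetical: one latent sector per drive per 20 weeks
for weeks in (8, 4, 2, 1):
    p = p_clean_rebuild(lam, weeks)
    print(f"scrub every {weeks} wk: P(clean) = {p:.3f}, "
          f"P(latent error) = {1 - p:.3f}")
```

In this regime the *failure* probability roughly halves each time the scrub interval halves, which matches Terje's rule of thumb.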
Re:Carefully protected? (Score:4, Informative)
External 1 TB drives are around $150. Buy several. Make rotating copies. It's doable on your budget. (We're in the same boat, btw, and that was our solution for the dev machines.)
However, the real issue is your employer has decided on the budget, and what you do with it is how well you're protected. Sometimes we don't get a Fibre NAS with remote backup, no matter how much we want it. Sometimes we have to get by with the old rsync, dd, or pure copy or even tar/zip with rotating media. (Anything less is suicide)
Re:Don't panic! (Score:4, Informative)
The problem is that capacity has been growing faster than transfer bandwidth. Thus it takes longer and longer to read (or write) a complete disk. This gives a larger window for a double failure.
No, the point is that, statistically, you can't actually read all of the data without hitting another read error.
Whether you read it all at 100MB/sec or 10MB/sec (i.e., how long it takes) is irrelevant, within reason. The problem is that published URE rates are such that you "will" have at least one error during the rebuild, simply because of the amount of data.
The solution, as outlined by a few other posters, is more intelligent RAID5 implementations that don't take an entire disk offline just because of a single sector read error (some already behave this way; most don't).
Re:Carefully protected? (Score:4, Informative)
"Unrecoverable" implies that it is not possible to read the data anymore.
Also, data on the disk is addressed by sectors, so if one fails, this means you typically have at least 512 bytes lost.
It's true that even that might not completely break some kind of large media file, but you have to remember that RAID5 is a layer below your file system data, so if an error occurs when its trying to rebuild itself, it will not be able to give you your data back.
You might be able to recover a lot of your data from an error of this kind, but don't count on the RAID implementation to do it for you.