
Why RAID 5 Stops Working In 2009

Posted by kdawson
from the back-'em-up-rawhide dept.
Lally Singh recommends a ZDNet piece predicting the imminent demise of RAID 5, noting that increasing storage and non-decreasing probability of disk failure will collide in a year or so. This reader adds, "Apparently, RAID 6 isn't far behind. I'll keep the ZFS plug short. Go ZFS. There, that was it." "Disk drive capacities double every 18-24 months. We have 1 TB drives now, and in 2009 we'll have 2 TB drives. With a 7-drive RAID 5 disk failure, you'll have 6 remaining 2 TB drives. As the RAID controller is busily reading through those 6 disks to reconstruct the data from the failed drive, it is almost certain it will see an [unrecoverable read error]. So the read fails ... The message 'we can't read this RAID volume' travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected — you thought! — data is gone. Oh, you didn't back it up to tape? Bummer!"
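The submitter's arithmetic is easy to check. A back-of-the-envelope sketch in Python (the one-URE-per-10^14-bits figure is the commonly quoted consumer SATA spec, not a measured value; real drives often do better):

```python
import math

URE_RATE = 1e-14          # unrecoverable read errors per bit (typical SATA spec)
BITS_PER_TB = 8e12        # 10^12 bytes/TB * 8 bits/byte

def p_read_error(terabytes, ure_rate=URE_RATE):
    """Probability of at least one unrecoverable read error
    while reading the given number of terabytes."""
    bits = terabytes * BITS_PER_TB
    # P(no error) = (1 - rate)^bits; computed via exp/log1p for stability
    return 1.0 - math.exp(bits * math.log1p(-ure_rate))

# Rebuilding a 7-drive RAID 5 of 2 TB disks means reading the 6 survivors:
print(f"{p_read_error(6 * 2):.1%}")   # ~61.7%
```

Reading the six surviving 2 TB disks is about 9.6x10^13 bits, so at the spec rate the rebuild fails more often than it succeeds.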


  • Backup (Score:2, Informative)

    by Anonymous Coward on Tuesday October 21, 2008 @07:05PM (#25461473)
    RAID is not, and has never been, a substitute for backups.
  • by realmolo (574068) on Tuesday October 21, 2008 @07:14PM (#25461573)

    If you have one RAID5 box, just build another one that replicates it. Use that for your "hot backup". Then back that up to tape, if you must.

    Storage is so cheap these days (especially if you don't need super-fast speeds and can use regular SATA drives), that you might as well just go crazy with mirroring/replicating all your drives all over the place for fault-tolerance and disaster-recovery.

  • by Polarina (1389203) on Tuesday October 21, 2008 @07:16PM (#25461585) Homepage
    A RAID 5 setup is only a precaution in case of a hardware failure. It is no excuse for not having backed up your data.
    And the headline is also flawed - RAID 5 doesn't have a self-destruct mechanism.
  • by networkBoy (774728) on Tuesday October 21, 2008 @07:24PM (#25461679) Homepage Journal

    You know, the other solution is to not use RAID 5 with these big drives, or to go to RAID 1, or to actually back up the data you want to save to DVD and accept that a disk failure will cost you the rest.

    Now, while 1TB onto DVDs seems like quite a chore (and I'll admit it's not trivial), some level of data staging can help out immensely, as well as incrementally backing up files, not trying to actually get a full drive snapshot.

    Say you back up like this:
    my pictures as of 21oct2008
    my documents (except pictures and videos) as of 22oct2008
    etc.
    While you will still lose data in a disk failure, your loss can be mitigated, especially if you only try to back up what is important. With digital cameras, I would argue that home movies and pictures are the two biggest data consumers that people couldn't back up to a single DVD and that they would be genuinely distressed to lose.
    -nB

  • by SatanicPuppy (611928) * <Satanicpuppy@@@gmail...com> on Tuesday October 21, 2008 @07:25PM (#25461693) Journal

    The real issue is one that anyone who has ever had to recover a multi-drive array can tell you instantly: if one drive fails, and the other drive was bought at the same time, and has had a nearly identical usage pattern, the odds of the other drive failing are well above average.

    I once had a single drive fail in a 24 disk array. The disks were arranged, RAID 5, in groups of 3, glued together by Veritas (from back before it got bought by crappy Symantec). By the time the smoke cleared, we had replaced 19 of the 24 drives. They had all been bought at the same time, and as they thrashed rebuilding their failed buddies, they started dying themselves. The remaining 5 drives we replaced anyway, just because.

    That's a worst case, but multiple failures are far from uncommon, and very few people correctly cycle in new drives periodically to reduce the chance of a mass failure.

  • by cong06 (1000177) on Tuesday October 21, 2008 @07:33PM (#25461769)
    The main point of the article is to point out a problem that will eventually occur. If you read the article, he mentions that once hard drives are large enough, everyone will require a RAID setup with their "Dell manufactured" computer (assuming Dell hands out >>2-4TB disks to their average user).
  • by khasim (1285) <brandioch.conner@gmail.com> on Tuesday October 21, 2008 @07:35PM (#25461809)

    Spell it out for everyone.

    RAID won't save your data if there is a fire.
    Or if you delete a file.
    Or if two drives fail.
    Or a thousand other scenarios.

    All RAID does is prevent the system from going down when a single drive fails (except RAID 0). Thus giving everyone in the office time to finish up their important work and log out for the day so you can swap the drive. Or, if you're brave, swap the drive during regular work hours.

    For the home user (not working on huge graphic files) RAID 1 (mirroring) should be sufficient. As long as it is paired with another EXTERNAL hard drive that you copy your important information to. And leave with your brother or something. I'm talking family photos and such. Your tax information should be small enough to fit on a USB drive.

    If your computer completely failed TODAY what would be the really irreplaceable files on it?

    Back those up. Then store them with a friend or someone in your family.

    There, problem solved.

  • by theendlessnow (516149) * on Tuesday October 21, 2008 @07:39PM (#25461837)
    I have large RAID 5's and RAID 6's... I generally don't have any RAID columns over 8TB. I HAVE had drive failures. Yes... I'm talking cheapo SATA drives. No... I have not seen the problem this article presents. Do I back up critical data? Yes. The only time I lost a column was due to a firmware bug which caused a rebuild to fail. It took a while to restore from backup, but that was about the extent of the damage. I would call this article FUD... deceptive FUD, but very much FUD.
  • by gweihir (88907) on Tuesday October 21, 2008 @07:58PM (#25462041)

    My observed error rate with about 4TB of storage is much, much lower. I ran a full surface scan every 15 days for two years and did not see a single read error in that time. (The hardware has since been decommissioned and replaced by 5 RAID 6 arrays of 4TB each.)

    So, I did read roughly 100 times 4TB. That is 400TB = 3.2 * 10^15 bits with 0 errors. That does not take into account normal read from the disks, which should be substantially more.
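For what it's worth, that observation really is far below the spec-sheet rate. A quick sanity check, treating UREs as Poisson at the commonly quoted 10^-14-per-bit consumer SATA figure (an assumption for illustration, not this poster's drives' actual spec):

```python
import math

bits_read = 100 * 4e12 * 8      # ~100 full passes over 4 TB = 3.2e15 bits
spec_rate = 1e-14               # one URE per 10^14 bits (common SATA spec)

expected_errors = bits_read * spec_rate   # Poisson mean at the spec rate
p_zero = math.exp(-expected_errors)       # P(observing zero errors)
print(expected_errors)   # ~32
print(p_zero)            # ~1.3e-14
```

Seeing zero errors where the spec predicts ~32 is essentially impossible if the spec rate were the true per-bit rate, which supports the point that real-world URE rates can be much lower than the published worst case.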

  • by Fulcrum of Evil (560260) on Tuesday October 21, 2008 @08:01PM (#25462073)
    Read the post again - he said that home-burned DVDs are good for 3 years, tops. This is called media life.
  • Re:RAID != Backup (Score:5, Informative)

    by Walpurgiss (723989) on Tuesday October 21, 2008 @08:03PM (#25462095)
    I run a raid5 with 1TB disks. Growing the array from 3 to 4 took around 4 hours, 4 to 5 took maybe 8 or 10, 5 to 7 took something like 30 hours I guess.

    But that's growing from a previous capacity to a larger capacity.
    Using mdadm to fake a failure by removing and adding a single drive, the recover time generally was 4-5 hours.
  • by SatanicPuppy (611928) * <Satanicpuppy@@@gmail...com> on Tuesday October 21, 2008 @08:10PM (#25462165) Journal

    I've got a mainframe circa 1984 that's been using the same type of drive since 1989. Last year we pulled all the year-end financial numbers off the yearly backups dating back to that point. Zero failed tapes.

    Consumer-grade CDs and DVDs use a photosensitive dye to record information. It can degrade in anywhere from 2 to 5 years... longer if you keep it in a cool dark place, but not 20 years.

  • by Anonymous Coward on Tuesday October 21, 2008 @08:48PM (#25462533)

    Redundancy... You keep using that word. I do not think it means what you think it means.

    RAID 0, pseudo-ironically, is not redundant at all. RAID 1, often called mirroring, is the array type that is redundant.

  • RAID6 is far better. (Score:3, Informative)

    by DamnStupidElf (649844) <Fingolfin@linuxmail.org> on Tuesday October 21, 2008 @09:01PM (#25462663)

    Not only are there two parity drives, but the operating system can perform automatic scanning of the drives to ensure that all data and parity disks are correct and silently correct any errors that occur on only one disk. It only takes a few days to scan 12 TB, and if this is done often enough, the probability of two failed disks plus a previously undetected unrecoverable error on a third disk is quite a bit lower than the failure rate for RAID 5. RAID 5 volumes can be automatically scanned, but if corruption is detected there's no way to know which of the disks was actually incorrect, barring an actual error message from the hard disk. Silent corruption is a much bigger enemy of RAID 5 than of RAID 6.

    I don't know why the article focuses on RAID5; RAID1 or RAID10 will have exactly the same issues at a slightly lower frequency than RAID5, but more frequently than RAID6.

    Ultimately, the solution is simply more redundancy, or more reliable hardware. RAID with 3 parity disks is not much slower than RAID6, and dedicated hardware or increasing CPU speed will take care of that faster than drive speeds increase.
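The detect-versus-locate distinction above can be illustrated with a toy model. Real RAID 6 computes Reed-Solomon syndromes over GF(2^8); plain integer checksums stand in for them here, but the principle is the same: one parity value can only say a stripe is inconsistent, while a second, position-weighted value pins down which disk is wrong.

```python
# Toy model: a second, position-weighted checksum locates a corrupt disk,
# while single (RAID-5-style) parity only detects the inconsistency.
# Real RAID 6 uses Reed-Solomon syndromes over GF(2^8); integer sums
# stand in for them here purely for illustration.

def syndromes(data, p, q):
    s0 = sum(data) - p                               # plain parity mismatch
    s1 = sum(i * d for i, d in enumerate(data, 1)) - q  # weighted mismatch
    return s0, s1

data = [7, 3, 9, 4]                                  # data "disks"
p = sum(data)                                        # parity "disk"
q = sum(i * d for i, d in enumerate(data, 1))        # weighted parity "disk"

data[2] += 5                                         # silent corruption on disk 3
s0, s1 = syndromes(data, p, q)
assert s0 != 0        # RAID-5-style parity sees *something* is wrong...
disk = s1 // s0       # ...but only the pair of syndromes locates it
print(disk)           # 3 (1-based index of the corrupted disk)
```

With only s0 available, any one of the four data disks (or the parity disk itself) could be the bad one; the second syndrome resolves that ambiguity for a single-disk error.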

  • by pyite (140350) * on Tuesday October 21, 2008 @09:07PM (#25462725)

    Wow. I love your FUD. If you're going to lie, at least make it seem truthful.

    Lacking in file system utilities (yes, fsck IS necessary even on healthy filesystems, especially on desktops and portables)

    Why no fsck? [opensolaris.org] And if you really feel the need to do something:

    zpool scrub <pool_name>

    License-incompatible with anything worth running it on, other than Solaris itself... which is NOT worth running (see #1 above)

    What you mean to say is "Some Operating Systems whose merits can be debated are license incompatible with the license of ZFS." FreeBSD can implement ZFS. Why can't Linux? Because of its license, not that of ZFS.

  • by pyite (140350) * on Tuesday October 21, 2008 @09:37PM (#25463017)

    You DID see my previous reply, right?

    Yes, I did. It quotes an explanation that you can only fix errors in a redundant configuration. Considering that the whole basis for this discussion is RAID-5, I think that's a feasible thing. However, metadata is written in multiple places, so wanting a ZFS fsck to correct a corrupted superblock is kind of silly, since that superblock is written in multiple places anyway. Also, you can tell ZFS to do a manual scrub (as I showed), which has the advantage of running while the array is online, so you can cron-script it and still keep the array available.

    I'm not going to argue license points. The fact is that ZFS is under an open source license and so is Linux. Sun had every right to use their own license.

  • Re:Scrub your arrays (Score:3, Informative)

    by kyubre (1186117) on Tuesday October 21, 2008 @09:44PM (#25463097)
    I worked at Maxtor up till 2006, and had the privilege of being able to play with several RAID controllers; that, coincidentally, is how I got started with Linux at home (software RAID). At the time, and mind you I only had 160 GB and 250 GB drives to play with, I built a number of RAID 5 arrays up to 2 TB.

    When people think about RAID failure, they generally think about a hardware failure - a sector that can't be read, etc. Those are only the "obvious" problems. Even under ideal conditions, the 1e15 - 1e17 error rates published by the disk drive vendors also include data errors that ARE NOT detected in hardware. It does not take a sector read failure to generate a data miscompare.

    What I found back in '06 is that with a 2 TB RAID 5 made up of 8 drives, there was about a 10% probability of a RAID data failure every time the array was read, sector by sector, over the entire 2 TB span. That implies that in the event of a real disk failure, there was about a 10% probability that the rebuild would fail because of an otherwise undetected data read error.

    I am not sure where the state of the art is with Linux software RAID, and perhaps the "scrub" operation mentioned above does the trick, but the biggest failing in the RAID systems I have used is that when a data error occurs, the algorithms don't/didn't calculate the missing block and write it back to the failing device, giving it a chance to push off the sector in error. Most disk drives can "heal" from most of the common problems in a RAID system. What's missing is background grooming that deals with a missing data slice, gives the device the chance to recover from it, and alerts the admin that a problem was "handled".

    It's not the 3%/year hard disk failure rate we should be worried about - it's the corrected error rate. 1e15 is very unforgiving when you are talking about terabytes... As long as RAID doesn't do the "right thing" and try to recapture the missing data, RAID-5 is in trouble.
  • by Gr8Apes (679165) on Tuesday October 21, 2008 @10:27PM (#25463465)

    "Safe" production data ...with nightly replays/screenshots ...

    Exactly. You make backups, no matter what. Anyone who relies on RAID for backups will get what they deserve, sooner rather than later.

    RAID and SANs are for uptime (reliability) and/or performance. SANs with snapshots and RAID with backups are for data recovery.

  • Re:RAID != Backup (Score:1, Informative)

    by Anonymous Coward on Tuesday October 21, 2008 @10:38PM (#25463573)
    Actually, that flood just happened in our server room. The 15 ton AC/dehumidifier crapped out and started dumping water like nobody's business. That happened either late Friday night or sometime over the weekend. By the time we got in on Monday morning, the water was about half an inch deep. About 400 gallons later, we shop-vac'd enough water that we could leave the server room to air dry.

    Luckily nothing fried, but just letting you know freak problems do occur, and you could end up with a flooded/incinerated server room.
  • by ajkst1 (630286) on Tuesday October 21, 2008 @11:03PM (#25463817)
    I have to echo this comment. RAID is not a backup. It is a form of redundancy. Nothing is stopping that system from losing two drives and completely losing your data. RAID simply allows you to keep working after a SINGLE disk failure. If you're not making backups of your critical data and relying on RAID to save your behind, you're insane.
  • Re:Don't panic! (Score:5, Informative)

    by nine-times (778537) <nine.times@gmail.com> on Tuesday October 21, 2008 @11:04PM (#25463823) Homepage

    How reliable RAID 5 is depends on the array size, because the more disks you have, the greater the likelihood that one of them will fail in any given period of time. So obviously, if you have a RAID 0 of lots of disks, there is a much better chance that the array will fail than that any particular disk will fail.

    So the purpose of RAID 5 is not so much to make it orders of magnitude more reliable than a single disk, but rather to mitigate the increased risk that would come from having a RAID 0. So you'd have to calculate, for the number of disks and the failure rate of any particular drive, what the chances are of having 2 drives fail at the same time (given a certain response time to drive failure). If you have enough drives and a slow enough response to disk failures, it's at least theoretically possible (I haven't done the math) that a single drive is safer.
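This reasoning can be made concrete with the ~3%/year per-disk figure quoted elsewhere in this discussion (assuming independent failures, which the thread's correlated-failure anecdotes suggest is optimistic):

```python
def p_any_failure(n_disks, p_disk):
    """P(at least one of n disks fails in the period) -- the RAID 0 risk."""
    return 1 - (1 - p_disk) ** n_disks

def p_two_or_more(n_disks, p_disk):
    """P(two or more disks fail in the same period) -- roughly the RAID 5
    risk if the period is the window before a failed disk is rebuilt."""
    p_none = (1 - p_disk) ** n_disks
    p_one = n_disks * p_disk * (1 - p_disk) ** (n_disks - 1)
    return 1 - p_none - p_one

p = 0.03  # ~3%/year single-disk failure rate, per the discussion
print(f"RAID 0, 7 disks, 1 year:  {p_any_failure(7, p):.1%}")   # ~19.2%
print(f"RAID 5, 7 disks, 1 year:  {p_two_or_more(7, p):.2%}")   # ~1.71%
```

So a 7-disk RAID 0 is over six times riskier than one disk, while RAID 5 with even a sluggish one-year replacement window beats a single disk's 3% - until the rebuild-time URE problem from the article enters the picture.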

  • by Lukey Boy (16717) on Tuesday October 21, 2008 @11:11PM (#25463891) Homepage
    Tape can still be pretty decent for off-siting and DR. I recently managed to get an LTO4 drive in a 24-slot library at work; each tape is 800 gigabytes uncompressed (most hold about 1.2 TB with native compression), and the drive does native AES encryption, so every tape that goes offsite is protected in that way. It wasn't cheap, but it didn't break the bank by any means. Oh, and I can write to the drive at about 170 MB/s.
  • by cbreaker (561297) on Tuesday October 21, 2008 @11:13PM (#25463905) Journal

    Well, Windows does. Taking a snapshot of NTFS, even on a heavily used 1TB+ file server, takes only a few seconds, and under normal operation the file system is still fast.

    NTFS is actually a pretty good file system. Its design traces back to HPFS, which Microsoft co-developed with IBM.

  • by srw (38421) * on Tuesday October 21, 2008 @11:13PM (#25463909) Homepage

    "This time?"

    Ah, I see you've never read "Song of Songs"

  • by pyite (140350) * on Tuesday October 21, 2008 @11:38PM (#25464125)

    Isn't ZFS a filesystem? Why would I care about what filesystem I am using when I am trying to protect my data from disk failures?

    Because it's a file system, volume management, and redundancy all rolled into one combined with native NFS and SMB sharing, iSCSI support, etc. etc.

  • by kimvette (919543) on Tuesday October 21, 2008 @11:40PM (#25464145) Homepage Journal

    I have CD-Rs dating back to 1994 or 1995 that are just fine -- and they're off-brand media too. "Good" media was $12 to $20 per CD then, and "cheap" media was $7.00 per CD.

    I have DVD-Rs dating back to 2002 or 2003 -- again, just fine.

    While it's good to be cautious, some in here are crying wolf regarding optical media.

  • Re:Don't panic! (Score:2, Informative)

    by Tracy Reed (3563) <treed AT ultraviolet DOT org> on Wednesday October 22, 2008 @12:14AM (#25464367) Homepage

    You seem to misunderstand the article. They are saying that if you need 12T of storage RAID 5 is not reliable. You would be better off with a single 12T disk if such a thing existed.

    With 7 brand new disks, you have ~20% chance of seeing a disk failure each year.

    SATA drives are commonly specified with an unrecoverable read error rate (URE) of 1 in 10^14. Which means that once every 100,000,000,000,000 bits, the disk will very politely tell you that, so sorry, but it really, truly can't read that sector back to you.

    So now you can't rebuild your array. And there is a 20% chance of this happening every year. If you had a single disk your chance of total disk failure averages 3%. In this case you are better off having one disk and making good backups. Or perhaps a mirror or even a 3-way mirror if the system is smart enough to read data off of the other disk in the event that one returns a URE.
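Combining the two numbers above (a ~20% chance of losing a disk each year, and a URE-prone rebuild) gives a rough annual loss estimate for the 7-drive array. A sketch, assuming the spec URE rate, independent failures, and a rebuild that aborts on the first URE:

```python
import math

URE = 1e-14              # unrecoverable read errors per bit (SATA spec)
P_DISK_YEAR = 0.03       # single-disk annual failure rate from the post
N, TB = 7, 2             # 7-drive array of 2 TB disks

p_failure = 1 - (1 - P_DISK_YEAR) ** N            # some disk dies this year
bits = (N - 1) * TB * 8e12                        # survivors read on rebuild
p_ure = 1 - math.exp(bits * math.log1p(-URE))     # hit a URE mid-rebuild

print(f"P(disk failure/yr)    = {p_failure:.1%}")          # ~19.2%
print(f"P(URE during rebuild) = {p_ure:.1%}")              # ~61.7%
print(f"P(array loss/yr)      ~ {p_failure * p_ure:.1%}")  # ~11.9%
```

Under these assumptions the array loses data roughly four times as often as a lone 3%-a-year disk, which is the post's "you are better off having one disk and making good backups" point in numbers.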

  • by Lukey Boy (16717) on Wednesday October 22, 2008 @12:14AM (#25464369) Homepage
    Sadly no. I have a ton of things to back up at home and just use Bacula with a ton of DVD-RWs. It's not really ideal. I keep scouring eBay and Craigslist for an LTO1 or LTO2 drive, but I haven't had any luck getting one under a thousand dollars. I've looked at S3, rsync.net, and a few others, but they're all way too expensive for me.
  • by Anonymous Coward on Wednesday October 22, 2008 @12:52AM (#25464563)

    rm -r *

    That doesn't work for me. Try

    sudo rm -rf /*

    Hell, if you want to lose data, you've got to at LEAST use dd. rm just removes directory entries; all your data is still on disk, you just can't access it. Run

    dd if=/dev/urandom of=/dev/sda

    (or whatever disk you want to lose) and then see how many data recovery places will turn you away. The level of data recovery available to the public is pretty crappy; there's a guy offering a reasonably big prize to any data recovery company (or anyone at all, I guess) who can recover data from a disk he zeroed with dd, and he hasn't had any takers yet. I wish I could find the link.

  • by Terje Mathisen (128806) on Wednesday October 22, 2008 @12:58AM (#25464597)

    The only solution is to regularly read everything:

    The chance of avoiding double errors, in the form of unreadable sectors discovered during a rebuild, roughly doubles each time you halve the time between full reads of all sectors on a drive. (This holds down to about weekly full reads.)

    This is because a full read will allow each drive in the array to discover sectors that are becoming iffy (soft/recoverable read errors) and then remap them.

    See lwn.net [lwn.net] for a discussion and links to some good papers.

    Terje
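The halving claim above follows from a simple latent-error model: bad sectors accumulate unnoticed between full reads, so the number sitting on the surviving drives when a disk happens to fail scales with the scrub interval. A toy sketch (the per-day rate is an arbitrary illustrative assumption, not a measured figure):

```python
# Toy model: latent bad sectors appear at some rate per drive-day and sit
# unnoticed until the next full read remaps them. A disk failure arrives
# at a uniformly random point in the scrub cycle, so on average half the
# interval's worth of latent errors are waiting on each survivor.

def latent_errors_at_failure(rate_per_day, scrub_interval_days):
    # Expected latent-error count at a random point in the interval;
    # for small values this approximates P(>= 1 latent error).
    return rate_per_day * scrub_interval_days / 2

weekly   = latent_errors_at_failure(1e-3, 7)
biweekly = latent_errors_at_failure(1e-3, 14)
print(biweekly / weekly)   # 2.0 -- doubling the interval doubles the risk
```

The linear scaling is the whole point: the rate itself barely matters for the argument, only that risk is proportional to how long errors sit undiscovered.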

  • by Gr8Apes (679165) on Wednesday October 22, 2008 @01:05AM (#25464631)

    External TB drives are around $150. Buy several. Make rotating copies. It's doable on your budget. (We're in the same boat, btw, and that was our solution for the dev machines)

    However, the real issue is that your employer has decided on the budget, and what you do with it determines how well you're protected. Sometimes we don't get a Fibre NAS with remote backup, no matter how much we want it. Sometimes we have to get by with the old rsync, dd, or a plain copy, or even tar/zip with rotating media. (Anything less is suicide)

  • Re:Don't panic! (Score:4, Informative)

    by drsmithy (35869) <drsmithy.gmail@com> on Wednesday October 22, 2008 @02:36AM (#25465059)

    The problem is that capacity has been growing faster than transfer bandwidth. Thus it takes longer and longer to read (or write) a complete disk. This gives a larger window for double failure.

    No, the point is that, statistically, you can't actually read all of the data without hitting another read error.

    Whether you read it all at 100 MB/sec or 10 MB/sec (i.e. how long it takes) is irrelevant, within reason. The problem is that published URE rates are such that you "will" have at least one during the rebuild (because of the amount of data).

    The solution, as outlined by a few other posters, is more intelligent RAID5 implementations that don't take an entire disk offline just because of a single sector read error (some already behave this way; most don't).

  • by Christian Smith (3497) on Wednesday October 22, 2008 @07:15AM (#25466173) Homepage

    Oops, selected the wrong moderation option. This reply is to wipe that moderation.

  • by Limecron (206141) on Wednesday October 22, 2008 @08:31AM (#25466645)

    "Unrecoverable" implies that it is not possible to read the data anymore.

    Also, data on the disk is addressed by sectors, so if one fails, this means you typically have at least 512 bytes lost.

    It's true that even that might not completely break some kind of large media file, but you have to remember that RAID5 is a layer below your file system, so if an error occurs when it's trying to rebuild itself, it will not be able to give you your data back.

    You might be able to recover a lot of your data from an error of this kind, but don't count on the RAID implementation to do it for you.
