Data Storage Upgrades

Why RAID 5 Stops Working In 2009

Posted by kdawson
from the back-'em-up-rawhide dept.
Lally Singh recommends a ZDNet piece predicting the imminent demise of RAID 5, noting that increasing storage and non-decreasing probability of disk failure will collide in a year or so. This reader adds, "Apparently, RAID 6 isn't far behind. I'll keep the ZFS plug short. Go ZFS. There, that was it." "Disk drive capacities double every 18-24 months. We have 1 TB drives now, and in 2009 we'll have 2 TB drives. With a 7-drive RAID 5 disk failure, you'll have 6 remaining 2 TB drives. As the RAID controller is busily reading through those 6 disks to reconstruct the data from the failed drive, it is almost certain it will see an [unrecoverable read error]. So the read fails ... The message 'we can't read this RAID volume' travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected — you thought! — data is gone. Oh, you didn't back it up to tape? Bummer!"
This discussion has been archived. No new comments can be posted.

  • by rhathar (1247530) on Tuesday October 21, 2008 @07:09PM (#25461507) Homepage
    "Safe" production data should be in a SAN environment anyways. RAID 5 on top of RAID 10 with nightly replays/screenshots and multi-tiered read/writes over an array of disks.
  • by mschuyler (197441) on Tuesday October 21, 2008 @07:14PM (#25461561) Homepage Journal

    That's what RAID stands for. It's a nice idea in theory, as long as the disks remain cheap, but I've never trusted them to work properly and had more than one break on me. "All you have to do is unplug the bad disk, plug in a good one in its place, and in a few minutes all will be hunky dory." Bzzt. Wrong. Thanks for playing.

    Back up every day to tape, to another disk entirely on a different machine, to R/W DVD, twice a day if you have to, or all of the above--anywhere else but the machine itself. RAID: the accident waiting to happen. Yeah, I'm paranoid. It comes from experience.

  • by networkBoy (774728) on Tuesday October 21, 2008 @07:17PM (#25461589) Homepage Journal

    True.
    Also FWIW I only run RAID 1 and JBOD.
    Things that must be on-line, or are destined for JBOD but not yet archived to backup media, live on one of the RAID volumes. Everything else goes off to JBOD, where things are better than RAID5.

    Why?

    I have 6 TB of JBOD storage and 600 GB (2x300 GB volumes) of RAID 1. If I striped the JBOD into 6TB (7 drives) and one drive failed all the near-line data would be virtually off-line (and certainly read-only) while the array re-built. With JBOD, should a disk fail, I pop in a replacement, grab the stack of DVDs from the local backup, and plug the data back in. Now all the other near-line is still available and honestly takes about the same amount of effort and time as re-building a stripe set w/ parity. Never mind that I've had a read error on rebuilds before and had to re-do the entire array from scratch anyway.

    While my system would not work in an environment where the files on the JBOD change often, they are basically .archive anyway, so handling them by way of staging on RAID1 pending copy to DVD and storing on JBOD works fine.

    Naturally this system also really gives an incentive to keep up on the backups, with no false sense of security of having files on a RAID5...
    -nB

  • Testable assertion (Score:4, Interesting)

    by merreborn (853723) on Tuesday October 21, 2008 @07:18PM (#25461613) Journal

    But even today a 7 drive RAID 5 with 1 TB disks has a 50% chance of a rebuild failure. RAID 5 is reaching the end of its useful life.

    This is trivially testable. Any slashdotters have experience rebuilding 7TB RAID 5 arrays?

    You'd think, if this were really an issue, we'd be hearing stories from the front lines of this happening with increasing frequency. Instead we have a blog post based entirely on theory, without a single real-world example for corroboration.

    What's more, who even uses RAID 5 anymore? I thought it was all RAID 10 and whatnot these days.
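
    A rough sketch of the arithmetic behind the quoted claim. It assumes a URE rate of one per 10^14 bits read (about 12.5 TB, close to the 12 TB figure cited in the thread) and independent errors at a constant rate; both are simplifying assumptions, not measured behavior.

```python
import math

def p_rebuild_hits_ure(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error in bytes_read bytes.

    Assumes independent bit errors at a constant rate (a simplification).
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^n, computed in a numerically stable way for tiny p.
    return -math.expm1(bits * math.log1p(-ure_per_bit))

TB = 1e12  # decimal terabyte

# 7-drive RAID 5 with one failed drive: the six survivors must be read in full.
p_1tb = p_rebuild_hits_ure(6 * 1 * TB)
p_2tb = p_rebuild_hits_ure(6 * 2 * TB)
print(f"6 x 1 TB read: {p_1tb:.0%}")
print(f"6 x 2 TB read: {p_2tb:.0%}")
```

    Under these assumptions the result is roughly 38% for 1 TB drives and 62% for 2 TB drives: in the same ballpark as the article's figures, but sensitive to whether the published rate matches real drives.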

  • by Vellmont (569020) on Tuesday October 21, 2008 @07:25PM (#25461705)

    The whole argument boils down to the published URE rate being both accurate and a foregone conclusion. Will disk makers _really_ make drives that have a sector failure for every 2 terabytes, or will they improve whatever technology is causing these UREs so that they become much rarer? (If the rate was real in the first place.)

  • by Anonymous Coward on Tuesday October 21, 2008 @07:30PM (#25461745)

    If I striped the JBOD into 6TB (7 drives) and one drive failed all the near-line data would be virtually off-line (and certainly read-only) while the array re-built.

    What kind of crappy raid array is that? Better raid arrays will model & predict performance under degraded conditions like failure & rebuilding. They certainly don't stop or go read-only during a rebuild.

    When tour groups of non-tech people come by the server room, I used to emphasize reliability by pulling a hard disk out of a running server, hand it to them, and then put it back in the server. The server doesn't skip a beat (and these were common off-the-shelf Dell rackmount servers costing $2,500 or so).

    Aside from automated alarms paging some IT people, no one would notice.

  • by petes_PoV (912422) on Tuesday October 21, 2008 @07:34PM (#25461777)
    The larger the drives, the longer it takes to resilver (rebuild the RAID) the array. During this time performance takes a real hit - no matter what the vendors tell you, it's unavoidable: you simply must copy all that data.

    In practice, this means that while your array is rebuilding, your performance SLAs go out of the window. If this is for an interactive server, such as a TP database or web service you end up with lots of complaints and a large backlog of work.

    The result is that as disks get bigger, the recovery takes longer. This is what makes RAID less desirable, not the possibility of a subsequent failure - that can always be worked around.
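
    A back-of-the-envelope sketch of how rebuild time scales with drive size. The throughput figures (100 MB/s on an idle array, 25 MB/s when the rebuild competes with production load) are illustrative assumptions, not measurements.

```python
def rebuild_hours(drive_bytes: float, rebuild_bytes_per_sec: float) -> float:
    """Lower bound on rebuild time: the whole replacement drive must be written."""
    return drive_bytes / rebuild_bytes_per_sec / 3600

TB = 1e12
MB = 1e6

for size_tb in (1, 2, 4):
    idle = rebuild_hours(size_tb * TB, 100 * MB)  # assumed: array otherwise idle
    busy = rebuild_hours(size_tb * TB, 25 * MB)   # assumed: rebuild throttled by load
    print(f"{size_tb} TB drive: {idle:.1f} h idle, {busy:.1f} h under load")
```

    Since the time grows linearly with capacity while throughput grows far more slowly, each doubling of drive size roughly doubles the window of degraded performance.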

  • by Hadlock (143607) on Tuesday October 21, 2008 @08:11PM (#25462171) Homepage Journal

    I can't vouch for DVD-R but I have el-cheapo store brand CD-Rs that I backed up my MP3 collection to 11 years ago and they work just fine. My solution is this:
     
    Back everything up that's not media (mp3/video) every 6 months to CD-R, and once a year, copy all my old data onto a new hard drive that's 20+% larger than the one I bought last year and unplug the old one. I have 11 old hard drives sitting in the closet should I ever need that data, and the likelihood of a hard drive failing in the first year (after the first 30 days) is phenomenally low. Any document that I CAN'T lose between now and the next CD-R backup goes on a thumb drive or its own CD-R and/or email it to myself.

  • Re:RAID-10 (Score:2, Interesting)

    by anonobomber (889925) on Tuesday October 21, 2008 @08:13PM (#25462189)
    With RAID 10 you still can have 2 drives fail and lose all your data. Though if you're lucky you'll have the second failure on the same side of the mirrored portion in which case you'll still have your data.
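
    To put a number on the two-drive case, here is a small sketch that enumerates every possible pair of failed drives. It assumes a stripe over two-way mirrors (drives 2k and 2k+1 form mirror pair k) and that all failure pairs are equally likely; the function name is illustrative.

```python
from itertools import combinations

def raid10_two_failure_loss_fraction(n_drives: int) -> float:
    """Fraction of two-drive failure combinations that destroy a RAID 10 array."""
    if n_drives < 2 or n_drives % 2:
        raise ValueError("expected an even number of drives, at least 2")
    pairs = list(combinations(range(n_drives), 2))
    # Data is lost only when both failed drives are in the same mirror pair.
    fatal = sum(1 for a, b in pairs if a // 2 == b // 2)
    return fatal / len(pairs)

for n in (4, 8, 16):
    print(f"{n} drives: {raid10_two_failure_loss_fraction(n):.1%} of double failures are fatal")
```

    With this layout the fraction works out to 1/(n-1), whereas any two-drive failure is fatal for RAID 5.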
  • by Free the Cowards (1280296) on Tuesday October 21, 2008 @08:21PM (#25462281)

    Modern drives make extensive use of error-correcting codes. It's not that expensive, space-wise, to have a code which can recover from problems to almost any desired degree of confidence. I'd be shocked if any hard drive manufacturer wasn't using an ECC that gave their devices a very near zero chance of any user experiencing a corrupted read for the entire lifetime of the drive.

  • by Bandman (86149) <bandmanNO@SPAMgmail.com> on Tuesday October 21, 2008 @08:26PM (#25462331) Homepage

    It really only deals with SATA drives (SAS probably has lower failure rates) and it only becomes a statistical issue with mammoth amounts of data (the amount quoted in the article is 1 data read error per 14TB).

  • Scrub your arrays (Score:5, Interesting)

    by macemoneta (154740) on Tuesday October 21, 2008 @08:32PM (#25462407) Homepage

    This is why you scrub your RAID arrays once a week. If you're using software RAID on Linux, for example:

    echo check > /sys/block/md0/md/sync_action

    The above will scrub array md0 and initiate sector reallocation if needed. You do this while you have redundancy so the bad data can be recovered. Over time, weak sectors get reallocated from the spare bands, and when you do have a failure the probability of a secondary failure is very low over the interval needed for drive replacement.

    Most non-crap hardware controllers also provide this function. Read the documentation.
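
    The same scrub can be driven from a script, for example from a weekly cron job. This is a minimal sketch: the function names are made up for illustration, the `sysfs_root` parameter exists only so the code can be exercised against a scratch directory, and it relies on the md driver's `sync_action` and `mismatch_cnt` sysfs files.

```python
from pathlib import Path

def start_scrub(array: str = "md0", sysfs_root: str = "/sys/block") -> None:
    """Ask the md driver to start a 'check' pass (needs root on a real system)."""
    (Path(sysfs_root) / array / "md" / "sync_action").write_text("check\n")

def scrub_status(array: str = "md0", sysfs_root: str = "/sys/block") -> tuple[str, int]:
    """Return the current sync action and the mismatch count from the last check."""
    md = Path(sysfs_root) / array / "md"
    action = (md / "sync_action").read_text().strip()
    mismatches = int((md / "mismatch_cnt").read_text().strip())
    return action, mismatches
```

    Calling `start_scrub("md0")` is equivalent to the echo command above; polling `scrub_status("md0")` until the action returns to `idle` tells you when the pass has finished.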

  • by mlts (1038732) * on Tuesday October 21, 2008 @08:44PM (#25462497)

    I just wish all the density improvements that hard disks get would propagate to tape. Tape used to be a decent backup mechanism, matching hard disk capacities, but in recent times, tape drives that have the ability to back up a modern hard disk are priced well out of reach for most home users. Pretty much, you are looking at several thousand as your ticket of entry for the mechanism, not to mention the card and a dedicated computer because tape drives have to run at full speed, or they get "shoe-shining" errors, similar to buffer underruns in a CD burn, where the drive has to stop, back up, write the data again and continue on, shortening tape life.

    I'd like to see some media company make a tape drive that has a decently sized RAM buffer (1-2GB), USB 2, USB 3, or perhaps eSATA for an interface port, and bundled with some decent backup software that offers AES encryption (Backup Exec, BRU, or Retrospect are good utilities that all have stood the test of time.)

    Of course, disk engineering and tape engineering are solving different problems. Tape heads always touch the actual tape while the disk heads do not touch the platter unless bumped. Tape also has more real estate than disk, but tape needs a *lot* more error correction because cartridges are expected to last decades and still have data easily retrievable from them.

  • I'm convinced. (Score:5, Interesting)

    by m.dillon (147925) on Tuesday October 21, 2008 @09:26PM (#25462921) Homepage

    I have to say, the ZFS folks have convinced me. There are simply too many places where bit rot can creep in these days even when the drive itself is perfect. The fact that the drive is not perfect just puts a big exclamation point on the issue. Add other problems into the fray, such as phantom writes (which have also been demonstrated to occur), and it gets very scary very quickly.

    I don't agree with ZFS's race-to-root block updating scheme for filesystem integrity but I do agree with the necessity of not completely trusting the block storage subsystem and of building checks into the filesystem data structures themselves.

    Even more specifically, if one is managing very large amounts of data one needs a way to validate that the filesystem contains what it is supposed to contain. It simply isn't possible to do that with storage-system logic. The filesystem itself must contain sufficient information to make validation possible. The filesystem itself must contain CRCs and hierarchical validation mechanisms to have a proper end-to-end check. I plan on making some adjustments to HAMMER to fix some holes in validation checking that I missed in the first round.

    -Matt
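
    A toy sketch of the end-to-end idea described above: keep a CRC with every block and let a parent record checksum its children's CRCs. This is only an illustration using Python's `zlib.crc32`; it is not how ZFS or HAMMER actually lay out their checksums, and the function names are hypothetical.

```python
import zlib

def block_crc(data: bytes) -> int:
    """CRC-32 stored alongside a data block when it is written."""
    return zlib.crc32(data)

def verify_block(data: bytes, expected_crc: int) -> bool:
    """Re-check a block on read; False means the block rotted somewhere below us."""
    return zlib.crc32(data) == expected_crc

def parent_crc(child_crcs: list[int]) -> int:
    """Hierarchical step: the parent's checksum covers its children's checksums."""
    blob = b"".join(crc.to_bytes(4, "big") for crc in child_crcs)
    return zlib.crc32(blob)

# Simulate a single flipped bit that the storage layer returned without complaint.
block = bytes(4096)
crc = block_crc(block)
rotted = bytearray(block)
rotted[100] ^= 0x01
print(verify_block(block, crc), verify_block(bytes(rotted), crc))
```

    Because the check lives in the filesystem's own structures, it catches corruption introduced anywhere between the platter and the reader, which a drive-level ECC alone cannot do.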

  • by hellwig (1325869) on Tuesday October 21, 2008 @09:51PM (#25463171)
    Yeah, as far as I can tell, the numbers the author used only relate to every 12TB of data read, and have absolutely nothing to do with RAID. Therefore, for every 12TB of data read, there will be an unrecoverable error. That means 50% of all 6TB RAID rebuilds fail. 25% of all 3TB RAID rebuilds, etc... At these rates, RAID was never a viable option.

    I don't know how much data is transferred over the internet every second, but I have to imagine this results in hundreds of thousands of files lost every day (due to URE). In fact, I conjecture that the rate of files being lost is outpacing the rate of files being created; soon we will have a total information blackout due to more people reading data than creating data.

    That, or the author's numbers are bullshit and he's misinterpreting the results.
  • by myz24 (256948) on Tuesday October 21, 2008 @09:57PM (#25463225) Journal

    While I generally agree, I have burned CD-R, CD-RW and DVD+/-R that are all older than 3 years. I haven't had one fail completely just yet. I've come across a couple here or there that have issues reading some parts, but not a complete failure right on day 1,096 as so many people like to claim. One thing that helps is to actually burn at a lower speed.

  • by myz24 (256948) on Tuesday October 21, 2008 @10:17PM (#25463401) Journal

    I don't mean to come off like another one of those "Mac people" but I don't agree that RAID + internet backup is the solution for home users. I think RAID + a realistic backup program is the solution for home users. Time Machine, despite its flamboyant, marketing-friendly name, really is a slick way to do backup.

    I'm an all out IT guy, love Linux, can tolerate Windows but Time Machine is by far the best backup solution I have used at home yet. My backup sets are typically 30-40MB from hour to hour if I'm using the computer. Uploading that much data every hour would be a pain.

    The reason I like Time Machine is that it is automatic, provides a level of versioning and allows multiple methods for restoring data. I can do a full bare metal restore, install then restore or just take the drive to another Mac or Linux machine and copy off the files I want, from whatever point in time available.

  • by boner (27505) on Tuesday October 21, 2008 @10:32PM (#25463511)

    SSDs should not be considered a viable option for long term storage just yet. Keep in mind that Flash cells are memory arrays and as such are susceptible to ionizing radiation that can and will flip bits. Store a Flash drive long enough and there will be bit errors beyond the capacity of the on-board CRC/ECC to correct.

    If you insist on using SSDs at least use them with ZFS.

  • by cbreaker (561297) on Tuesday October 21, 2008 @11:04PM (#25463829) Journal

    Well, I did mention FreeNAS so that lends itself to the possibility that I *probably* know what OpenFiler is.

    SATA disks actually aren't fine for a lot of applications. Any SINGLE app, I'll bite. But for most VMware installations where you have over 10 virtual machines (that are actually USED in production) your SATA disks might not cut it. Or they might be fine. It really depends.

    It's not about disk transfer speed, it's about IOPS. The 10 or 15K SAS/FC disks will get your data faster. And that's what it's all about. Nearly all normal infrastructure-type servers (File servers, e-mail, normal-use databases, etc) require a lot of IOPS but don't really care about throughput. It takes basically the same amount of time to fetch 4k as it does to fetch 1MB.

    I'd love to be able to offer an OpenFiler solution to our customers, and I'm pushing for it for some of our smaller clients that want to go virtual, but it's not an easy sell. For home, it's great. For a one-off project or for a non-critical backup system, sure. Production? I trust it, but I live in the real world where our customers don't.
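
    The IOPS point can be sketched with a simple service-time model: average seek plus half a rotation plus transfer time. The rotational latency follows directly from the spindle speed; the seek times and transfer rates below are illustrative assumptions, not vendor specifications.

```python
def random_read_ms(rpm: int, avg_seek_ms: float, transfer_bytes: int, mb_per_s: float) -> float:
    """Estimated time for one random read, in milliseconds."""
    rotational_ms = 60_000 / rpm / 2  # on average, wait half a revolution
    transfer_ms = transfer_bytes / (mb_per_s * 1e6) * 1000
    return avg_seek_ms + rotational_ms + transfer_ms

# Assumed figures: 7200 rpm SATA (8.5 ms seek, 80 MB/s), 15K SAS/FC (3.5 ms seek, 120 MB/s).
sata_4k = random_read_ms(7200, 8.5, 4096, 80)
sas_4k = random_read_ms(15000, 3.5, 4096, 120)
sata_1m = random_read_ms(7200, 8.5, 1_000_000, 80)

print(f"SATA 4 KB: {sata_4k:.1f} ms, about {1000 / sata_4k:.0f} IOPS")
print(f"15K  4 KB: {sas_4k:.1f} ms, about {1000 / sas_4k:.0f} IOPS")
print(f"SATA 1 MB: {sata_1m:.1f} ms")
```

    With these numbers a small read is almost entirely positioning time, giving roughly 80 versus 180 IOPS per spindle. A 1 MB read on the same SATA disk comes out at about twice the 4 KB time rather than exactly the same, but positioning still accounts for half of it.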

  • The Black Swan (Score:5, Interesting)

    by jschmerge (228731) on Tuesday October 21, 2008 @11:29PM (#25464053)

    A Black Swan is an event that is highly improbable, but statistically probable.

    Yes, it is possible for a drive in a RAID 5 array to become absolutely inoperable, and for one of the other drives to have a read failure at the same time. This is highly unlikely though, and is not the Black Swan. The math used to calculate the likelihood of these two events occurring at the same time is faulty. The MTBF metric for hard drives is measured in 'soft failures'; this is very different from a 'hard failure'.

    The difference between the two types of failures is that a soft failure, while a serious error, is something that the controlling operating system can work around if it detects it. It is extremely unlikely that a hard drive will exhibit a hard failure without having several soft failures first. It is even more unlikely that two drives in the same array will exhibit a hard failure within the length of time it takes to rebuild the array. In my experience, it is more likely that the software controlling the array will run into a bug rebuilding the array. I've seen this with several consumer-grade RAID controllers.

    The true Black Swan is when a disk in the array catches fire, or does something equally destructive to the entire array.

    To echo other people's points, RAID increases availability, but only an off-site backup solves the data retention problem.

  • by Firehed (942385) on Tuesday October 21, 2008 @11:32PM (#25464069) Homepage

    A very quick check puts an LTO4 tape drive at an entry point of $3700, plus media and actually interfacing it with a system. Most people (companies) with a budget that allows for that kind of hardware not only have such a system in place, but have someone on staff who knows how to avoid the problems that RAID5 can/will bring down the road. And that's fine for businesses. However, RAID5 is reasonably cost-effective for home users as well (at least until offsite via Amazon S3 and the like becomes practical, which is entirely dependent on how fast internet connection uplink speeds are), and much more likely to be employed by someone who isn't aware of these kinds of risks.

    So, as someone who is clearly pretty well-versed in backup-related tech, do you have any ideas that would work for a home user who doesn't live on a yacht?

  • by jaxtherat (1165473) on Tuesday October 21, 2008 @11:48PM (#25464193) Homepage

    Judging by the budget you quoted, it's a combination of all of the above: you are a crappy sysadmin for a crappy company with limited growth potential.

    Sigh. *ignores flamebait*

    Anyway, here's the actual reality of the situation:

    I'm a not brilliant (but certainly not crappy either) sysad who is working for a company that has rapidly expanded to the point where they need a full time sysad, and then felt the kaboom of the subprime mortgage debacle, since they consult to the property market. Hence why my original upgrade budget got shrunk big time.

    The company BOTH cares about their data AND can't afford a proper backup system.

  • by Slashdot Parent (995749) on Tuesday October 21, 2008 @11:51PM (#25464219)

    No method is foolproof, especially when it's bound to be boring as hell, and you've got an inevitable human factor. You get lazy moving the tapes offsite, you put off fixing a dead drive because there are 4 others, you wipe your main partition upgrading your distro and forget that your CRON rsync script uses the handy --delete flag, and BOOM wipes out your backup.

    Jesus Christ, you must be one unlucky soul. Do you live your entire life in a worst-case scenario?

    The system that I use for data storage is as follows:

    1. 2TB NAS that uses a scrubbed (if you don't know what that means, look it up) Linux Software RAID
    2. Anything important goes into a directory hierarchy that is backed up automatically via rsnapshot (in other words, one botched snapshot isn't going to leave me up a creek without a backup).
    3. Each week, my rsnapshot directory is automatically encrypted (and thus compressed) with gpg and uploaded to Amazon S3. My rsnapshot directory currently occupies about 3GB of space after gpg's automatic compression.
    4. The 5th oldest backup in S3 is automatically deleted.
    5. When I think of it, I burn my rsnapshot directory to DVD and my wife takes it into her office and leaves it there.

    This system may not be foolproof (what is?), but it is pretty frickin' safe, and costs me roughly $3 or $4 per month. Not too shabby for what I would consider to be a fairly robust backup system for a home user.

    I suppose the biggest challenge is deciding what goes into rsnapshot. If my RAID array suffered a massive failure, I would definitely lose data. But this is mostly video content, and really, if I lose my mythtv shows, it is not exactly as catastrophic as if I lost, say, my quickbooks data.

    There are a lot of things that keep me awake at night, but loss of important data is not one of them.
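
    The retention rule in step 4 of the list above (keep the four newest uploads, delete the rest) can be sketched as follows. The object names are hypothetical and are assumed to embed an ISO date so that sorting them by name is chronological; the actual deletion from S3 is deliberately left out.

```python
def backups_to_delete(names: list[str], keep: int = 4) -> list[str]:
    """Return every backup name except the `keep` newest ones."""
    ordered = sorted(names)  # ISO dates in the name sort oldest-first
    return ordered[:-keep] if keep > 0 else ordered

# Hypothetical weekly uploads; only the oldest should be pruned.
uploads = [
    "rsnapshot-2008-10-19.tar.gpg",
    "rsnapshot-2008-09-21.tar.gpg",
    "rsnapshot-2008-10-05.tar.gpg",
    "rsnapshot-2008-09-28.tar.gpg",
    "rsnapshot-2008-10-12.tar.gpg",
]
print(backups_to_delete(uploads))
```

    Computing the list separately from the delete call makes it easy to log or dry-run the pruning step before trusting it with the only offsite copies.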

  • by nvatvani (989200) on Wednesday October 22, 2008 @05:47AM (#25465817)

    Firstly, the core determinants of HDD failures are:

    • Number of writes per second
    • Number of reads per second
    • Revolutions per minute
    • Environmental conditions, i.e. - temperature, humidity, etc...

    The studies by CMU and Google are not broken down at the application level, i.e. - what purpose were the HDDs serving. For example an HDD serving as an archive will perform differently from an HDD doing constant defragmentation, for the sake of example, or other read/write intensive functions as compared to archiving.

    Such a mashing is therefore "unfair". But ok, let's take the numbers produced by CMU and Google. Their rates of failure do seem to threaten RAID 5's (and other RAIDs') reliability with increasing disk sizes. This issue is immediately resolved by the RAID controller - but yes it means an extra performance penalty for the RAID implementation.

    As such, RAID 5 will not die. It's the RAID controllers that need to be more intelligent, at the expense of performance.

  • by theaveng (1243528) on Wednesday October 22, 2008 @06:44AM (#25466049)

    >>>As the RAID controller is busily reading through those 6 disks to reconstruct the data from the failed drive, it is almost certain it will see an [unrecoverable read error].
    >>>

    This is a load of crap. The computer wouldn't just give up. It would make a second attempt to read that bit, and do so successfully. One bad read does not necessarily mean that spot on the disc is permanently damaged.

    Furthermore, even if that bit is lost, it depends upon what kind of data was damaged. If it's an MP3 or MPEG or JPEG, one lost bit is not going to be visible to the viewer. The human ear and eye are not sensitive enough to detect that small an error, especially with lossy-compressed sounds and images. If it's a word doc, then you might get the word "progrbm" instead of "program". The document is still usable even with that misspelling. I would hope RAID controllers are intelligent enough to not throw away 99.999999999999% of the data and declare it "unrecoverable" just because of one lost bit.

  • by Isao (153092) on Wednesday October 22, 2008 @08:57AM (#25466893)
    Good first thought, but the idea that keeps hanging in the periphery of the discussion above is that if you consolidate massive storage into a single LUN like that, it takes too long to back it up. The controllers simply can't move the data off fast enough. This is why in production systems you never see RAID LUNs maxed out. (Another reason is to distribute your transactions across multiple I/O channels.)

    EMC and its smaller rivals make a fortune on clever array technology that allows you to perform "snap clones" of LUNs that can be later backed off to off line storage at a lower rate. As long as it can be done before the next "snap" window, you're OK. Otherwise, reduce the LUN size and stand up more robots.

  • by techess (1322623) on Wednesday October 22, 2008 @09:04AM (#25466981)

    I always love it when Fed-Ex destroys something and then tries to hide it. One day I walked past the shipping office and I smelled the very strong odor of hydraulic oil coming from the room. I take a look inside since we shouldn't be receiving anything that has hydraulic oil in it. I found a bunch of boxes with the local Detroit Airport logo all over them and sealed with DET labeled tape. The cardboard was completely soaked through with the oil.

    I carefully opened one of the boxes and found it contained servers! It appears that the original boxes got in some sort of accident at the airport and were completely soaked. At the airport Fed-Ex or the baggage handlers did us a "favor" and re-boxed everything. The servers were so coated (and filled) that even the new boxes were completely soaked through and the bottoms of the boxes were starting to pull apart. The Fed-Ex guy (so we wouldn't refuse them) dropped them off at lunch and then got some random person in the hall to sign off on it.

    We had to pay for new servers to be built ASAP and shipped overnight (UPS this time) at huge cost for us. Since someone had signed off on the package we then had a very long fight to get Fed-Ex to pay for the equipment they destroyed. We never got the extra cost for the overnight shipping and the rush build reimbursed.

  • by sjames (1099) on Wednesday October 22, 2008 @12:19PM (#25469953) Homepage

    A remarkable number of RAID units throw a tantrum and refuse to even keep trying at the first sign of real trouble. That's why I prefer to use the Linux soft RAID over various hardware RAIDs. At least the layout is well documented so I have a chance of putting most of it back together later.
