Forgot your password?
typodupeerror
Data Storage Upgrades

Why RAID 5 Stops Working In 2009 803

Posted by kdawson
from the back-'em-up-rawhide dept.
Lally Singh recommends a ZDNet piece predicting the imminent demise of RAID 5, noting that increasing storage and non-decreasing probability of disk failure will collide in a year or so. This reader adds, "Apparently, RAID 6 isn't far behind. I'll keep the ZFS plug short. Go ZFS. There, that was it." "Disk drive capacities double every 18-24 months. We have 1 TB drives now, and in 2009 we'll have 2 TB drives. With a 7-drive RAID 5 disk failure, you'll have 6 remaining 2 TB drives. As the RAID controller is busily reading through those 6 disks to reconstruct the data from the failed drive, it is almost certain it will see an [unrecoverable read error]. So the read fails ... The message 'we can't read this RAID volume' travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected — you thought! — data is gone. Oh, you didn't back it up to tape? Bummer!"
This discussion has been archived. No new comments can be posted.

Why RAID 5 Stops Working In 2009

Comments Filter:
  • by Whiney Mac Fanboy (963289) * <whineymacfanboy@gmail.com> on Tuesday October 21, 2008 @07:03PM (#25461453) Homepage Journal

    12 TB of your carefully protected â" you thought! â" data is gone. Oh, you didn't back it up to tape? Bummer!

    If it wasn't backed up to an offsite location, then it wasn't carefully protected.

    • Re: (Score:3, Interesting)

      by rhathar (1247530)
      "Safe" production data should be in a SAN environment anyways. RAID 5 on top of RAID 10 with nightly replays/screenshots and multi-tiered read/writes over an array of disks.
      • Don't panic! (Score:4, Insightful)

        by Joce640k (829181) on Tuesday October 21, 2008 @08:59PM (#25462647) Homepage

        RAID 5 will still be orders of magnitude more reliable than just having a single disk.

        • Re:Don't panic! (Score:5, Insightful)

          by Anonymous Coward on Tuesday October 21, 2008 @09:47PM (#25463119)

          No, it won't. That's the point of this not-news article. It's getting to the point where (due to the size of the disks) a rebuild takes longer than the statistically "safe" window between individual disk failures. Two disks kick it in the same timeframe (the chance of which increases as you add disks) and you're screwed.

          A poorly designed multi-disk storage system can easily be worse than a single disk.

          • Re:Don't panic! (Score:5, Insightful)

            by bstone (145356) * on Tuesday October 21, 2008 @11:35PM (#25464087)

            Using the same failure rate figures as the article, you WILL get an unrecoverable read error each and every time you back up your 12 TB of data. You will be able to recover from the single block failure because of the RAID 5 setup.

            With that kind of error rate, drive manufacturers will be forced to design to higher standards, they won't be able to sell drives that fail at that rate.

          • Re:Don't panic! (Score:5, Insightful)

            by Eivind (15695) <eivindorama@gmail.com> on Wednesday October 22, 2008 @02:05AM (#25464929) Homepage

            Yes. It's amazing that the article presents the basic point so horribly poorly. The problem is not the capacity of the disks.

            The problem is that the capacity has been growing faster than the transfer-bandwith. Thus it takes a longer and longer time to read (or write) a complete disk. This gives a larger window for double-failure.

            Simple as that.

            • Re:Don't panic! (Score:4, Informative)

              by drsmithy (35869) <drsmithy.gmail@com> on Wednesday October 22, 2008 @02:36AM (#25465059)

              The problem is that the capacity has been growing faster than the transfer-bandwith. Thus it takes a longer and longer time to read (or write) a complete disk. This gives a larger window for double-failure.

              No, the point is that (statistically) you can't actually read all of the data without having another read error (statistically speaking).

              Whether you read it all at 100MB/sec or 10MB/sec (ie: how long it takes) is irrelevant (within reason). The problem is that published URE rates are such that you "will" have at least one during the rebuild (because of the amount of data).

              The solution, as outlined by a few other posters, are more intelligent RAID5 implementations that don't take an entire disk offline just because of a single sector read error (some already act like this, most don't).

        • Re:Don't panic! (Score:5, Informative)

          by nine-times (778537) <nine.times@gmail.com> on Tuesday October 21, 2008 @11:04PM (#25463823) Homepage

          How reliable RAID5 is depends, because actually the more disks you have, the greater the likelihood that one of them will fail in any set period of time. So obviously if you have a RAID 0 of lots of disks, then there is a much better chance that the RAID will fail than that any particular disk will fail.

          So the purpose of RAID5 is not so much to make it orders of magnitude more reliable than just having a single disk, but rather to mitigate the increased risk that would come from having a RAID0. So you'd have to calculate, for the number of disks and the failure rate of any particular drive, what are the chances of having 2 drives fail at the same time (given a certain response rate to drive failure). If you have enough drives and a slow enough response to disk failures, it's at least theoretically possible (I haven't done the math) that a single drive is safer.

      • by Gr8Apes (679165) on Tuesday October 21, 2008 @10:27PM (#25463465)

        "Safe" production data ...with nightly replays/screenshots ...

        Exactly. You make backups, no matter what. Anyone that relies on RAID for backups will get what they deserve, sooner than later.

        RAID and SANs are for uptime (reliability) and/or performance. SANs with snapshots and RAID with backups are for data recovery.

        • by jaxtherat (1165473) on Tuesday October 21, 2008 @10:41PM (#25463595) Homepage

          I love how you use the language "get what they deserve".

          What about my situation, where I have to store ~ 1TB of unique data per office in 3 offices that are roughly 1000 km apart and I have to keep everything backed up with a budget of less than ~AU$ 4000 IN TOTAL?

          I have to run a 4 x 1TB RAID arrays on the file servers and use rsync to synchronise all the data between the offices nightly "effectively" doing offsites, and have a 3 TB linux NAS (also using RAID 5) for incrementals at the main site.

          That is all I can afford, and I feel that I'm doing my best for my employer given my budget and still maintaining my professional integrity as a sysad.

          Why do I "get what they deserve" when I can't afford the necessary LTO4 drives, servers and tapes (I worked it out I'd need ~ AU$ 30,000) to do it any other way?

          • by totally bogus dude (1040246) on Tuesday October 21, 2008 @11:37PM (#25464117)

            If you're replicating data between all three offices (and a fourth backup system?) then you are making backups. The vitriol is aimed at people who set up a RAID-5 array and then say "hooray my data is protected forevermore!".

            Tape systems, especially high capacity tapes, are very expensive, and even those are prone to failures. Online backups to other hard drives are the only affordable means of backing up today's high capacity, low cost hard drives. To do it properly though, you need to make sure you do have separate physical locations for protection from natural disasters, fires, etc. Which you have.

            The only concern your system may have is: how do you handle corrupted data, or user error? If you've got a TB of data at each site it's unlikely that mistakes will be noticed quickly, so after the nightly synchronisation all your backups will now have the corrupt data and when someone realises in a month's time that someone deleted a file they shouldn't have or saved crap data over a file, how do you restore it? Hopefully your incremental backups can be used to recover the most recent good copy of the data, but how long do you keep those for?

          • by Gr8Apes (679165) on Wednesday October 22, 2008 @01:05AM (#25464631)

            External TB drives are around $150 bucks. Buy several. Make rotating copies. It's doable on your budget. (We're in the same boat, btw, and that was our solution for the dev machines)

            However, the real issue is your employer has decided on the budget, and what you do with it is how well you're protected. Sometimes we don't get a Fibre NAS with remote backup, no matter how much we want it. Sometimes we have to get by with the old rsync, dd, or pure copy or even tar/zip with rotating media. (Anything less is suicide)

        • by ajkst1 (630286) on Tuesday October 21, 2008 @11:03PM (#25463817)
          I have to echo this comment. RAID is not a backup. It is a form of redundancy. Nothing is stopping that system from losing two drives and completely losing your data. RAID simply allows you to keep working after a SINGLE disk failure. If you're not making backups of your critical data and relying on RAID to save your behind, you're insane.
    • by SatanicPuppy (611928) * <Satanicpuppy@ g m a i l . c om> on Tuesday October 21, 2008 @07:16PM (#25461583) Journal

      Yea, because we all backup 12TB of home data to an offsite location. Mine is my private evil island, and I've bioengineered flying death monkeys to carry the tapes for me. They make 11 trips a day. I'm hoping for 12 trips with the next generation of monkeys, but they're starting to want coffee breaks.

      I'm sorry, but I'm getting seriously tired of people looking down from the pedestal of how it "ought" to be done, how you do it at work, how you would do it if you had 20k to blow on a backup solution, and trying to apply that to the home user. Even the tape comment in the summary is horseshit, because even exceptionally savvy home users are not going to pay for a tape drive and enough tapes to archive serious data, more less handle shipping the backups offsite professionally.

      This is serious news. As it stands, the home user that actually sets up a RAID 5 raid is in the top percentile for actually giving a crap about home data. Once that becomes a non-issue, then the point has come when a reasonable backup is out of reach of 99% of private individuals. This, at the same time as more and more people are actually needing a decent solution.

      • by networkBoy (774728) on Tuesday October 21, 2008 @07:24PM (#25461679) Homepage Journal

        you know the other solution is to not use RAID5 with these big drives, or to go to RAID1, or to actually back up the data you want to save to DVD and accept a disk failure will cost you the rest.

        Now, while 1TB onto DVDs seems like quite a chore (and I'll admit it's not trivial), some level of data staging can help out immensely, as well as incrementally backing up files, not trying to actually get a full drive snapshot.

        Say you backup like this:
        my pictures as of 21oct2008
        my documents (except pictures and videos) as of 22 oct2008
        etc.
        while you will still lose data in a disk failure, your loss can be mitigated, especially if you only try to backup what is important. With digital cameras I would argue that home movies and pictures are the two biggest data consumers that people couldn't backup to a single dvd and that they would be genuinely distressed to lose.
        -nB

        • by SatanicPuppy (611928) * <Satanicpuppy@ g m a i l . c om> on Tuesday October 21, 2008 @07:36PM (#25461817) Journal

          Yea, but DVD is transient crap. How long will those last? A few years? You cannot rely on home-burned optical media for long term storage, and while burning 12 terabytes of information on to one set of 1446 dvds (double layer) may not seem like a big deal, having to do it every three years for the rest of your life is bound to get old.

          For any serious storage you need magnetic media, and though we all hate tape, 5 year old tape is about a million times more reliable than a hard drive that hasn't been plugged in in 5 years.

          So either you need tape in the sort of quantity that the private user cannot justify, or you're going to have to spring for a hefty RAID and arrange for another one like it as a backup. Offsite if you're lucky, but it's probably just going to be out in your garage/basement/tool shed.

          Now, what do you do if you can't rely on RAID? No other storage is as reliable and cheap as the hard drive. ZFS and RAID-Z may solve the problem, but they may not...You can still have failures, and as hard disk sizes increase, the amount of data jeopardized by a single failure increases as well.

          • Re: (Score:3, Insightful)

            by MBCook (132727)
            Good points. While magnetic media is problematic, SSDs are going to become a very viable option for the home backup (compared to stacks of DVDs or the possible reliability of old magnetic HDs).
          • Re: (Score:3, Insightful)

            by grahamd0 (1129971)

            Yea, but DVD is transient crap. How long will those last?

            But DVD is *cheap* transient crap, and perfectly adequate for home backups.

            I've got something in the area of 200GB of data on the machine which I'm currently using to type this, but very little of that data has any intrinsic or sentimental value to me. Most of it is applications and games that could easily be reinstalled from the original media or re-downloaded. A DVD or two could easily hold all of the data I *need* and even cheap optical media will outlive this machine's usefulness.

          • by Hadlock (143607) on Tuesday October 21, 2008 @08:11PM (#25462171) Homepage Journal

            I can't vouch for DVD-R but I have el-cheapo store brand CD-Rs that I backed up my MP3 collection to 11 years ago and they work just fine. My solution is this:
             
            Back everything up that's not media (mp3/video) every 6 months to CD-R, and once a year, copy all my old data onto a new hard drive that's 20+% larger than the one I bought last year and unplug the old one. I have 11 old hard drives sitting in the closet should I ever need that data, and the likelihood of a hard drive failing in the first year (after the first 30 days) is phenomenally low. Any document that I CAN'T lose between now and the next CD-R backup goes on a thumb drive or it's own CD-R and/or email it to myself.

          • by mlts (1038732) * on Tuesday October 21, 2008 @08:44PM (#25462497)

            I just wish all the density improvements that hard disks get would propagate to tape. Tape used to be a decent backup mechanism, matching hard disk capacities, but in recent time, tape drives that have the ability to back up a modern hard disk are priced well out of reach for most home users. Pretty much, you are looking at several thousand as your ticket of entry for the mechanism, not to mention the card and a dedicated computer because tape drives have to run at full speed, or they get "shoe-shining" errors, similar to buffer underruns in a CD burn, where the drive has to stop, back up, write the data again and continue on, shortening tape life.

            I'd like to see some media company make a tape drive that has a decently sized RAM buffer (1-2GB), USB 2, USB 3, or perhaps eSATA for an interface port, and bundled with some decent backup software that offers AES encryption (Backup Exec, BRU, or Retrospect are good utilities that all have stood the test of time.)

            Of course, disk engineering and tape engineering are solving different problems. Tape heads always touch the actual tape while the disk heads do not touch the platter unless bumped. Tape also has more real estate than disk, but tape needs a *lot* more error correction because cartridges are expected to last decades and still have data easily retrievable from them.

      • by Whiney Mac Fanboy (963289) * <whineymacfanboy@gmail.com> on Tuesday October 21, 2008 @07:33PM (#25461773) Homepage Journal

        Oh come on. Do you have 12TB of home data? Seriously? And if you do, it's not that hard to have another another 12TB of external USB drives at some relatives place.

        I've got about 500GB of data that I care about at home & the whole lot's backed up onto a terrabyte external HDD at my Dad's. It's not that hard.

        If you think raid is protecting your data, you're crazy.

        • by DrVxD (184537) on Tuesday October 21, 2008 @07:51PM (#25461967) Homepage Journal

          Oh come on. Do you have 12TB of home data? Seriously? And if you do, it's not that hard to have another another 12TB of external USB drives at some relatives place.

          Not all of us have relatives, you insensitive...[URE]

      • by sholsinger (1131365) <sholsinger@gmail.com> on Tuesday October 21, 2008 @07:49PM (#25461953) Homepage

        Next they'll want to unionize. At that point you've lost everything.

      • by John Hasler (414242) on Tuesday October 21, 2008 @08:28PM (#25462365) Homepage

        Prioritize your data. I cannot believe that a home user has 12TB of important stuff. Back up your critical records both on site and off [1]. Back up the important stuff on site with whatever is convenient. Let the rest go hang.

        [1] Use DVDs in the unlikely event you have that much critical data. Few home users will have a critical need for that stuff beyond the life of the media. Any that do can copy it over every five years, and take the opportunity to delete the obsolete stuff.

    • Re: (Score:3, Interesting)

      by networkBoy (774728)

      True.
      Also FWIW I only run RAID 1 and JBOD.
      For things that must be on-line, or are destined for JBOD but not yet archived to backup media, they are located on one of the RAID volumes. For everything else it's off to JBOD, where things are better than RAID5

      Why?

      I have 6 TB of JBOD storage and 600(2x300 volumes) GB of RAID 1. If I striped the JBOD into 6TB (7 drives) and one drive failed all the near-line data would be virtually off-line (and certainly read-only) while the array re-built. With JBOD, should a

    • by rs79 (71822) <hostmaster@open-rsc.org> on Tuesday October 21, 2008 @08:08PM (#25462153) Homepage

      You get your first RAID controller from a trusted friend. "Here" he says "try this" and hands you a Mylex board. It has a 64 bit bus and 3 SCSI LVD connectors. Oooh. That looks fast. So you start ebaying drives, cables, adapters, more controllers, the inevitable megawatt power supply and you mess around with raid 1, raid 0 raid 1+0 and raid 5. Suddenly every system falls prey to RAIDMANIA; eventually for yourself you build a system with 3 controllers, with 3 busses each and a drive on each one of 9 busses. With a controller for swap, one for data and one for the system will Windows now be fast? Yeah, sorta. Those drives sure are quiet - from a click-click busy noise perspective, NOT from a "sounds liks a jet airplane when running" perspective. Heat is an issue, too.

      http://rs79.vrx.net/works/photoblog/2005/Sep/15/DSCF0007s.jpg [vrx.net]

      But oh my are the failure modes spectacular.

      I just use a laptop now and make several sets of backup DVDs or just copy to spare drives. I love RAID to death. But it's really only marginally worth the effort in the real world. But if you need fast, OMG.

  • RAID != Backup (Score:4, Insightful)

    by vlad_petric (94134) on Tuesday October 21, 2008 @07:09PM (#25461521) Homepage

    I mean, WTF? Many people regard RAID as something magical that will keep their data no matter what happens. Well ... it's not.

    Furthermore, for many enterprise applications disk size is not the main concern, but rather I/O throughput and reliability. Few need 7 disks of 2 TB in RAID5.

    • Re:RAID != Backup (Score:4, Insightful)

      by Anonymous Coward on Tuesday October 21, 2008 @07:20PM (#25461643)

      Furthermore, for many enterprise applications disk size is not the main concern, but rather I/O throughput and reliability. Few need 7 disks of 2 TB in RAID5.

      Some of us do need a large amount of reasonably priced storage with fast read speed & slower write speed. This pattern of data access is extremely common for all sorts of applications.

      And this raid 5 "problem" is simply the fact that modern sata disks have a certain error rate. But as the amount of data becomes huge, it becomes very likely that errors will occur when rebuilding a failed disk. But errors can also occur during normal operation!

      The problem is that sata disks have gotten a lot bigger without the error rate dropping.

      So you have a few choices:

      - use more reliable disks (like scsi/sas) which reduce the error rate even further
      - use a raid geometry that is more tolerant of errors (like raid 6)
      - use a file system that is more tolerant of errors
      - replicate & backup your data

    • Re:RAID != Backup (Score:4, Insightful)

      by MBCook (132727) <foobarsoft@foobarsoft.com> on Tuesday October 21, 2008 @07:30PM (#25461739) Homepage

      I've always understood it as RAID exists to keep you running either during the 'outage' (i.e. until a new disk is built) or at least long enough to shut things down safely and coherently (as opposed to computer just locking up or some such).

      It's designed to give you redundancy until you fix the problem. It's designed to let you limp along. It's not designed to be a backup solution.

      As others have mentioned: if you want a backup set of hard drives, you run RAID 10 or 15 or something where you have two(+) full copies of your data. And even that won't work in many situations (i.e. computer suddenly finds it's self in a flood).

      All that said, the guy has a possible point. How long would it take to build a new 1TB drive into an array? That could be problematic.

      There is a reason SANs and other such things have 2+ hot spares in them.

      • Re:RAID != Backup (Score:5, Informative)

        by Walpurgiss (723989) on Tuesday October 21, 2008 @08:03PM (#25462095)
        I run a raid5 with 1TB disks. Growing the array from 3 to 4 took around 4 hours, 4 to 5 took maybe 8 or 10, 5 to 7 took something like 30 hours I guess.

        But that's growing from a previous capacity to a larger capacity.
        Using mdadm to fake a failure by removing and adding a single drive, the recover time generally was 4-5 hours.
  • What. (Score:4, Insightful)

    by DanWS6 (1248650) on Tuesday October 21, 2008 @07:13PM (#25461545)
    The problem with Raid 5 is that the more drives you have the higher probability you have that more than one drive dies. That's why you have multiple raid 5 arrays of 4 disks maximum instead of one array of 7 disks.
  • by realmolo (574068) on Tuesday October 21, 2008 @07:14PM (#25461573)

    If you have one RAID5 box, just build another one that replicates it. Use that for your "hot backup". Then back that up to tape, if you must.

    Storage is so cheap these days (especially if you don't need super-fast speeds and can use regular SATA drives), that you might as well just go crazy with mirroring/replicating all your drives all over the place for fault-tolerance and disaster-recovery.

  • Testable assertion (Score:4, Interesting)

    by merreborn (853723) on Tuesday October 21, 2008 @07:18PM (#25461613) Journal

    But even today a 7 drive RAID 5 with 1 TB disks has a 50% chance of a rebuild failure. RAID 5 is reaching the end of its useful life.

    This is trivially testable. Any slashdotters have experience rebuilding 7TB RAID 5 arrays?

    You'd think, if this were really an issue, we'd be hearing stories from the front lines of this happening with increasing frequency. Instead we have a blog post based entirely on theory, without a single real-world example for corroboration.

    What's more, who even uses RAID 5 anymore? I thought it was all RAID 10 and whatnot these days.

    • by theendlessnow (516149) * on Tuesday October 21, 2008 @07:39PM (#25461837)
      I have large RAID 5's and RAID 6's... I generally don't have any RAID columns over 8TB. I HAVE had drive failures. Yes... I'm talking cheapo SATA drives. No... I have not see the problem this article presents. Do I backup critical data? Yes. The only time I lost a column was due to a firmware bug which caused a rebuild to fail. Took awhile to restore from backup, but that was about the extent of the damage. I would call this article FUD... deceptive FUD, but very much FUD.
  • by EdIII (1114411) * on Tuesday October 21, 2008 @07:22PM (#25461661)

    I can see a lot of people getting into a tizzy over this. The RAID 5 this guy is talking about is controlled by one STUPID controller.

    There are a lot of methods, and patented technology that prevent just the situation he is talking about. Here is just one example:

    PerfectRAID(TM) is Promise's patented RAID data protection technology; a suite of data protection and redundancy features built into every Promise RAID product.

            *
                Predictive Data Migration (PDM): Replace un-healthy disk member in array and keep array on normal status during the data transition between healthy HD and replaced HD.
            *
                Bad Sector Mapping and Media Patrol: These features scan the system's drive media to ensure that even bad physical drives do not impact data availability
            *
                Array Error Recovery: Data recovery from bad sector or failed HD for redundant RAID
            *
                RAID 5/6 inconsistent data Prevent (Write Hole Table)
            *
                Data content Error Prevent (Read/Write Check Table)
            *
                Physical Drive Error Recovery
            *
                SMART support
            *
                Hard/Soft Reset to recover HD from bad status.
            *
                HD Powercontrol to recover HD from hung status.
            * NVRAM event logging

    RAID is not perfect, not by any stretch, but if you use it properly it will serve it's purpose quite nicely. If your data is that critical, having it on a single raid is ill advised anyways. If you are talking about databases, then RAID 10 is more preferable and replicating the databases across multiple sites, even more so.

  • Smells Like FUD. (Score:5, Insightful)

    by sexconker (1179573) on Tuesday October 21, 2008 @07:25PM (#25461697)

    What is this article about?

    They say that since there is more data, you're more likely to encounter problems during a rebuild.

    The issue isn't with RAID, it's with the file system. Use larger blocks/sectors.

    Losing all of your data requires you to have a shitty RAID controller. A decent one will reconstruct what it can.

    The odds of you encountering a physical issue increases as capacity increases, and decreases as reliability increases. In theory, the 1 TB and up drives are pretty reliable. Anything worth protecting should be on server-grade hard drives anyway.

    The likelihood of a physical problem popping up during your rebuild is no higher with new drives than it was with old drives. I haven't noticed my larger drives failing at higher rates than my older, smaller drives. I haven't heard of them failing at higher rates.

    Remember, folks, RAID is a redundant array of inexpensive disks. The purpose of RAID is to be fault-tolerant, in the sense that a few failures don't put you out of production. You also get the nice bonus of being able to lump a bunch of drives together to get a larger total capacity.

    RAID is not a backup solution.

    RAID 5 and RAID 6, specifically, are still viable solutions for most setups. If you want more reliability, go with RAID 1+0, RAID 5+0, whatever.

    Choosing the right RAID level has always depended on your needs, setup, budget, and priorities.

    Smells like FUD.

  • by Vellmont (569020) on Tuesday October 21, 2008 @07:25PM (#25461705)

    The whole argument boils down the published URE rate being both accurate, and a foregone conclusion. Will disk makers _really_ make drives that have a sector failure for every 2 terabytes, or will they improve whatever technology is causing these URE's to be much more rare? (if the rate was real in the first place).

  • by mbone (558574) on Tuesday October 21, 2008 @07:25PM (#25461707)

    How many times does this have to be said.

    RAID is not a backup. RAID is designed to protect against hardware failures. It can also increase your I/O speed, which is more important in some cases. Backups are different.

    Depending on what you are doing, you may or may need a RAID, but you definitely need backups.

  • by petes_PoV (912422) on Tuesday October 21, 2008 @07:34PM (#25461777)
    The larger the drives, the longer it takes to resilver (rebuild the RAID) the array. During this time performance takes a real hit - no matter what the vendors tell you, it's unavoidable: you simply must copy all that data.

    In practice, this means that while your array is rebuilding, your performance SLAs go out of the window. If this is for an interactive server, such as a TP database or web service you end up with lots of complaints and a large backlog of work.

    The result is that as disks get bigger, the recovery takes longer. This is what make RAID less desirable, not the possibility of a subsequent failure - that can always be worked around.

  • RAID6 = Win (Score:3, Insightful)

    by MukiMuki (692124) on Tuesday October 21, 2008 @07:39PM (#25461851)

    Scrub once a week, or once every two weeks.

    RAID6 isn't about losing any two disks, it's about having two parity stripes. It's about being able to survive sector errors without any worry.

    It's about losing ONE drive and still have enough parity to replace it without any errors.

    RAID6 on 5 drives is retarded, tho, because it leaves you absurdly close to RAID1 in kept space. RAID6 is for when you have 8-10 drives. At that point you barely notice the (N - 2) effect and you have a fast (provided your processor can handle it all) chunk of throughput along with an incredibly reliable system. Well, N-3 with a hotswap.

    Personally, I think I'd go RAID-Z2 via ZFS if only because it's a little bit sturdier a filesystem to begin with.

  • by gweihir (88907) on Tuesday October 21, 2008 @07:58PM (#25462041)

    My observed error rate with about 4TB of storage is much, much lower. I did run a full surface scan every 15 days for two years and did not have a single read error in about two years. (The hardware has since been decomissioned and replace dby 5 RAID6 Arrays with 4TB each.)

    So, I did read roughly 100 times 4TB. That is 400TB = 3.2 * 10^15 bits with 0 errors. That does not take into account normal read from the disks, which should be substantially more.

  • My solution (Score:3, Insightful)

    by SuperQ (431) * on Tuesday October 21, 2008 @07:59PM (#25462053) Homepage

    I'm in the process of building a new 8x 1T array. I'm not using any fancy raid card. Just a LSI 1068E chipset with a SAS expander to handle LOTS (16 slots in the case, using 8 right now).

    I'm not putting the entire thing into one big array. I'm breaking up each 1T drive into ~100GB slices that I will use to build several overlapping arrays. Each MD device will be no more than 4-5 slices. This way if an error occurs on one disk in one part of a disk I will have a higher probability of recovery.

    I may also use RAID 6 to give me more chance of rebuilding.

    Disk errors tend to not be whole disk errors, just small broken physical parts of a single disk.

    SMART will give me more chance to detect and replace dying drives.

  • by backtick (2376) on Tuesday October 21, 2008 @08:24PM (#25462319) Homepage Journal

    First off, Isn't this story a year+ old? Sheesh.

    Second off, if you're worried about URE on X number of disks, what about a single capacitor cooking off on the raid controller? No serious data is stored on a single raid controller system, without good backups or another raid'd system on completely unique hardware. Yes, if you put a lot of disk on one controller and have a failure you have a higher risk of *another* failure. That's why important data doesn't depend on *only* RAID, and why lots of places use mirroring, replication, data shuttling, etc. This isn't new. Most folks that can't afford to rebuild from backups or from a mirror'd remote device also couldn't have used 12TB for anything *but* bulk offline file storage because it's slower than christmas VS a 'real' storage array. Using it for the uber HD DVR? Great. Oh no, you lose X-files's last episodes. This isn't banking data we're talking here.

  • Scrub your arrays (Score:5, Interesting)

    by macemoneta (154740) on Tuesday October 21, 2008 @08:32PM (#25462407) Homepage

    This is why you scrub your RAID arrays once a week. If you're using software RAID on Linux, for example:

    echo check > /sys/block/md0/md/sync_action

    The above will scrub array md0 and initiate sector reallocation if needed. You do this while you have redundancy so the bad data can be recovered. Over time, weak sectors get reallocated from the spare bands, and when you do have a failure the probability of a secondary failure is very low over the interval needed for drive replacement.

    Most non-crap hardware controllers also provide this function. Read the documentation.

    • Re: (Score:3, Informative)

      by kyubre (1186117)
      I worked at Maxtor up till 2006, and had the privilege of being able to play with several raid controllers, and that coincidently is how I got started with Linux at home (software RAID). At the time, and mind you I only had 160 GB and 250 GB drives to play with, I build a number of raid-5 arrays up to 2 TB. When people think about RAID failure, they generally think about a hardware failure - a sector that can't be read etc. That is only the "obvious" problems. Even under ideal conditions, the 1e15 - 1e1
  • Punch Cards (Score:3, Funny)

    by vldragon (981127) on Tuesday October 21, 2008 @08:46PM (#25462519)
    I used to use the old punch card system to backup my data. Sure it takes a while but it was totally worth it... Until one day while attempted to move the many boxes fully of carefully sorted cards I fell down the steps and the cards went everywhere. I learned from that mistake and started writing all everything down on paper... Lot's o' 1's and 0's, my hand hurt.. A lot. But there was a fire at my off site :( sot I had to resort to the ultimate old school back up. A chisel and a rock... a really really big rock.
  • RAID6 is far better. (Score:3, Informative)

    by DamnStupidElf (649844) <Fingolfin@linuxmail.org> on Tuesday October 21, 2008 @09:01PM (#25462663)

    Not only are there two parity drives, but the operating system can perform automatic scanning of the drives to ensure that all data and parity disks are correct and silently correct any errors that occur on only one disk. It only takes a few days to scan 12 TB, and if this is done often enough the probability of a two failed disks plus a previously undetected unrecoverable error on a third disk is quite a bit lower than the failure rate for RAID5. RAID5 volumes can be automatically scanned, but if corruption is detected there's no way to know which of the disks was actually incorrect, barring an actual message from the hard disk. Silent corruption is a much bigger enemy of RAID5 than RAID6.

    I don't know why the article focuses on RAID5; RAID1 or RAID10 will have exactly the same issues at a slightly lower frequency than RAID5, but more frequently than RAID6.

    Ultimately, the solution is simply more redundancy, or more reliable hardware. RAID with 3 parity disks is not much slower than RAID6, and dedicated hardware or increasing CPU speed will take care of that faster than drive speeds increase.

  • by fortapocalypse (1231686) on Tuesday October 21, 2008 @09:02PM (#25462669)

    RAID???!!! Aaaaaaah! (Drive dies.)

  • I'm convinced. (Score:5, Interesting)

    by m.dillon (147925) on Tuesday October 21, 2008 @09:26PM (#25462921) Homepage

    I have to say, the ZFS folks have convinced me. There are simply too many places where bit rot can creep in these days even when the drive itself is perfect. The fact that the drive is not perfect just puts a big exclamation point on the issue. Add other problems into the fray, such as phantom writes (which have also been demonstrated to occur), and it gets very scary very quickly.

    I don't agree with ZFS's race-to-root block updating scheme for filesystem integrity but I do agree with the necessity of not completely trusting the block storage subsystem and of building checks into the filesystem data structures themselves.

    Even more specifically, if one is managing very large amounts of data one needs a way to validate that the filesystem contains what it is supposed to contain. It simply isn't possible to do that with storage-system logic. The filesystem itself must contain sufficient information to make validation possible. The filesystem itself must contain CRCs and hierarchical validation mechanisms to have a proper end-to-end check. I plan on making some adjustments to HAMMER to fix some holes in validation checking that I missed in the first round.

    -Matt

  • The Black Swan (Score:5, Interesting)

    by jschmerge (228731) on Tuesday October 21, 2008 @11:29PM (#25464053)

    A Black Swan is an event that is highly improbably, but statistically probable.

    Yes, it is possible for a drive in a RAID 5 array to become absolutely inoperable, and for one of the other drives to have a read failure at the same time. This is highly unlikely though, and is not the Black Swan. The math use to calculate the likelihood of these two events occurring at the same time is faulty. The MTBF metric for hard drives is measured in 'soft failures'; this is very different from a 'hard failure'.

    The difference between the two types of failures is that a soft failure, while a serious error, is something that the controlling operating system can work around if it detects it. It is extremely unlikely that a hard drive will exhibit a hard failure without having several soft failures first. It is even more unlikely that two drives in the same array will exhibit a hard failure within the length of time it takes to rebuild the array. In my experience, it is more likely that the software controlling the array will run into a bug rebuilding the array. I've seen this with several consumer-grade RAID controllers.

    The true Black Swan is when a disk in the array catches fire, or does something equally as destructive to the entire array.

    To echo other people's points, RAID increases availability, but only an off-site backup solves the data retention problem.

"Gotcha, you snot-necked weenies!" -- Post Bros. Comics

Working...