Why RAID 5 Stops Working In 2009

Posted by kdawson on Tue Oct 21, 2008 06:03 PM
from the back-'em-up-rawhide dept.
Lally Singh recommends a ZDNet piece predicting the imminent demise of RAID 5, noting that increasing storage and non-decreasing probability of disk failure will collide in a year or so. This reader adds, "Apparently, RAID 6 isn't far behind. I'll keep the ZFS plug short. Go ZFS. There, that was it." "Disk drive capacities double every 18-24 months. We have 1 TB drives now, and in 2009 we'll have 2 TB drives. With a 7-drive RAID 5 disk failure, you'll have 6 remaining 2 TB drives. As the RAID controller is busily reading through those 6 disks to reconstruct the data from the failed drive, it is almost certain it will see an [unrecoverable read error]. So the read fails ... The message 'we can't read this RAID volume' travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected — you thought! — data is gone. Oh, you didn't back it up to tape? Bummer!"
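A rough sanity check of the rebuild claim, as a sketch only. The summary does not state an error rate; the figure of one unrecoverable read error (URE) per 10^14 bits used below is an assumption (a commonly quoted consumer SATA spec), as is treating every bit read as an independent trial. The function name is made up for illustration.

```python
import math

def p_unrecoverable_read(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one URE while reading `bytes_read` bytes,
    assuming each bit fails independently with probability `ure_per_bit`."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed via log1p/expm1 to avoid rounding loss
    return -math.expm1(bits * math.log1p(-ure_per_bit))

# Six surviving 2 TB drives (decimal terabytes) must be read in full.
rebuild_bytes = 6 * 2e12
print(f"P(URE during rebuild) = {p_unrecoverable_read(rebuild_bytes):.2f}")
```

Under these assumptions the result is roughly 0.6: more likely than not, but short of "almost certain". A drive specified at one error per 10^15 bits would bring it down by about an order of magnitude.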
  • 12 TB of your carefully protected -- you thought! -- data is gone. Oh, you didn't back it up to tape? Bummer!

    If it wasn't backed up to an offsite location, then it wasn't carefully protected.

    • Yea, because we all backup 12TB of home data to an offsite location. Mine is my private evil island, and I've bioengineered flying death monkeys to carry the tapes for me. They make 11 trips a day. I'm hoping for 12 trips with the next generation of monkeys, but they're starting to want coffee breaks.

      I'm sorry, but I'm getting seriously tired of people looking down from the pedestal of how it "ought" to be done, how you do it at work, how you would do it if you had 20k to blow on a backup solution, and trying to apply that to the home user. Even the tape comment in the summary is horseshit, because even exceptionally savvy home users are not going to pay for a tape drive and enough tapes to archive serious data, much less handle shipping the backups offsite professionally.

      This is serious news. As it stands, the home user that actually sets up a RAID 5 raid is in the top percentile for actually giving a crap about home data. Once that becomes a non-issue, then the point has come when a reasonable backup is out of reach of 99% of private individuals. This, at the same time as more and more people are actually needing a decent solution.

      • Oh come on. Do you have 12TB of home data? Seriously? And if you do, it's not that hard to have another 12TB of external USB drives at some relative's place.

        I've got about 500GB of data that I care about at home & the whole lot's backed up onto a terabyte external HDD at my Dad's. It's not that hard.

        If you think raid is protecting your data, you're crazy.

        • by DrVxD (184537) on Tuesday October 21 2008, @06:51PM (#25461967) Homepage Journal

          Oh come on. Do you have 12TB of home data? Seriously? And if you do, it's not that hard to have another 12TB of external USB drives at some relative's place.

          Not all of us have relatives, you insensitive...[URE]

            • by Facegarden (967477) on Tuesday October 21 2008, @07:18PM (#25462245)

              Buying a computer system you cannot afford to properly use is crazy. Yes, some people are crazy, and those crazy people are going to lose data, but there's no sense in defending it.

              Well, i guess i'm crazy, i have 3TB of space on my home PC, and no way to back it all up offsite. I do have some important folders from one drive automatically copy to another drive periodically, so if one drive dies the other will be okay, but if i lose them both or the place burns down or i get a nasty virus, it's all going to hell.
              Most of my space is taken up by pirated... err... backed up... HD movies. And porn, lots of porn.
              Either way, i'm not too worried if i lose that, it's just the things i back up i really care about.
              The thing is, i was going to RAID 3 of the drives into a secure 1TB array, but now i hear all these issues with RAID and i worry that it may be WORSE than just copying over the files periodically. I want a DROBO but those are expensive as hell.

              This article has inspired me to look into Tape Backup but i worry that it's not cost effective (i haven't looked yet).

              I should fill up some tapes with a few hundred gigs of porn, write "confidential" on them, and stash them in a bag, under some bush, across the street from HP near my apartment. I'm sure some curious person would come looking, only to discover their contents and wonder why the hell someone went to all that trouble....

              God i'm strange.
              -Taylor

      • by sholsinger (1131365) <sholsinger@gmail.com> on Tuesday October 21 2008, @06:49PM (#25461953) Homepage

        Next they'll want to unionize. At that point you've lost everything.

        • Yea, but DVD is transient crap. How long will those last? A few years? You cannot rely on home-burned optical media for long term storage, and while burning 12 terabytes of information on to one set of 1446 dvds (double layer) may not seem like a big deal, having to do it every three years for the rest of your life is bound to get old.

          For any serious storage you need magnetic media, and though we all hate tape, 5 year old tape is about a million times more reliable than a hard drive that hasn't been plugged in in 5 years.

          So either you need tape in the sort of quantity that the private user cannot justify, or you're going to have to spring for a hefty RAID and arrange for another one like it as a backup. Offsite if you're lucky, but it's probably just going to be out in your garage/basement/tool shed.

          Now, what do you do if you can't rely on RAID? No other storage is as reliable and cheap as the hard drive. ZFS and RAID-Z may solve the problem, but they may not...You can still have failures, and as hard disk sizes increase, the amount of data jeopardized by a single failure increases as well.

          • by Hadlock (143607) <chad.hedstrom @ g mail.com> on Tuesday October 21 2008, @07:11PM (#25462171) Homepage Journal

            I can't vouch for DVD-R but I have el-cheapo store brand CD-Rs that I backed up my MP3 collection to 11 years ago and they work just fine. My solution is this:
             
            Back everything up that's not media (mp3/video) every 6 months to CD-R, and once a year, copy all my old data onto a new hard drive that's 20+% larger than the one I bought last year and unplug the old one. I have 11 old hard drives sitting in the closet should I ever need that data, and the likelihood of a hard drive failing in the first year (after the first 30 days) is phenomenally low. Any document that I CAN'T lose between now and the next CD-R backup goes on a thumb drive or its own CD-R, and/or I email it to myself.

            • by SatanicPuppy (611928) * <Satanicpuppy AT gmail DOT com> on Tuesday October 21 2008, @07:10PM (#25462165) Journal

              I've got a mainframe circa 1984 that's been using the same type of drive since 1989. Last year we pulled all the year-end financial numbers off the yearly backups dating back to that point. Zero failed tapes.

              Consumer-grade CDs and DVDs use a photosensitive dye to record information. It can degrade in anywhere between 2 to 5 years...Longer if you keep it in a cool dark place, but not 20 years.

                • by kimvette (919543) on Tuesday October 21 2008, @10:40PM (#25464145) Homepage

                  I have CD-Rs dating back to 1994 or 1995 that are just fine -- and they're off-brand media too. "Good" media was $12 to $20 per CD then, and "cheap" media was $7.00 per CD.

                  I have DVD-Rs dating back to 2002 or 2003 -- again, just fine.

                  While it's good to be cautious, some in here are crying wolf regarding optical media.

    • You get your first RAID controller from a trusted friend. "Here" he says "try this" and hands you a Mylex board. It has a 64 bit bus and 3 SCSI LVD connectors. Oooh. That looks fast. So you start ebaying drives, cables, adapters, more controllers, the inevitable megawatt power supply and you mess around with raid 1, raid 0, raid 1+0 and raid 5. Suddenly every system falls prey to RAIDMANIA; eventually for yourself you build a system with 3 controllers, with 3 busses each and a drive on each one of 9 busses. With a controller for swap, one for data and one for the system, will Windows now be fast? Yeah, sorta. Those drives sure are quiet - from a click-click busy noise perspective, NOT from a "sounds like a jet airplane when running" perspective. Heat is an issue, too.

      http://rs79.vrx.net/works/photoblog/2005/Sep/15/DSCF0007s.jpg [vrx.net]

      But oh my are the failure modes spectacular.

      I just use a laptop now and make several sets of backup DVDs or just copy to spare drives. I love RAID to death. But it's really only marginally worth the effort in the real world. But if you need fast, OMG.

        • by binarylarry (1338699) on Tuesday October 21 2008, @07:43PM (#25462487)

          That's why serious IT people use Fedex.

          • by dbIII (701233) on Wednesday October 22 2008, @03:07AM (#25465439)
            I'll tell you that I was pretty serious when Fedex put a forklift tine through the front of a server they were shipping.
            • by techess (1322623) on Wednesday October 22 2008, @08:04AM (#25466981)

              I always love it when Fed-Ex destroys something and then tries to hide it. One day I walked past the shipping office and I smelled the very strong odor of hydraulic oil coming from the room. I take a look inside since we shouldn't be receiving anything that has hydraulic oil in it. I found a bunch of boxes with the local Detroit Airport logo all over them and sealed with DET labeled tape. The cardboard was completely soaked through with the oil.

              I carefully opened one of the boxes and found it contained servers! It appears that the original boxes got in some sort of accident at the airport and were completely soaked. At the airport Fed-Ex or the baggage handlers did us a "favor" and re-boxed everything. The servers were so coated (and filled) that even the new boxes were completely soaked through and the bottoms of the boxes were starting to pull apart. The Fed-Ex guy (so we wouldn't refuse them) dropped them off at lunch and then got some random person in the hall to sign off on it.

              We had to pay for new servers to be built ASAP and shipped overnight (UPS this time) at huge cost for us. Since someone had signed off on the package we then had a very long fight to get Fed-Ex to pay for the equipment they destroyed. We never got the extra cost for the overnight shipping and the rush build reimbursed.

        • Re:Don't panic! (Score:5, Insightful)

          by Anonymous Coward on Tuesday October 21 2008, @08:47PM (#25463119)

          No, it won't. That's the point of this not-news article. It's getting to the point where (due to the size of the disks) a rebuild takes longer than the statistically "safe" window between individual disk failures. Two disks kick it in the same timeframe (the chance of which increases as you add disks) and you're screwed.

          A poorly designed multi-disk storage system can easily be worse than a single disk.

          • Re:Don't panic! (Score:5, Insightful)

            by bstone (145356) * on Tuesday October 21 2008, @10:35PM (#25464087)

            Using the same failure rate figures as the article, you WILL get an unrecoverable read error each and every time you back up your 12 TB of data. You will be able to recover from the single block failure because of the RAID 5 setup.

            With that kind of error rate, drive manufacturers will be forced to design to higher standards, they won't be able to sell drives that fail at that rate.

          • Re:Don't panic! (Score:5, Insightful)

            by Eivind (15695) <eivindorama@gmail.com> on Wednesday October 22 2008, @01:05AM (#25464929) Homepage

            Yes. It's amazing that the article presents the basic point so horribly poorly. The problem is not the capacity of the disks.

            The problem is that the capacity has been growing faster than the transfer bandwidth. Thus it takes a longer and longer time to read (or write) a complete disk. This gives a larger window for double-failure.
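To put numbers on that point, here is a minimal sketch of how long a full sequential pass over one disk takes. The capacity and throughput pairs are hypothetical round figures chosen for illustration, not measurements from the thread, and a real rebuild under load will be slower than a best-case sequential read.

```python
def full_read_hours(capacity_bytes: float, throughput_bytes_per_s: float) -> float:
    """Best-case time, in hours, to read (or write) an entire disk sequentially."""
    return capacity_bytes / throughput_bytes_per_s / 3600

# Hypothetical drive generations: (label, capacity in bytes, sustained bytes/s)
generations = [
    ("73 GB at 60 MB/s", 73e9, 60e6),
    ("1 TB at 80 MB/s", 1e12, 80e6),
    ("2 TB at 100 MB/s", 2e12, 100e6),
]

for label, capacity, rate in generations:
    print(f"{label}: {full_read_hours(capacity, rate):.1f} h")
```

Capacity grows by well over an order of magnitude across these rows while throughput does not even double, so the exposure window per rebuild keeps stretching.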

            Simple as that.

            • Re:Don't panic! (Score:5, Insightful)

              by Sillygates (967271) on Tuesday October 21 2008, @11:07PM (#25464333) Homepage Journal
              The mathematical theory behind raid5 is not complicated at all. http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5 [wikipedia.org]

              And there is parity, that's how raid5 works.

              You are probably referring to "silent" errors, which, for performance reasons, aren't read/detected by most raid5 implementations. And in reality there is little reason to actively read parity, unless they are running/recovering in degraded mode: sure, you'll be informed that there is data corruption, but there is no way to tell whether the parity or the original data is at fault (though it's true that some implementations will scrub/update the parity to match the original data on an occasional basis).

              I don't see a single set of raid5 disks as a backup solution at any measure though (disk reliability is only one aspect of this, hardware/driver/filesystem bugs can also cause hard or impossible to detect corruption), but it is a great 'best effort' to prevent a bit of downtime on high availability disks.
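The parity idea fits in a few lines. The sketch below uses one stripe with three data blocks and one XOR parity block; the block contents and the helper name are made up for illustration, and real implementations rotate parity across disks and work on much larger blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"disk0 AA", b"disk1 BB", b"disk2 CC"]  # one stripe, three data blocks
parity = xor_blocks(data)

# Degraded mode: disk1 is gone, so rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Silent corruption: one bit flips on disk2 and no read error is reported.
corrupt = [data[0], data[1], bytes([data[2][0] ^ 0x01]) + data[2][1:]]
mismatch = xor_blocks(corrupt) != parity
print("parity mismatch detected:", mismatch)
```

The mismatch says the stripe is inconsistent, but nothing in the XOR identifies which of the four blocks is the wrong one, which is the limitation described above.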
        • Re:Don't panic! (Score:5, Informative)

          by nine-times (778537) <nine.times@gmail.com> on Tuesday October 21 2008, @10:04PM (#25463823) Homepage

          How reliable RAID5 is depends, because actually the more disks you have, the greater the likelihood that one of them will fail in any set period of time. So obviously if you have a RAID 0 of lots of disks, then there is a much better chance that the RAID will fail than that any particular disk will fail.

          So the purpose of RAID5 is not so much to make it orders of magnitude more reliable than just having a single disk, but rather to mitigate the increased risk that would come from having a RAID0. So you'd have to calculate, for the number of disks and the failure rate of any particular drive, what are the chances of having 2 drives fail at the same time (given a certain response rate to drive failure). If you have enough drives and a slow enough response to disk failures, it's at least theoretically possible (I haven't done the math) that a single drive is safer.
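For what it's worth, the math can be roughed out. This is a back-of-the-envelope sketch with loud assumptions: disks fail independently at a constant annual failure rate (the 3% used here is hypothetical), the array is lost only if a second whole disk fails before the first is replaced and rebuilt, and read errors during the rebuild are ignored. The function name is made up for illustration.

```python
import math

def raid5_loss_events_per_year(n_disks: int, afr: float, window_hours: float) -> float:
    """Expected number of 'second disk dies inside the repair window' events per
    year for one RAID 5 set. For small values this approximates the annual
    probability of losing the array."""
    rate = -math.log1p(-afr)                 # failures per disk-year
    window_years = window_hours / (24 * 365)
    # chance that any of the n-1 survivors fails during the repair window
    p_second = -math.expm1(-(n_disks - 1) * rate * window_years)
    return n_disks * rate * p_second

afr = 0.03  # hypothetical annual failure rate of a single disk
print(f"single disk:                {afr:.4f}")
print(f"7 disks, 24 h repair:       {raid5_loss_events_per_year(7, afr, 24):.4f}")
print(f"24 disks, 30 day response:  {raid5_loss_events_per_year(24, afr, 720):.4f}")
```

With a prompt repair the array comes out far safer than one disk. With many disks and a month-long response the figure climbs past the single-disk rate, so the theoretical possibility holds under this simplified model.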

          • Re:Don't panic! (Score:5, Insightful)

            by Allador (537449) on Wednesday October 22 2008, @12:59AM (#25464903)

            You seem to misunderstand the article. They are saying that if you need 12T of storage RAID 5 is not reliable. You would be better off with a single 12T disk if such a thing existed.

            That's not what the article says at all.

            The article says that if you build your RAID arrays from the biggest disks available (which no one with half a brain does) like 1-3TB drives, and you have them filled, then the numbers come out as presented.

            But there's a reason why no one on the planet builds important raid arrays out of 1TB drives. Rebuild time is too long.

            This is also one of the big reasons why you see so many 73GB and 140GB SAS/SATA drives in raid arrays, and why server storage drives don't grow anything like as fast as consumer garbage drives.

        • by jaxtherat (1165473) on Tuesday October 21 2008, @09:41PM (#25463595) Homepage

          I love how you use the language "get what they deserve".

          What about my situation, where I have to store ~ 1TB of unique data per office in 3 offices that are roughly 1000 km apart and I have to keep everything backed up with a budget of less than ~AU$ 4000 IN TOTAL?

          I have to run 4 x 1TB RAID arrays on the file servers and use rsync to synchronise all the data between the offices nightly, "effectively" doing offsites, and have a 3 TB linux NAS (also using RAID 5) for incrementals at the main site.

          That is all I can afford, and I feel that I'm doing my best for my employer given my budget and still maintaining my professional integrity as a sysad.

          Why do I "get what they deserve" when I can't afford the necessary LTO4 drives, servers and tapes (I worked it out I'd need ~ AU$ 30,000) to do it any other way?

          • by totally bogus dude (1040246) on Tuesday October 21 2008, @10:37PM (#25464117)

            If you're replicating data between all three offices (and a fourth backup system?) then you are making backups. The vitriol is aimed at people who set up a RAID-5 array and then say "hooray my data is protected forevermore!".

            Tape systems, especially high capacity tapes, are very expensive, and even those are prone to failures. Online backups to other hard drives are the only affordable means of backing up today's high capacity, low cost hard drives. To do it properly though, you need to make sure you do have separate physical locations for protection from natural disasters, fires, etc. Which you have.

            The only concern your system may have is: how do you handle corrupted data, or user error? If you've got a TB of data at each site it's unlikely that mistakes will be noticed quickly, so after the nightly synchronisation all your backups will now have the corrupt data and when someone realises in a month's time that someone deleted a file they shouldn't have or saved crap data over a file, how do you restore it? Hopefully your incremental backups can be used to recover the most recent good copy of the data, but how long do you keep those for?

            • by jaxtherat (1165473) on Tuesday October 21 2008, @10:48PM (#25464193) Homepage

              Judging by the budget you quoted, it's a combination of all of the above: you are a crappy sysadmin for a crappy company with limited growth potential.

              Sigh. *ignores flamebait*

              Anyway, here's the actual reality of the situation:

              I'm a not brilliant (but certainly not crappy either) sysad who is working for a company that has rapidly expanded to the point where they need a full time sysad, and then felt the kaboom of the subprime mortgage debacle, since they consult to the property market. Hence why my original upgrade budget got shrunk big time.

              The company BOTH cares about their data AND can't afford a proper backup system.

              • The company BOTH cares about their data AND can't afford a proper backup system.

                In this case, linux has one last resort for you:
                sudo apt-get install bible

                darkpixel@hoth:~$ bible
                bible: Debian/BRS Release 4.18, $Date: 2005/01/23 11:29:22 $
                Hit '?' for help.

                -snip-

                bible(KJV) [Gen1:1]> ec3:6

                Ecclesiastes 3

                6 A time to get, and a time to lose; a time to keep, and a time to cast away;
                bible(KJV) [Ec3:6]>


                Mainly pay attention to that whole '...and a time to lose' part.

              • by tengu1sd (797240) on Tuesday October 21 2008, @11:19PM (#25464395)
                >>>The company BOTH cares about their data AND can't afford a proper backup system.

                It can be that the company cares, but doesn't care enough to budget for potential data recovery. All you can do is make sure the risks are explained, with budget options, and keep a well-documented paper trail to cover your nether regions. Been there, done that. The typical response is that backups are not important, until a failure and a few days of uncertainty are forced upon the company.

                Having the same, potentially corrupted, data at multiple sites mitigates against the loss of a disk, or even the loss of a single site. User error or database corruption can wind up copied over your good data. Needing to go back more than a day or two may not be practical in a disk to disk backup environment.

                It's a part of the system manager's role to spell out potential problems in easy to understand power point sound bites and show what options are available. The better you can do this, the more toys you'll have to play with.

        • by ajkst1 (630286) on Tuesday October 21 2008, @10:03PM (#25463817)
          I have to echo this comment. RAID is not a backup. It is a form of redundancy. Nothing is stopping that system from losing two drives and completely losing your data. RAID simply allows you to keep working after a SINGLE disk failure. If you're not making backups of your critical data and relying on RAID to save your behind, you're insane.
  • by EdIII (1114411) * on Tuesday October 21 2008, @06:22PM (#25461661)

    I can see a lot of people getting into a tizzy over this. The RAID 5 this guy is talking about is controlled by one STUPID controller.

    There are a lot of methods, and patented technology that prevent just the situation he is talking about. Here is just one example:

    PerfectRAID(TM) is Promise's patented RAID data protection technology; a suite of data protection and redundancy features built into every Promise RAID product.

            * Predictive Data Migration (PDM): Replace un-healthy disk member in array and keep array on normal status during the data transition between healthy HD and replaced HD.
            * Bad Sector Mapping and Media Patrol: These features scan the system's drive media to ensure that even bad physical drives do not impact data availability
            * Array Error Recovery: Data recovery from bad sector or failed HD for redundant RAID
            * RAID 5/6 inconsistent data Prevent (Write Hole Table)
            * Data content Error Prevent (Read/Write Check Table)
            * Physical Drive Error Recovery
            * SMART support
            * Hard/Soft Reset to recover HD from bad status.
            * HD Powercontrol to recover HD from hung status.
            * NVRAM event logging

    RAID is not perfect, not by any stretch, but if you use it properly it will serve its purpose quite nicely. If your data is that critical, having it on a single raid is ill advised anyway. If you are talking about databases, then RAID 10 is preferable, and replicating the databases across multiple sites even more so.

  • Smells Like FUD. (Score:5, Insightful)

    by sexconker (1179573) on Tuesday October 21 2008, @06:25PM (#25461697)

    What is this article about?

    They say that since there is more data, you're more likely to encounter problems during a rebuild.

    The issue isn't with RAID, it's with the file system. Use larger blocks/sectors.

    Losing all of your data requires you to have a shitty RAID controller. A decent one will reconstruct what it can.

    The odds of you encountering a physical issue increases as capacity increases, and decreases as reliability increases. In theory, the 1 TB and up drives are pretty reliable. Anything worth protecting should be on server-grade hard drives anyway.

    The likelihood of a physical problem popping up during your rebuild is no higher with new drives than it was with old drives. I haven't noticed my larger drives failing at higher rates than my older, smaller drives. I haven't heard of them failing at higher rates.

    Remember, folks, RAID is a redundant array of inexpensive disks. The purpose of RAID is to be fault-tolerant, in the sense that a few failures don't put you out of production. You also get the nice bonus of being able to lump a bunch of drives together to get a larger total capacity.

    RAID is not a backup solution.

    RAID 5 and RAID 6, specifically, are still viable solutions for most setups. If you want more reliability, go with RAID 1+0, RAID 5+0, whatever.

    Choosing the right RAID level has always depended on your needs, setup, budget, and priorities.

    Smells like FUD.

  • by mbone (558574) on Tuesday October 21 2008, @06:25PM (#25461707)

    How many times does this have to be said.

    RAID is not a backup. RAID is designed to protect against hardware failures. It can also increase your I/O speed, which is more important in some cases. Backups are different.

    Depending on what you are doing, you may or may not need a RAID, but you definitely need backups.

  • by gweihir (88907) on Tuesday October 21 2008, @06:58PM (#25462041)

    My observed error rate with about 4TB of storage is much, much lower. I did run a full surface scan every 15 days for two years and did not have a single read error in about two years. (The hardware has since been decommissioned and replaced by 5 RAID6 arrays with 4TB each.)

    So, I did read roughly 100 times 4TB. That is 400TB = 3.2 * 10^15 bits with 0 errors. That does not take into account normal read from the disks, which should be substantially more.
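To see how strongly that observation disagrees with the pessimistic spec, here is a small sketch. It takes the 400 TB figure from the comment as given and assumes a rate of one unrecoverable read error per 10^14 bits with independent errors; that rate is an assumption, not something stated in the comment.

```python
import math

def expected_ures(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Expected number of unrecoverable read errors for a given read volume."""
    return bytes_read * 8 * ure_per_bit

bytes_read = 400e12                      # 400 TB, as stated in the comment
expected_errors = expected_ures(bytes_read)
# With independent errors the count is roughly Poisson, so P(0) = exp(-mean).
p_zero = math.exp(-expected_errors)

print(f"expected errors at the assumed spec: {expected_errors:.0f}")
print(f"chance of seeing none: {p_zero:.1e}")
```

If the drives really erred at the assumed rate, about 32 errors would be expected and a clean record would be vanishingly unlikely, which suggests the real-world rate on that hardware was far better than the spec sheet's worst case.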

  • Scrub your arrays (Score:5, Interesting)

    by macemoneta (154740) on Tuesday October 21 2008, @07:32PM (#25462407)

    This is why you scrub your RAID arrays once a week. If you're using software RAID on Linux, for example:

    echo check > /sys/block/md0/md/sync_action

    The above will scrub array md0 and initiate sector reallocation if needed. You do this while you have redundancy so the bad data can be recovered. Over time, weak sectors get reallocated from the spare bands, and when you do have a failure the probability of a secondary failure is very low over the interval needed for drive replacement.

    Most non-crap hardware controllers also provide this function. Read the documentation.
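For anyone scripting this, a minimal Python sketch of the same request follows. It only wraps the sysfs files that Linux md exposes (`sync_action` and `mismatch_cnt`); the helper names are made up for illustration, and writing to the real path needs root.

```python
from pathlib import Path

def request_check(md_sysfs_dir: str) -> None:
    """Ask the md driver to start a consistency check (same as `echo check`)."""
    Path(md_sysfs_dir, "sync_action").write_text("check\n")

def mismatch_count(md_sysfs_dir: str) -> int:
    """Read the mismatch counter that md reports for the last check."""
    return int(Path(md_sysfs_dir, "mismatch_cnt").read_text().strip())

# Example, run as root against the array named in the comment above:
#   request_check("/sys/block/md0/md")
#   ... wait for the check to finish (see /proc/mdstat) ...
#   print(mismatch_count("/sys/block/md0/md"))
```

Calling `request_check` from a weekly cron job or timer gives the once-a-week schedule; the counter is only meaningful after the check has completed.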

  • by fortapocalypse (1231686) on Tuesday October 21 2008, @08:02PM (#25462669)

    RAID???!!! Aaaaaaah! (Drive dies.)

  • I'm convinced. (Score:5, Interesting)

    by m.dillon (147925) on Tuesday October 21 2008, @08:26PM (#25462921) Homepage

    I have to say, the ZFS folks have convinced me. There are simply too many places where bit rot can creep in these days even when the drive itself is perfect. The fact that the drive is not perfect just puts a big exclamation point on the issue. Add other problems into the fray, such as phantom writes (which have also been demonstrated to occur), and it gets very scary very quickly.

    I don't agree with ZFS's race-to-root block updating scheme for filesystem integrity but I do agree with the necessity of not completely trusting the block storage subsystem and of building checks into the filesystem data structures themselves.

    Even more specifically, if one is managing very large amounts of data one needs a way to validate that the filesystem contains what it is supposed to contain. It simply isn't possible to do that with storage-system logic. The filesystem itself must contain sufficient information to make validation possible. The filesystem itself must contain CRCs and hierarchical validation mechanisms to have a proper end-to-end check. I plan on making some adjustments to HAMMER to fix some holes in validation checking that I missed in the first round.
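A toy illustration of the end-to-end idea, not how HAMMER or ZFS actually lay things out: the checksum lives in the pointer that references a block rather than next to the block, so corruption anywhere below the filesystem is caught on read. The dict-based store, the function names, and the use of CRC32 are all assumptions made for the sketch.

```python
import zlib

def write_block(store: dict, block_id: int, payload: bytes) -> dict:
    """Store a block and return a 'pointer' that carries its CRC."""
    store[block_id] = payload
    return {"block_id": block_id, "crc": zlib.crc32(payload)}

def read_block(store: dict, pointer: dict) -> bytes:
    """Fetch a block and verify it against the CRC held by its parent pointer."""
    payload = store[pointer["block_id"]]
    if zlib.crc32(payload) != pointer["crc"]:
        raise IOError(f"checksum mismatch on block {pointer['block_id']}")
    return payload

store = {}
ptr = write_block(store, 7, b"year-end financials")
assert read_block(store, ptr) == b"year-end financials"

# Simulate bit rot underneath the filesystem: the storage layer reports success.
store[7] = b"year-end financialz"
try:
    read_block(store, ptr)
except IOError as err:
    print("detected:", err)
```

Because the pointer itself would live in a block that is checksummed by its own parent, the same check repeated up the tree gives the hierarchical validation described above.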

    -Matt

  • The Black Swan (Score:5, Interesting)

    by jschmerge (228731) on Tuesday October 21 2008, @10:29PM (#25464053) Homepage

    A Black Swan is an event that is highly improbable, but statistically probable.

    Yes, it is possible for a drive in a RAID 5 array to become absolutely inoperable, and for one of the other drives to have a read failure at the same time. This is highly unlikely though, and is not the Black Swan. The math used to calculate the likelihood of these two events occurring at the same time is faulty. The MTBF metric for hard drives is measured in 'soft failures'; this is very different from a 'hard failure'.

    The difference between the two types of failures is that a soft failure, while a serious error, is something that the controlling operating system can work around if it detects it. It is extremely unlikely that a hard drive will exhibit a hard failure without having several soft failures first. It is even more unlikely that two drives in the same array will exhibit a hard failure within the length of time it takes to rebuild the array. In my experience, it is more likely that the software controlling the array will run into a bug rebuilding the array. I've seen this with several consumer-grade RAID controllers.

    The true Black Swan is when a disk in the array catches fire, or does something equally as destructive to the entire array.

    To echo other people's points, RAID increases availability, but only an off-site backup solves the data retention problem.

    • by SatanicPuppy (611928) * <Satanicpuppy AT gmail DOT com> on Tuesday October 21 2008, @06:25PM (#25461693) Journal

      The real issue is one that anyone who has ever had to recover a multi-drive array can tell you instantly: if one drive fails, and the other drive was bought at the same time, and has had a nearly identical usage pattern, the odds of the other drive failing are well above average.

      I once had a single drive fail in a 24 disk array. The disks were arranged, RAID 5, in groups of 3, glued together by Veritas (from back before it got bought by crappy symantec). By the time the smoke cleared we had replaced 19 out of 24 drives. They had all been bought at the same time, and as they thrashed rebuilding their failed buddies, they started dying themselves. The remaining 5 drives we replaced anyway, just because.

      That's a worst case, but multiple failures are far from uncommon, and very few people correctly cycle in new drives periodically to reduce the chance of a mass failure.

      • by Angus McNitt (542101) on Tuesday October 21 2008, @06:44PM (#25461893)

        ... very few people correctly cycle in new drives periodically to reduce the chance of a mass failure.

        That is also because very few people buy a RAID setup piecemeal. Most end up buying a solution, fully populated. The idea of swapping out some drives as you go, or growing your RAID over time doesn't always look good, either to the PHBs who usually run the budget, or to the vendor. We had a vendor trying to sell us an iSCSI SAN device tell us that varying the drive lots and dates increased the chances of failure. Needless to say we went elsewhere.

        When we bought the RAID array for our Exchange box, this is going back a few years, everybody looked at me like an idiot because I asked for drives with different lot numbers. It was the best I could do as buying over time was not an option. HP was actually pretty cool about this request and out of 8 disks, no 3 have the same lot number or manufacture date.

        Of course we are also running RAID on that machine for non-backup and do a nightly replication, so your mileage may vary.

      • Re:RAID != Backup (Score:5, Informative)

        by Walpurgiss (723989) on Tuesday October 21 2008, @07:03PM (#25462095)
        I run a raid5 with 1TB disks. Growing the array from 3 to 4 took around 4 hours, 4 to 5 took maybe 8 or 10, 5 to 7 took something like 30 hours I guess.

        But that's growing from a previous capacity to a larger capacity.
        Using mdadm to fake a failure by removing and adding a single drive, the recover time generally was 4-5 hours.
    • by pyite (140350) * on Tuesday October 21 2008, @08:07PM (#25462725)

      Wow. I love your FUD. If you're going to lie, at least make it seem truthful.

      Lacking in file system utilities (yes, fsck IS necessary even on healthy filesystems, especially on desktops and portables)

      Why no fsck? [opensolaris.org] And if you really feel the need to do something:

      zpool scrub <pool_name>

      License-incompatible with anything worth running it on, other than Solaris itself... which is NOT worth running (see #1 above)

      What you mean to say is "Some Operating Systems whose merits can be debated are license incompatible with the license of ZFS." FreeBSD can implement ZFS. Why can't Linux? Because of its license, not that of ZFS.