Data Storage

Proposed Disk Array With 99.999% Availability For 4 Years, Sans Maintenance 258

Thorfinn.au writes with this paper from four researchers (Jehan-François Pâris, Ahmed Amer, Darrell D. E. Long, and Thomas Schwarz, S. J.), which takes an interesting approach to long-term, fault-tolerant storage: As the prices of magnetic storage continue to decrease, the cost of replacing failed disks becomes increasingly dominated by the cost of the service call itself. We propose to eliminate these calls by building disk arrays that contain enough spare disks to operate without any human intervention during their whole lifetime. To evaluate the feasibility of this approach, we have simulated the behaviour of two-dimensional disk arrays with N parity disks and N(N – 1)/2 data disks under realistic failure and repair assumptions. Our conclusion is that having N(N + 1)/2 spare disks is more than enough to achieve a 99.999 percent probability of not losing data over four years. We observe that the same objectives cannot be reached with RAID level 6 organizations and would require RAID stripes that could tolerate triple disk failures.
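As a rough illustration of the arithmetic in that summary, the short Python sketch below (written for this page, not taken from the paper) tabulates the disk counts implied by the stated formulas: N parity disks, N(N − 1)/2 data disks, and N(N + 1)/2 spares. The 99.999% figure itself comes from the authors' simulations, not from this arithmetic.

    # Disk counts implied by the summary's formulas (illustration only):
    #   N parity disks, N(N - 1)/2 data disks, N(N + 1)/2 spare disks.
    # The 99.999% availability figure comes from the paper's simulations,
    # not from this arithmetic.
    def array_layout(n: int) -> dict:
        data = n * (n - 1) // 2
        parity = n
        spares = n * (n + 1) // 2
        return {"data": data, "parity": parity, "spares": spares,
                "total": data + parity + spares}

    for n in (4, 6, 8, 10):
        d = array_layout(n)
        print(f"N={n:2d}  data={d['data']:3d}  parity={d['parity']:2d}  "
              f"spares={d['spares']:3d}  total={d['total']:3d}")

For N = 10, for example, that works out to 45 data disks backed by 10 parity disks and 55 spares.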
This discussion has been archived. No new comments can be posted.


Comments Filter:
  • I don't see power mentioned in the paper.
    • With any sense, it would include its own UPS to allow it to successfully write all the pending writes out to the disks and then spin down...
      • by Enry ( 630 )

        IIRC XFS/SGI systems had this built in: there was just enough juice to flush buffers to disk while everything was spinning down.

      • Re:Power Costs (Score:5, Insightful)

        by jellomizer ( 103300 ) on Thursday January 29, 2015 @11:54AM (#48931711)

        A lot of high-end equipment does have fairly large capacitors to allow enough power-off time to do a clean power off.
        I remember back in the 1990s some PC-centric folks were looking inside a Sun workstation and were surprised by all the large capacitors on the motherboard. In short, it gives the system enough time to finish its final calculation before the power goes out.

    • How about a setup that detects when one more drive failure will cause the raid array to fail and spins up a new unused drive to be ready for that failure?

      --> Not a raid expert...
      • by jandrese ( 485 )
        The spares should be warm spares, not spinning until the RAID controller detects a failure and replaces the failed drive, so they won't take any appreciable amount of power. The concern I have is space. That many idle drives eating up rack space is going to be expensive.
        • by TWX ( 665546 )
          For colocated space, yes.

          For an organization like the one I work for, with server room space to spare, it wouldn't be too bad. We could probably triple the rack space we dedicate to disk and still have room to spare, and we have the HVAC to match. That's kind of what happens when equipment gets more condensed and virtualization enters the fray. You can't virtualize a storage array, obviously, but you can fill the space the application servers used to take with storage as it frees up.
        • Well, since they are not supposed to need to be hot-swap, you can get 12+ drives into a 1RU chassis with redundant power and a fairly beefy server. That is 3x the density of the traditional 4-up-front 1RU. Expanding to 2RU gives 12 hot-swap 3.5" drives or 24 2.5" drives, still 2x the density in 3.5"s for non-hot-swap. Potentially even higher with 2.5"s, though the highest I can find is 88 hot-swap bays in 4RU, or 22 per RU, coupled with a rather beefy server.

      • by LWATCDR ( 28044 )

        Or how about having the array swap in spares itself?
        Every few weeks or so, one of the spares could start to act as a mirror of an active drive; once that drive is mirrored, you swap the active drive to the spare and the spare to the active.
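        A very rough sketch of the rotation policy being suggested here, with entirely made-up names (Drive, Array, rotate_one); neither the paper nor any real controller firmware is claimed to work this way:

        # Hypothetical sketch of the rotation idea above: periodically mirror the
        # most-worn active drive onto the least-worn spare, then swap their roles
        # so wear is spread across the whole pool. All names here are invented for
        # illustration; real firmware would also have to copy/rebuild data first.
        from dataclasses import dataclass, field
        from typing import List

        @dataclass
        class Drive:
            ident: str
            hours_powered_on: float = 0.0

        @dataclass
        class Array:
            active: List[Drive]
            spares: List[Drive] = field(default_factory=list)

            def rotate_one(self) -> None:
                """Swap the most-worn active drive with the least-worn spare."""
                if not self.spares:
                    return
                worn = max(self.active, key=lambda d: d.hours_powered_on)
                fresh = min(self.spares, key=lambda d: d.hours_powered_on)
                self.active[self.active.index(worn)] = fresh
                self.spares[self.spares.index(fresh)] = worn

        # rotate_one() would be called on a schedule, e.g. every few weeks,
        # by whatever monitors the array.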

    • Re:Power Costs (Score:4, Insightful)

      by Barny ( 103770 ) on Thursday January 29, 2015 @11:54AM (#48931717) Journal

      "More work is still needed to define policies that would allow array users and manufacturers to detect unusually disk failure rates and take the appropriate actions before any data loss takes place." - Last line in the conclusion.

      This implies that not all the spare drives are active and ready to go all the time, and that some/most would be kept powered down as cold spares. Of course, this same guy is likely to do another paper where he examines the cost to run the array and how many drives could be left cold while still achieving the five-nines reliability. Heck, if the software managing the drives were smart, it would rotate active/spare drives in and out, working them in quickly to get them all past the 'first 18 months' high-failure period to the sweet spot, then swapping them in and out over the lifespan of the array to keep it at its highest reliability for longer.

      Hrmm, maybe I should look at building such an algorithm; a quick Google search doesn't turn any such systems up.

      • How do you figure? I mean sure, presumably the spares would be inactive until a replacement was needed, to save both power and wear and tear, but how do you figure that that is an implication of needing to detect anomalous failure rates to avoid data loss? No matter what strategy you're using, if you've got N-nines projected reliability over Y years assuming normal failure rates, then if you're suffering from anomalously high failure rates you're going to need to replace some drives early to maintain the

          It seems that one assumption in the study is predictable or consistent failure rates and timing. That would make sense if the drives were all the same make/model/manufacturing date, but if not, the model changes and you would need more intelligence to deal with unpredictable failure rates, spinning up cold spares at different rates and predicting failures.

          Which all makes a world of sense to me. When I hovered over RAID 5 arrays with cold spares, especially in NetWare servers where '

    • by mlts ( 1038732 )

      Cooling costs come to mind as well. SSDs are one thing, as they can be powered off and not used. However, HDDs have to be either spinning (which creates a lot of heat, especially at 10k+ RPMs that enterprise disks spin at), or spun up/down, and spinning enterprise disks up and down isn't good for them, and might even cause array faults unless the array firmware is designed to deal with it.

      There is also expense. If I have five hard disks worth of data, I need (5*4)/2, or ten HDDs by the OP's metrics. How

  • by jandrese ( 485 ) <kensama@vt.edu> on Thursday January 29, 2015 @11:37AM (#48931599) Homepage Journal
    So I tried to view the PDF, and it says "can't use the plugin, it causes problems on our server". So I figured I'd just download the file with wget instead. Nope, 403 forbidden.

    Looks like fetch works though. If anybody else has trouble getting the file, try my local mirror [ceyah.org].
    • by Nutria ( 679911 )

      it says "can't use the plugin, it causes problems on our server".

      The name of the browser and plugin would be helpful...

      (The PDF happens to work perfectly on Linux with the built-in viewers of FF35 and Chromium 39.)

      • by jandrese ( 485 )
        This was on Windows with Firefox and the Adobe plugin. I don't have the built-in plugin because I like popping out PDFs and because the built-in viewer is slow as balls on nontrivial PDFs.
    • So I tried to view the PDF, and it says "can't use the plugin, it causes problems on our server".

      Maybe they have problems with their disk array?

      But seriously, I had no problems downloading the document from the original site.

    • No problem viewing the PDF file in Safari on OS X.

  • by Enry ( 630 ) <enry.wayga@net> on Thursday January 29, 2015 @11:39AM (#48931615) Journal

    That's not long term. That's the normal life of a storage array. Long term is like 8-10 years.

    • by jandrese ( 485 )
      They only had availability data for 4 years of drive life. This is largely a math study. I'm not familiar with any implementations of their 2D parity system, although it is outside of my area of expertise. Their assumption that the service calls would always be more expensive seemed a little suspect to me. Rack space isn't free and when you have basically 100% redundancy or more in spare drives you're going to eat up a lot of space. Putting 54 spare drives in a rack that already has 11 parity disks and
      • by Enry ( 630 )

        All in all, this smells like a mathematician's solution to the problem, largely unbounded by real-life concerns.

        I had the same thought. There are a few realities of storage that are missed here: storage use always increases, disks aren't the only things that fail, rack space isn't free, you usually have staff available already....

        This is an interesting idea if your storage is in a place where it can't be reached at all for some reason, but I think NASA and ESA have already done a good bit of research on that.

  • Really, a 4-year life span and then they are replaced?

    God I need to work for a company like that!

      I am so tired of dealing with these RS/6000 systems that were made back in 1994, and these intel systems made back in 2002.

    • by ArcadeMan ( 2766669 ) on Thursday January 29, 2015 @12:02PM (#48931801)

      I am so tired of dealing with these RS/6000 systems that were made back in 1994, and these intel systems made back in 2002.

      Yeah, we get it. You like to deal with cutting-edge stuff. Now get off my lawn.

      Sent from my Commodore 64.

    • 4 years was my recommendation for disk replacements from about 198 onwards. Some arrays had drives >8 years old, but if failure was not tolerated, 4 years was enough.

      Mind you, if the customer specified IDE drives, I warned them that failure was inevitable. SCSI 10K drives, I would still swap but that was for five-nines.

      And those stupid IDE RAID cards, well, that's too cheap. We are no longer talking reliable. Let someone else have that business.

  • by raymorris ( 2726007 ) on Thursday January 29, 2015 @11:47AM (#48931665) Journal

    The bottom line is, having a lot of spare disks makes a 2D array reliable over time. These configurations of 2D arrays are quite reliable over time because they have many spares available to automatically replace failed disks:

    Data  Parity  Spare
      12       3     13
      12       3     14
      24       6     20
      36       9     26

    To understand the above table, we'll use the first row as an example. An array made up of 1TB disks with 12TB of data space would have 3TB of parity and 13 spare 1TB drives, for a total of 28 drives to get 12 drives' worth of net storage.

    What they didn't mention is that the same reliability can be achieved with only three spares by replacing spares at your convenience. Replacing drives can be somewhat costly if it has to be done quickly, but if you can schedule the replacement of a failed drive for "some time in the next two months", it probably won't be costly.
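    For what it's worth, here is a quick re-totalling of the rows in the table above (just a check of the arithmetic, nothing more):

    # Re-totalling the parent's table: (data, parity, spare) drives per row.
    # With 1TB drives, the "data" column is also the net capacity in TB.
    rows = [(12, 3, 13), (12, 3, 14), (24, 6, 20), (36, 9, 26)]
    for data, parity, spare in rows:
        total = data + parity + spare
        print(f"{data:2d} data + {parity:2d} parity + {spare:2d} spare = {total:2d} drives "
              f"({data / total:.0%} of the drives hold net data)")

    The first row gives the 28-drive total used in the example above; only about 43% of those drives hold net data.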

    • by Chas ( 5144 )

      Yes, but then you're dancing around the possibility of additional disk failures while waiting on that replacement.

      If you pop a few more drives (which, if you got your disks in lots, is QUITE possible), you're in deep shit.

      • We do just that: when it gets down to 1 hot spare, it's an emergency service call and we replace all the failed units. This does not happen very often and tends to be just that, a bad batch.

    • by tlhIngan ( 30335 )

      What they didn't mention is that the same reliability can be achieved with only three spares by replacing spares at your convenience. Replacing drives can be somewhat costly if it has to be done quickly, but if you can schedule the replacement of a failed drive for "some time in the next two months", it probably won't be costly.

      The goal is to realize that for manufacturers, service calls are expensive. Perhaps a company has a 4 hour response time - if a disk fails, the company is still running with redundancy, but

      • >. service calls are expensive. Perhaps a company has a 4 hour response time -

        Service calls are expensive BECAUSE it's an emergency. If you have four spares, plus the two parity drives, you're still six drives away from a problem. With a few spares, you can easily replace one by sending it UPS ground, rather than having a tech run out there immediately.

  • I worry a lot less about losing data than I do about corrupting data and not knowing it.

    But hey, congratulations, you've learned about RAID mirrors with lots of copies and learned how to apply basic, well-understood engineering principles to it.

    Guess what, some of us were aware of this years ago, some others aware of it longer than you've probably been alive. It's been known my entire life, that's for sure, so that's at least 40 years.

    • And let's add: to 'avoid maintenance' you just add a bunch of extra spares from the start. That's just stupid; you overbuild ridiculously in order to not have to spend 10 minutes swapping a drive out. Totally cost effective ... if you're sending a probe out into space. In which case, you're going to want better than five 9s, so try again.

    • http://www.dailywritingtips.co... [dailywritingtips.com]

  • by Kokuyo ( 549451 ) on Thursday January 29, 2015 @11:54AM (#48931719) Journal

    "Yeah, well just put more disks in it..."

    Nice idea. Only: TCO is not just based on initial spending and maintenance. There is also rack space to consider, and did I hear anyone talk about green IT?

    If my day to day considerations were that one dimensional, my employer could save a ton of money on my salary.

  • by fnj ( 64210 ) on Thursday January 29, 2015 @11:55AM (#48931725)

    We observe that the same objectives cannot be reached with RAID level 6 organizations

    Well, duh. RAID6 is not a serious level of redundancy. ZFS RAIDZ-3 (triple parity) FTW. And you can build in as many hot spares as you want. Dinosaurs who have still not adopted ZFS need to get a clue.

  • TL;DR version:
    Replacing disks sucks sometimes. Sticking in additional spares means you don't have to replace them. They calculated an efficient RAID solution that means you don't need as many spares.

  • Yeah, and what are you going to do when 9 out of 10 of the disks all go bad because they came from the same factory run and exhibit the same issue? This is what we usually experience: when a disk fails, most of the time it's a subcomponent issue shared by all of the disks from that and any concurrent factory runs - and we have to swap them ALL out. I guess you just throw the whole array out ... :-(
    • If you read the article, that is exactly what they suggest. If failure rates are too far above predicted, they say to replace the whole thing with a new array. At least they are upfront about it.

  • by futuresheep ( 531366 ) on Thursday January 29, 2015 @12:07PM (#48931855) Journal

    Just a few things I thought of while looking at this study:

    The authors are using Backblaze data. Backblaze uses consumer-grade SATA disks, which aren't going to be as reliable as the enterprise SATA/SAS disks we would use.

    I'm willing to bet that none of the authors of this paper have ever had to pay for colocated rack space, power, and cooling either; they've just doubled the RU that I need for storage. At $1500.00 - $2000.00 per rack, that adds up.

    Doubling the rack space for storage I need so I can avoid a few service calls by my storage vendor over 5 years simply isn't efficient.

    We've installed close to 500TB of archival storage using commodity hardware and 2-3TB Nearline SAS. We have maybe 3 hands-and-eyes calls per year for disk replacement.

    Anyway - just rambling.

    • by fnj ( 64210 ) on Thursday January 29, 2015 @12:20PM (#48931939)

      consumer-grade SATA disks, which aren't going to be as reliable as the enterprise SATA/SAS disks we would use

      In your fantasy, there is a difference besides a hideously higher price and a somewhat longer warranty period. In real life, commodity SATA is much more cost effective. Everybody who is serious recognizes this (Google, Backblaze, Amazon).

    • Well, you can probably double your density by moving to non-hot-swap 3.5" drives, making up for double the drives even on space. Now, if I were going to do that, I would mirror the RAID sets anyway, since the power consumption of nearline drives is pretty minimal.

      I've never seen much of a use for enterprise SATA; I do use a lot of SAS with dual ports to separate RAID controllers.

  • To last all of 4 years, and need nearly as many hot spares as data drives. I guess the academics think they know something yet again. They took some dubious failure rates (Backblaze uses whatever is the cheapest consumer drive at the time and eventually stops buying the really bad ones (Seagate 1.5 and 3TB, looking at you)) and a rather optimistic transfer rate (200 MB/s) that assumes all sequential reads. They failed to account for the backplane, controller, and power, assuming that those never fail. By their nu

  • My understanding is that disks often fail when a head touches the surface, or a piece of dirt gets between the head and the surface. Once that happens, more dirt is produced, increasing the probability of more head crashes, leading to a failure cascade. As a consequence, once one of my drives starts to show unrecoverable errors, corresponding to damaged surface areas, I replace it while it can still be read.

    The spare platter strategy does nothing to reduce this failure mode. In fact, all modern disks already have spare space for bad block relocation.

    • The spare platter strategy does nothing to reduce this failure mode. In fact, all modern disks already have spare space for bad block relocation.

      Including pretty much everything with an onboard controller. "Modern" is understating the case.

      If I were expecting an array to last a long time without being touched, I would expect it to have a whole bunch of spares that never even got heated up until they were needed, just sat there in the box enjoying living in a relatively temperature-constant environment. Sure, there's fluctuations, but they'll all be within the operating temperature range of the drives.

    • This from an NEC white paper in 2008:

      "A recent academic study [1] of 1.5 million HDDs in the NetApp database over a 32 month period found that 8.5% of SATA disks develop silent corruption. Some disk arrays run a background process to verify that the data and RAID parity match, a process which can catch these kinds of errors. However, the study also found that 13% of the errors are missed by the background verification process. When you put those statistics together, you find on average that 1 in 90 SATA dri

  • Trust (Score:5, Interesting)

    by HideyoshiJP ( 1392619 ) on Thursday January 29, 2015 @12:31PM (#48932037)
    I don't trust anybody who has published a document with the title "C:\Users\Jehan-Francois Paris\Documents\ADAPT15\Case3.doc." Not even in .docx format. Tsk tsk.
  • by roc97007 ( 608802 ) on Thursday January 29, 2015 @12:42PM (#48932105) Journal

    A service call? Seriously? A sysadmin (or operator, if it's a big place) can't see the yellow light on a disk and replace the pack with in-house spares? Have we become so inept as an IT community that we can no longer do a walk-through of our machine room and service simple things like this? Maybe we do deserve to be outsourced.

    And if one must have a service contract such that only the vendor can touch the hardware, (why would you do that? never mind) wouldn't you negotiate a provision that includes drive replacement (as drives are consumables that must eventually be replaced) without being charged for an "office visit"?

    • Yes we have, if the array is installed in your backup corporate PKI server, in a shielded and locked cage with video, electrostatic, and laser monitoring and alarms. And the keys to the cage are in another state. And it requires EVP approval to deliver the keys to the authorized tech for a flight to the DR site to change a failed drive.

      A real world example. You would recognize the name of this corporation in the first three letters. They take their corporate security very seriously, so much so that bum

  • zpool create -o ashift=12 -o autoreplace=on tank raidz2 sdc sdd sde sdf sdg sdh spare sdi sdj

    Alright, fine, ashift=12 is newer than 2009, for 2TB+ drives. And always use /dev/disk/by-id for your sanity.

  • Our conclusion is that having N(N + 1)/2 spare disks is more than enough to achieve a 99.999 percent probability of not losing data over four years.

    Instead of keeping the spares inside as just that — spares — can it not start using all of them (in a sufficiently redundant configuration) and gradually lose capacity as physical disks fail?

    Yes, it would require coordination with the driver and filesystem, but there is nothing insurmountable in that...

  • "We observe that the same objectives cannot be reached with RAID level 6 organizations and would require RAID stripes that could tolerate triple disk failures."

    That's true only if you assume that three disk failures occur faster than a single disk can be rebuilt.

    If you assume no more than two disk failures *during the length of time it takes to rebuild the array* then RAID 5 or RAID 6 works fine as long as you assign enough hot spares.
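    As a back-of-the-envelope illustration of that rebuild-window argument (not from the paper): assuming independent, exponentially distributed failures, an assumed MTTF, and an assumed rebuild time, the chance of further failures landing inside one rebuild window looks like this. The MTTF and rebuild-time numbers are placeholders, not figures from the study.

    # Illustration only: probability of k or more additional failures during one
    # rebuild window, assuming independent, exponentially distributed failures.
    # The MTTF and rebuild-time values below are placeholders, not figures from
    # the paper.
    import math

    def p_at_least_k(n_drives: int, k: int, mttf_h: float, window_h: float) -> float:
        p_one = 1.0 - math.exp(-window_h / mttf_h)   # P(a given drive fails in the window)
        return sum(math.comb(n_drives, i) * p_one**i * (1.0 - p_one)**(n_drives - i)
                   for i in range(k, n_drives + 1))

    # A 12-drive RAID 6 set that has already lost one drive loses data only if two
    # more of the 11 survivors fail before the rebuild finishes.
    print(p_at_least_k(11, 2, mttf_h=1_000_000.0, window_h=24.0))

    With those (deliberately generous) placeholder numbers the per-window risk is tiny; the cases that actually hurt are correlated failures and multi-day rebuilds on large drives, which a toy model like this ignores.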

  • The number of drives seems large. The spare count grows quadratically (N(N + 1)/2), so as the cluster gets bigger the number of spare disks gets much bigger.

    Drives  Spares  Total
         5      15     20
        10      55     65
        30     465    495

    That's a lot of disks. There is a point where space and power overcome the human cost.
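    A quick check of the rows above (spares = n(n + 1)/2, total = n + spares):

    # Reproduces the table above: spares = n(n + 1)/2, so they grow quadratically.
    for n in (5, 10, 30):
        spares = n * (n + 1) // 2
        print(n, spares, n + spares)   # -> 5 15 20 / 10 55 65 / 30 465 495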
