Backblaze Dishes On Drive Reliability In their 50k+ Disk Data Center

Online backup provider Backblaze runs hard drives from several manufacturers in its data center (56,224 of them, they say, by the end of 2015), and as you'd expect, the company keeps a close eye on how well they hold up. Yesterday they published a stats-heavy look at the performance, and especially the reliability, of all those drives, which makes fun reading even if you're only running a drive or ten at home. One upshot: they buy a lot of Seagate drives. Why? "A relevant observation from our Operations team on the Seagate drives is that they generally signal their impending failure via their SMART stats. Since we monitor several SMART stats, we are often warned of trouble before a pending failure and can take appropriate action. Drive failures from the other manufacturers appear to be less predictable via SMART stats."
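As a sketch of the kind of SMART-based early warning the Operations team describes: the attribute set below (5, 187, 188, 197, 198) is an assumption drawn from Backblaze's other published posts, not from this article, and the function is purely illustrative.

```python
# Hypothetical early-warning check over SMART attributes. The watched set
# is an assumption based on Backblaze's published material, not this article.
WATCHED_ATTRS = {
    5:   "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    188: "Command_Timeout",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

def smart_warnings(raw_values):
    """Return the watched attributes whose raw value is non-zero.

    raw_values: dict mapping SMART attribute ID -> raw value, e.g. as
    parsed from `smartctl -A` output.
    """
    return {WATCHED_ATTRS[attr]: value
            for attr, value in raw_values.items()
            if attr in WATCHED_ATTRS and value > 0}

# A drive reporting pending sectors would be flagged:
print(smart_warnings({5: 0, 187: 0, 197: 8, 194: 31}))
# -> {'Current_Pending_Sector': 8}
```

A non-zero raw value on any of these counters doesn't guarantee imminent failure, but per the quote above it is often warning enough to migrate data off the drive first.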
  • by damn_registrars ( 1103043 ) <damn.registrars@gmail.com> on Wednesday February 17, 2016 @02:09PM (#51528717) Homepage Journal
    Considering how awful their failure rates are in general, they need to get good at reporting them beforehand or they (as a company) won't exist much longer. After all, investing in quality is clearly too expensive...
  • Around here, Seagate 6TB disks cost about 50% more than WD Red NAS disks, and Hitachi disks are more expensive still. So all these graphs are basically in line with the old adage: you get what you pay for.

    The comment about Seagate's SMART being more on point seems to make those disks a nice compromise.

    Funny enough, there is this saying in Switzerland: "Sie geit oder sie geit ned." (where "Sie geit" sounds awfully close to "Seagate"), which roughly translates to "It works or it doesn't" and is a stab at the sometimes abysmal failure rates they had back when.

    • Funny enough, considering there is this saying in Switzerland: "Sie geit oder sie geit ned." (where "Sie geit" sounds awfully close to "Seagate") which roughly translates to "It works or it doesn't" and is a stab at the sometimes abysmal failure rates they had back when.

      Here in the USA, especially around the Monterey Bay Area where Seagate was (and still is) located, we just called them "Seizegate" for the tendency of their drives to fail due to stiction.

    • We have a rack-mountable QNAP NAS device that our field support people back up files to when they are rebuilding a workstation. We used 3TB Seagates from the compatibility list in it, and I had constant problems; we've replaced them with WD Reds, and the problems have gone away. In retrospect, seeing that Seagate drives report SMART events earlier, it makes sense that I had all the problems: the QNAP firmware drops, and refuses to reattach, any disk in an mdadm array that has SMART errors. Granted, if
  • Sorry WD fans (Score:5, Interesting)

    by Solandri ( 704621 ) on Wednesday February 17, 2016 @02:21PM (#51528831)
    Can't help but feel for all the people who read Backblaze's previous report, decided Seagate was junk, and bought WD instead. I tried to warn them that the model of the drive mattered more than the manufacturer, because each manufacturer tries new technologies and new cost-cutting strategies with each different model. Sometimes it works and the model is reliable. Sometimes it doesn't and the model is unreliable. But everyone was eager to jump on the bash-Seagate, praise-WD bandwagon and ignored me.

    Well, WD was least reliable this time around. The Seagate stats in the previous report were probably being skewed by just one or two bad models. Seagate's score is skewed this time by one bad model too, but with the passage of time that model makes up a tiny portion of their Seagate sample, so it doesn't spike the score like before. (You can pretty much ignore WD in the 4TB graph, as a sample size of just 46 drives means the confidence interval is a 0.3%-8.8% failure rate.)

    At least Backblaze addressed my criticism from before - they've broken down the stats to individual drive models. And you can see that, like I said, there's huge variability in reliability between models within a manufacturer's lineup. Now they just need to add confidence intervals to the graphs.
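For readers wondering why a 46-drive sample gives such a wide range, here is a hedged sketch using the Wilson score interval; the method behind the 0.3%-8.8% figure above isn't stated in the report, and the failure count used here is a made-up illustration.

```python
from math import sqrt

def wilson_interval(failures, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial failure rate."""
    p = failures / n
    denom = 1 + z * z / n
    center = p + z * z / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (center - margin) / denom, (center + margin) / denom

# With only 46 drives and, say, one observed failure, the interval is very
# wide -- a single failure either way swings the estimated rate a lot.
lo, hi = wilson_interval(1, 46)
print(f"{lo:.1%} - {hi:.1%}")
```

With tens of thousands of drives of a popular model, the same calculation collapses to a narrow band, which is why the fleet-wide numbers are meaningful while the 46-drive WD sample isn't.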
    • I wish I had seen Backblaze's previous report. I have a whole lot of Seagate paperweights [dadatho.me]. I couldn't do anything but laugh when one of their SNs ended in FML [dadatho.me]

      In comparison, all of the WD Reds that I bought to replace those (and their warrantied replacements) are still going strong. I did everything 'right': spread out my purchases, bought from Newegg and Amazon, kept them cool, etc. Of the 12 or so 2 and 3TB Seagate drives, my current FreeNAS machine has all of 1 or 2 still running. And one of

    • I don't know if I'd say it was 1 or 2 bad models that plagued Seagate. When I buy drives, I go by the ratings on Amazon and Newegg, and regardless of the drive model there always seem to be a lot more reviews of Seagate drives failing than of other brands.

      • by Gondola ( 189182 )

        The problem with this tactic is that manufacturers will change their manufacturing methodology over time. An extremely well-reviewed model can be replaced later in its product life by a worse version that retains the same exact model number. If you go to NewEgg and Amazon and look at hard drive reviews for the best drives, then look at only the more recent reviews, you may see a big drop in the average rating for some models. Bait and switch. So, be careful!

    • by epine ( 68316 )

      Can't help but feel for all the people who read Backblaze's previous report and decided Seagate was junk and bought WD instead.

      Why feel for them? By your own inefficient market hypothesis, every course of action is a crap shoot. The report was great for me, because we actually had one or two of those highly suspect drives in service.

      But in the larger scheme, you're absolutely right. Every vendor has manufactured a few duds. IBM, Hitachi, Seagate, Western Digital. Every company has made some poor models

    • The 3TB Seagate (ST3000DM001) wasn't in the main table because it had a 28%/year failure rate and they've all been retired. It's not that they bought a small number of them; they ripped them out, and I've been doing the same. The 4TB Seagates have been about average in reliability.

  • by FirstOne ( 193462 ) on Wednesday February 17, 2016 @02:36PM (#51528969) Homepage

    ""When will your hard drive fail" [slashdot.org]

    I pointed out that Blackblaze chassis configuration improperly stressed the fragile SATA/Power connectors by implementing a vertical disk drive mounting configuration, [slashdot.org].
    Where the mass of drive(&vibration) is placed upon the fragile SATA data and power connectors.

    This type of vertical drive storage/raid cabinet is not conducive for long term/reliable drive lifespan., thus any number of other factors could kick in and cause a premature failure.

    • Re: (Score:3, Insightful)

      by Anonymous Coward

      Considering they are hitting 5-6 years on a decent population of their drives, I think they are doing OK.

  • I'm impressed by the HGST drives, less than 1% failure rate. I haven't touched the Deskstar line of drives since the IBM Deathstar debacle, but I think it's time to take a second look. Hopefully they have not switched over to Western Digital's technology.

    • by tlhIngan ( 30335 )

      I'm impressed by the HGST drives, less than 1% failure rate. I haven't touched the Deskstar line of drives since the IBM Deathstar debacle, but I think it's time to take a second look. Hopefully they have not switched over to Western Digital's technology.

      Well, HGST drives are still more expensive than Seagate or WD drives of similar capacity.

      Remember a hard drive is a very high precision mechanical device that has traditional economic pressures applied to them - everyone wants more for less dollars. So the

  • Bad sectors? (Score:5, Interesting)

    by nbritton ( 823086 ) on Wednesday February 17, 2016 @03:00PM (#51529253)

    What is Backblaze doing to check the drives for bad sectors? I manage a 10,000-disk OpenStack Swift installation, and I've noticed that automatic sector remapping doesn't always work correctly: a portion of drives (maybe 3%) have a few bad sectors that need to be manually remapped using ddrescue. I ended up having to write a custom monthly cron job script that runs badblocks to first identify these drives, and then ddrescue to force a sector remap.
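A minimal sketch of the badblocks-then-remap workflow the parent describes, assuming `badblocks` output of one bad block number per line. The device path is a placeholder, and the generated rewrite commands are dd-style stand-ins for the commenter's ddrescue step, shown only to illustrate the mechanism.

```python
# Sketch: parse the bad-LBA list that `badblocks` prints, then emit one
# rewrite command per block. Overwriting an unreadable sector is what
# triggers the drive's auto-reallocation onto a spare sector.

def remap_commands(badblocks_output, device="/dev/sdX", block_size=4096):
    """Turn `badblocks -b 4096` output into per-block rewrite commands.

    device and block_size are hypothetical placeholders; a real script
    would derive them from the drive being scanned.
    """
    bad = [int(line) for line in badblocks_output.split() if line.isdigit()]
    return [
        f"dd if=/dev/zero of={device} bs={block_size} count=1 "
        f"seek={lba} conv=notrunc"
        for lba in bad
    ]

for cmd in remap_commands("1975216\n1975217\n"):
    print(cmd)
```

Note that this destroys the data in the affected blocks, so in practice you would only force a rewrite after the filesystem or object store has re-replicated the data elsewhere.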

    • It may be different with 10,000 disks vs 4 disks, but I wouldn't trust a drive once it has one remapped (or pending remap) sector. I'd be worrying about replacing it, not remapping, because it tends to be a sign of impending failure.

      • It may be different with 10,000 disks vs 4 disks, but I wouldn't trust a drive once it has one remapped (or pending remap) sector. I'd be worrying about replacing it, not remapping, because it tends to be a sign of impending failure.

        Of the drives with sector errors (n = 286), the number of bad sectors typically ranged from 4 to 16, with a median of 8. However, values above 25 bad sectors were statistical outliers, more than 3 standard deviations from the mean. Our policy now is to replace any drive with more than 25 bad sectors.
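The outlier rule above can be sketched as follows; the fleet numbers below are made up, and only the 3-standard-deviation idea comes from the parent comment.

```python
from statistics import mean, stdev

def sector_outliers(bad_sector_counts, sigmas=3):
    """Flag drives whose bad-sector count is more than `sigmas` standard
    deviations above the fleet mean, per the parent's replacement policy."""
    mu = mean(bad_sector_counts)
    sd = stdev(bad_sector_counts)
    cutoff = mu + sigmas * sd
    return cutoff, [c for c in bad_sector_counts if c > cutoff]

# Hypothetical fleet: most drives show 4-16 bad sectors, one shows far more.
counts = [4, 6, 8, 8, 8, 10, 12, 16] * 4 + [90]
cutoff, flagged = sector_outliers(counts)
print(round(cutoff), flagged)
```

Deriving a fixed cutoff like "25 bad sectors" from the distribution, as the parent did, then makes the replacement decision a simple threshold check instead of recomputing statistics per drive.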

  • by FlyHelicopters ( 1540845 ) on Wednesday February 17, 2016 @03:19PM (#51529429)

    All things fail, including hard drives. The question isn't "if", it is "when".

    Picking between WD and Seagate hoping to get a "good drive" is missing the point: what happens when both drives fail?

    Do you have your data backed up?

    I run both Crashplan and Backblaze, I also have a copy stored on Amazon Glacier and important files on OneDrive. I also have two external drives that I rotate backups on and keep unplugged.

    For most people, what I do is "overkill", but I've lost data before... never again...

    • I think the point here isn't that there's a drive or manufacturer out there that doesn't fail. The point is that with such a huge sample, you can draw somewhat useful trends and comparisons between failure rates on a macro scale that no ordinary user would be able to produce themselves. If you look at 56,000 disks and see that Seagate accounts for a larger percentage of drives and a lower equivalent failure rate among manufacturers, you can *generally* expect that buying a drive of an equivalent model a
      • While those are fair points, and good advice... I still have a concern...

        I don't think there is a large enough disclaimer that Backblaze runs their equipment in a 24/7 environment that is quite different than most users. Oh sure, they say it and it is there, but I think it deserves highlighting.

        If you look at the percentage failure rates, they are higher across the board than what I've seen. Sure, drives fail, but honestly I have some of those same Seagate drives in a server here and they have been running

        • The data might be from more rigorous conditions, but that doesn't make it useless. If a drive model exhibits a low failure rate even under supposedly awful conditions, then that reflects even better on the drive. If anything, I'd be more concerned about ways in which their environment is better than a typical consumer environment, such as how a forced-airflow server in a temperature-controlled datacenter is probably going to keep the drives at a better (or at least more consistent) temperature than some ran
        • by drsmithy ( 35869 )

          I suspect Backblaze is quite hard on drives and the rates are worse than you'd see outside of that environment. It is also worth noting that those drives are not all installed in the same type of "pod". Backblaze has changed pod designs a few times and now uses an "anti-vibration" system they didn't used to.

          Your typical home desktop/server drive is likely to see a far harsher life than your average Backblaze drive.

  • We actually didn’t retire these 1TB WD drives – they just changed jobs. We now use many of them to “burn-in” Storage Pods once they are done being assembled. The 1TB size means the process runs quickly, but is still thorough. The burn-in process pounds the drives with reads and writes to exercise all the components of the system. In many ways this is much more taxing on the drives than life in an operational Storage Pod. Once the “burn-in” process is complete, the WD 1TB

  • by Fencepost ( 107992 ) on Wednesday February 17, 2016 @04:27PM (#51529955) Journal
    One of the significant notes is that it seems the Seagate 4TB drives are doing much better than some earlier versions, and that WD is no longer doing so well.

    Another thing that gets brought up every time one of these is released is "Why are they still using Seagate drives if they're so bad?" and the answer is simple: it remains a balancing act between cost and reliability. Backblaze has the redundancy and processes in place to not worry about single-drive failures, so FOR THEIR USAGE the lower drive cost is more important. If you're on a smaller setup where you have everything on just a few drives with inadequate redundancy, a few dollars extra for better reliability is worth the cost.

    When you really get down to it Backblaze is looking at cost per gigabyte per day, and if ($LESS_RELIABLE_DRIVE_COST + $DRIVE_REPLACEMENT_COST) is lower than ($MORE_RELIABLE_DRIVE_COST) then they're going with the cheaper option.
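That break-even comparison can be sketched with made-up numbers; all prices, labor costs, and failure rates below are hypothetical illustrations, not Backblaze's figures.

```python
def effective_cost(drive_price, replace_labor, annual_failure_rate,
                   capacity_tb, years=5):
    """Expected cost per TB over a service life, folding in the expected
    number of replacements. All inputs here are hypothetical."""
    expected_replacements = annual_failure_rate * years
    total = drive_price + expected_replacements * (drive_price + replace_labor)
    return total / capacity_tb

# Made-up comparison: a cheap drive failing at 5%/yr vs a pricier one at 1%/yr.
cheap = effective_cost(110, 25, 0.05, 4)
reliable = effective_cost(160, 25, 0.01, 4)
print(round(cheap, 2), round(reliable, 2))
```

With these particular made-up inputs the cheaper drive still wins despite the higher failure rate, which is the comment's point: at Backblaze's scale, replacement is routine and cheap, so raw purchase price dominates.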
    • by AmiMoJo ( 196126 )

      For home use it's worth paying a little more for a Hitachi (HGST) drive. They are owned by WD, but use different tech, different factories, etc. You pay more but get better reliability.

  • A: They're cheap
    B: They scream really loud before they die, hopefully when someone's listening.
    C: They're cheap.

    I'll stick with Western Digital and HGST.

    If they die off that infrequently in their sweatbox environments, the chances that they're going to die under normal desktop use are orders of magnitude less.

  • by dbIII ( 701233 ) on Wednesday February 17, 2016 @09:25PM (#51531833)
    Consider the conditions - this is selecting for the environment of a lot of drives packed into poorly ventilated cases so those that cope best with heat will win.
    While heat over time is a common cause of drive failure there are others, so the results are not so useful for drives in desktop cases or in well ventilated servers (eg. ones with hot-swap bays so there is no way to pack the drives in as densely as Backblaze do).
  • SMART monitoring is where modern OSes utterly fail; it should be a core part of OS functionality. The OS should warn you when a SMART stat goes bad, but MS et al. would rather put some stupid shopping experience into the OS instead.
