Hard Drive Reliability Study Flawed?

Posted by samzenpus
from the on-the-other-hand dept.
storagedude writes "A recent study of hard drive reliability by Backblaze was deeply flawed, according to Henry Newman, a longtime HPC storage consultant. Writing in Enterprise Storage Forum, Newman notes that the tested Seagate drives that had a high failure rate were either very old or had known issues. The study also failed to address manufacturer's specifications, drive burn-in and data reliability, among other issues. 'The oldest drive in the list is the Seagate Barracuda 1.5 TB drive from 2006. A drive that is almost 8 years old! Since it is well known in study after study that disk drives last about 5 years and no other drive is that old, I find it pretty disingenuous to leave out that information. Add to this that the Seagate 1.5 TB has a well-known problem that Seagate publicly admitted to, it is no surprise that these old drives are failing.'"

  • by anagama (611277) <obamaisaneocon@nothingchanged.org> on Wednesday January 29, 2014 @09:10PM (#46106065) Homepage

    Is he saying that 1.5TB drives are all 5 years old? If you look at the table in TFA, it talks about "release date" -- which may well be some time ago, but I'm sure 1.5TB drives may be had new, even if the design hasn't changed in a while.

  • Re:Meh. fud spam. (Score:5, Insightful)

    by icebike (68054) on Wednesday January 29, 2014 @09:16PM (#46106133)

    Someone's working overtime to make Seagate look good.
    But the pile of dead Seagates at work says otherwise.

    Yeah, this guy is essentially saying the pre-known facts validate this research finding so therefore the research was deeply flawed.

    It really doesn't matter what the accumulated knowledge over the intervening years says. The fact remains that for this user, Backblaze, the results were the results, and they happened to match what the industry already knew.

      Their results: Hitachi has the lowest overall failure rate (3.1% over three years). Western Digital has a slightly higher rate (5.2%), but the drives that fail tend to do so very early. Seagate drives fail much more often: 26.5% are dead by the three-year mark.

    If anything, this guy just validated Backblaze's study.
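
    To compare those three-year figures with per-year numbers, a cumulative rate can be converted to an approximate annual one. This is a minimal sketch, assuming the same failure probability every year. That is a simplification, since the quote itself says Western Digital failures cluster early. The function name is made up for illustration.

```python
def annualized_failure_rate(cumulative_rate, years):
    """Convert a cumulative failure fraction over `years` into a per-year rate.

    Assumption (not from the study): every year has the same failure probability.
    """
    survival = 1.0 - cumulative_rate
    return 1.0 - survival ** (1.0 / years)


# Three-year cumulative figures as quoted in the comment above.
three_year = {"Hitachi": 0.031, "Western Digital": 0.052, "Seagate": 0.265}

for brand, rate in three_year.items():
    print(f"{brand}: {annualized_failure_rate(rate, 3):.2%} per year")
```

    Under that assumption the Seagate figure works out to a bit under 10% per year, against roughly 1% for Hitachi.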

  • by plebeian (910665) on Wednesday January 29, 2014 @09:26PM (#46106225)
    My understanding, based upon reading the originally posted materials, was that they published their reliability findings based upon their own experience. I did not see anywhere that they claimed it was comprehensive research into the reliability of hard drives. We should not crap upon Backblaze because people could not be bothered to read the articles and made some faulty assumptions based upon the headlines; to do so would just serve to dissuade others from releasing their experiences. As for the argument about some of the hardware having known faults... If a company does not want bad press, it should do more quality control before releasing crappy hardware...
  • by viperidaenz (2515578) on Wednesday January 29, 2014 @09:37PM (#46106317)

    Install that drive in a server in an online backup company and see how long it lasts.

  • Re:Meh. fud spam. (Score:4, Insightful)

    by harlequinn (909271) on Wednesday January 29, 2014 @09:42PM (#46106339)

    Yes. They are getting cheaper and faster. They are already much faster than magnetic rotating discs in read/write/IOPS.

    Don't be facetious, you can't get a 1TB SSD for $100 yet and you know this. The OP clearly wrote "getting cheaper"; he didn't say they have parity on price.

    The reliability of current-generation SSDs is now higher than that of traditional HDDs. So in regard to "run 24hours/24hours for 5 years without any problems?", take your pick, they can all do it better than a traditional HDD.

    I think traditional HDDs have precious few years left.

  • by Anonymous Coward on Wednesday January 29, 2014 @10:02PM (#46106451)

    To continue a bit on his ridiculous rant of "what you should be doing if you release any data on your real-world experiences".

    1. The age of the drives as it affects the failure rate of the drive.

    Fair enough. Backblaze did this, in the average age metric. Is average the most complete one available? Of course not, but it certainly gives you a starting point.

    2. Whether the drives are burned in or not burned in, as it impacts the infant mortality.

    Backblaze has stated they perform drive burn-in testing before putting drives into production. A tiny amount of reading the other blog posts will show you this. Any company using drives in such a manner will do so.

    3. How much data will be written to and read from each drive over time and if over time the drives in question will hit the limits on the hard error rates.

    Duh? All drives in Backblaze's pool are generally subjected to "similar" write patterns, I'd imagine. Does this author *really* think Backblaze has it out for Seagate and is writing only to those drives to make them fail earlier? What I care about is how long the drives last for my workload. If I know about when to expect a failure, all the better. Specifications are rarely more than a super conservative CYA from the vendor though, and most drives outlive their rating by many multiples.

    4. The load and unload cycles and if any of the failures exceed manufacturer specification.

    What? How are load/unload cycles remotely relevant to an online backup service? Has this guy ever run anything at all at even close to this scale? You never let drives spin down, both for this cycle rating reason and because software RAID in many ways does *not* play nicely with spun-down disks. No one operating with tens of thousands of drives is going to forget this small detail, as they will have inordinate drive failures across the board if they are cycling them constantly.

    5. Average age does not tell you anything, and statistics such as standard deviation should be provided.

    It tells me quite a bit. Is it as detailed as a scientific study should be? Of course not. This is not a scientific study, it's simply publishing real-world data the company in question has experienced. If we were talking about differences of a few percentage points, standard deviation would matter a lot. We're not. We're talking 3% to 25%. I don't need things broken down into standard deviation to know there is a big problem. If their intention was to mislead readers, then you might have a point. But I doubt they have it out for Seagate.

    6. Information on SMART data monitoring and if any of the drives had exceeded any of the SMART statistics before they are more likely to fail.

    Who cares? A failure is a failure. If I replace a drive due to an early SMART warning, I'm still replacing the damned drive. It failed. How it failed or the manner it failed in is absolutely irrelevant to me.

    7. Information on vibration, heat or other environmental factors as it impacts groups of drives. Will a set of new drives from a vendor get put into an area of the data center that is hotter or have racks with more vibration?

    Has this guy ever worked in a datacenter? Or seen what Backblaze even does? There is enough scale here to make these factors inconsequential. We're talking dozens of racks, with many servers. Drives get put into identical chassis, and into identical racks. Will some racks have slightly higher inlet temps? Sure. But unless Backblaze is co-located in some ghetto colo somewhere this is an absolute non-issue. Drives are not nearly as temperature sensitive as the idiots on review sites would lead you to believe. Google published a report on this a long while back if you need scientific evidence of that fact.

    This would matter a lot if they were putting drives into different types of systems.

  • Re:Meh. fud spam. (Score:5, Insightful)

    by MatthiasF (1853064) on Wednesday January 29, 2014 @10:46PM (#46106711)
    He seems to be trying his best to find flaws in the study, but his own logic is pretty poor. For instance:

    "I’ve noted that we just found that the Seagate 1.5 TB drives are about 8 years old since release, for the failure rate, but the average age of the Seagate drives in use are 1.4 years old. Averages are pretty useless statistic, and if Seagate drives are so bad then why buy so many new drives?"

    If the company began rolling out Seagates for 3 years at 5k a year and stopped after three years because of the high failure rate, moving on to Hitachi and such, then the average age even over 8 years could very well be only 1.4 years. Because, let's face it, when it's your ass on the line and you see a particular type of drive putting your servers into a precarious state, you might start migrating away as fast as you can.

    Those Seagate drives still running are probably in either very low-I/O servers or very low-risk servers (clustered or such), but in such small quantities that their continued lifespans are not increasing the overall average much. The remainder could be shelved to avoid the risk of failing in a critical system, and while they are listed in the total number of drives purchased, their age might not be included in the average presented.
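
    The arithmetic behind this: the average age of drives in service is weighted by how many drives of each age are still running, so a model's release date puts no floor under it. This is a minimal sketch with invented cohort counts. Nothing here comes from Backblaze's data.

```python
def average_age_in_service(cohorts):
    """Count-weighted mean age, in years, over (age_years, drives_in_service) pairs."""
    total_drives = sum(count for _, count in cohorts)
    total_drive_years = sum(age * count for age, count in cohorts)
    return total_drive_years / total_drives


# Hypothetical cohorts of one drive model; the counts are invented for illustration.
cohorts = [
    (7.0, 100),   # a few survivors bought soon after the model's release
    (3.0, 400),
    (1.0, 4500),  # most units still in service are recent
]

print(f"average age in service: {average_age_in_service(cohorts):.2f} years")
```

    A handful of old survivors barely moves the mean when the bulk of the in-service units are young.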
  • Re:In all fairness (Score:5, Insightful)

    by greg1104 (461138) <gsmith@gregsmith.com> on Wednesday January 29, 2014 @11:54PM (#46107087) Homepage

    One of the patterns I've noticed with Seagate is that drive failures seem to spike when manufacturing moves. The reliable Barracuda IV models made in Singapore were replaced by shoddy ones made in newer facilities in Thailand. Then around 2009-2010 they shifted a lot more manufacturing into China, and from that period the Thailand drives were now the more reliable ones from the mature facility. A lot of the troubled 1.5TB 7200.11 models came out of that, and perhaps some of your 500GB enterprise drives too.

    If you think about this in terms of individual plants being less reliable when new, that would explain why manufacturers go through cycles of good and bad. I think buying based on what's been good the previous few years is troublesome for a lot of reasons. From the perspective of the manufacturer, if a plant is above target in terms of reliability, it would be tempting to cut costs there. Similarly, it's the shoddy plants that are likely to be improved, because returns from there are costing too much. There are a few forces here that could revert reliability toward the mean, and if that happens, buying from the company that's been the best recently will give you the worst results.

    At this point I try to judge each drive model individually, rather than to assume any sort of brand reliability.

  • by girlintraining (1395911) on Thursday January 30, 2014 @01:51AM (#46107465)

    Is he saying that 1.5TB drives are all 5 years old? If you look at the table in TFA, it talks about "release date" -- which may well be some time ago, but I'm sure 1.5TB drives may be had new, even if the design hasn't changed in a while.

    I think the takeaway here is this man is neither terribly detail-oriented nor well-suited for his line of work. Things like date of manufacture, make and model, I/O amount, number of power cycles, environment, etc., are all obvious things to record to an experienced IT person. He appears to have done very little of that. He is a bean counter pretending to be an engineer.
