Data Storage IT

HDDs Typically Failed in Under 3 Years in Backblaze Study of 17,155 Failed Drives (arstechnica.com) 102

An anonymous reader shares a report: We recently covered a study by Secure Data Recovery, an HDD, SSD, and RAID data recovery company, of 2,007 defective hard disk drives it received. It found the average time before failure among those drives to be 2 years and 10 months. That seemed like a short life span, but considering the limited sample size and analysis in Secure Data Recovery's report, there was room for skepticism. Today, Backblaze, a backup and cloud storage company with a reputation for detailed HDD and SSD failure analysis, followed up Secure Data Recovery's report with its own research using a much larger data set. Among the 17,155 failed HDDs Backblaze examined, the average age at which the drives failed was 2 years and 6 months.

Backblaze arrived at this age by examining all of its failed drives and their respective power-on hours. The company recorded each drive's failure date, model, serial number, capacity, failure, and SMART raw value. The 17,155 drives examined span 72 different models and do not include failed boot drives, drives that had no SMART raw attribute data, or drives with out-of-bounds data. If Backblaze only looked at drives that it no longer uses in its data centers, there would be 3,379 drives across 35 models, and the average age at failure would be a bit longer at 2 years and 7 months. Backblaze said its results thus far "are consistent" with Secure Data Recovery's March findings. This is despite Backblaze currently using HDDs that are older than 2 years and 7 months.

This discussion has been archived. No new comments can be posted.


  • I get a consistent 10 years in server applications. Regardless of brand.
    • came here to say that.
      have multiple drives that are currently 7 years old.... no issues..constant power on

      • by ArchieBunker ( 132337 ) on Thursday May 04, 2023 @05:20PM (#63497852)

        Cool, now buy 17,000 and let me know the stats.

        • Cool, now buy 17,000 and let me know the stats.

          Can you not make a mistake for one minute? Good! Now keep doing that over and over correctly and you can’t make a mistake for eternity.

        • While my experience is the same as that of the previous posters (and a sample set of 200+ drives over the years), it is still nothing compared to the number of drives BackBlaze uses.

          But I do have an inkling of how those drives are used at BackBlaze. Even though drives in my care are subjected to a lot of reads/writes 24 hours a day, my drives are in PC cases. Drives at BackBlaze are subjected to a more demanding workload and most, if not all, are mounted in rack cases where vibrati

        • BB's reports are always interesting, but always must be read with context.

          SKUs aren't brands. Client SKUs aren't enterprise SKUs.

          Their workload isn't my workload or yours, nor are their QoS criteria or definition of failure.

          Their chassis are distinctly not your chassis, and the drives in scope are a variety of ages mounted in a variety of ways.

          HDDs are the wrong choice all day every day in the first place.

      • came here to say that.
        have multiple drives that are currently 7 years old.... no issues..constant power on

        So you also don't understand what you're reading? The average age of failed drives is 2.7 years. You are saying you have drives that are 7 years old and are running. So you're saying you have a sample size of 0 relevant to the discussion because you have no failed drives.

        Average age of failed drives != Average age before drive failure.
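
        To make the distinction concrete, here is a minimal simulation sketch (hypothetical numbers, not Backblaze's fleet): even when the true mean lifetime is 10 years, the drives that happen to fail before a 6-year retirement cutoff still show an average "age at failure" of roughly 2.7 years.

          import random

          # Hypothetical parameters for illustration only (not Backblaze's actual data).
          random.seed(1)
          MEAN_LIFETIME_YEARS = 10.0    # assumed true mean time to failure
          RETIREMENT_AGE_YEARS = 6.0    # healthy drives get pulled from service here

          lifetimes = [random.expovariate(1.0 / MEAN_LIFETIME_YEARS) for _ in range(100_000)]
          failed = [t for t in lifetimes if t < RETIREMENT_AGE_YEARS]

          print(f"true mean lifetime: {sum(lifetimes) / len(lifetimes):.2f} years")
          print(f"failed before retirement: {len(failed) / len(lifetimes):.0%}")
          print(f"average age of failed drives: {sum(failed) / len(failed):.2f} years")

        The last number is the metric the Backblaze post reports; the first is what most of this thread is arguing about.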

        • Average age of the failed drives whose data was worth recovering (and paying a good price to do so), or average age of failed drives in the Backblaze system?

          In the first case (had data needing recovery) I'd say the sample would be "wrong"-ish due to other companies, etc. in the field, but then again that "more than 10 but less than 10% of the population" rule of thumb may be coming into play (it's been a long time since I took stats)

          What I'd be interested in hearing is the production numbers for the drives and a breakdown by make/model

    • Probably virtualization.
      • Probably virtualization.
        Backblaze does backups on object storage. Virtualization would be superfluous for their use case, but I don't know what their backend does. I do know other backup companies that compete with them, and they are all straightforward non-atomic object stores.

        I SRE on an object store that is about three times that size in $MANY data centers. I find that ambient temperature deltas are a useful but not large indicator of failure over time. The smaller the change, the longer the life. The temp

        • by Bongo ( 13261 )

          At first glance, I'm just wondering if it's one of those things where, if a drive makes it past three years, it was probably made better at the factory, and can have a much longer lifespan into a ripe old age.

          Much like average lifespan for people was low because so many didn't survive childhood.

          Or maybe it's nothing like that for HDDs.

    • by Anubis IV ( 1279820 ) on Thursday May 04, 2023 @04:50PM (#63497762)

      I get a consistent 10 years in server applications. Regardless of brand.

      It sounds like you're talking about the average across all drives. They're talking about the average across all failed drives within the survey window. Different metric altogether.

      • by Kazymyr ( 190114 )

        Correct. The report above is subject to selection bias.

        • Re: (Score:2, Interesting)

          by Anonymous Coward

          Meaning, this looks like three years is the head of the bathtub curve for HDDs. For home/hobby/historic use, the tail would be useful to know, too.

          • Meaning, this looks like three years is the head of the bathtub curve for HDDs. For home/hobby/historic use, the tail would be useful to know, too.

            Kind sir or madam, this is slashdot. The only users experiencing “bathtub curves” and “tail” at their home as a hobby defines the null pointer.

          • I'll let you know if any of my 4- to 10-year-old drives fail.

            I like old drives because they're past infant mortality, and they're cheap. This goes for most other things, too.

        • I wonder if this means one could "age" drives before putting them to long-term use?
          If a drive lasts over 3 years, it will probably last a LOT longer; then move it into the array as a "good long-term drive".
        • Correct. The report above is subject to selection bias.

          No it's not. The above report is a report on a very specific metric. It is only "selection bias" if you are trying to use it to determine a different metric from what the report is about.

      • by richi ( 74551 )
        MOD PARENT UP.
    • I get a consistent 10 years in server applications. Regardless of brand.

      I don't see what this has to do with that. The summary doesn't say how many drives failed. Just that among those that did fail, they took an average of 2 years 10 months to do so.

    • Let me guess: that is because your drives are older and more robust. I too have much older drives, but I have also experienced high failure rates with more modern drives.
      • by bobby ( 109046 )

        Came here to say the same thing. My oldest drives (~30 years) still work, and over the years the newer the drive, the more failures I've had.

        FTFA:

        An analysis of 2,007 damaged or defective hard disk drives (HDDs) has led a data recovery firm to conclude that "in general, old drives seem more durable and resilient than new drives."

        And this sadly stands to reason: most companies want engineers to make things cheaper, not better. This constant push for more and more data storage in a given form-factor is not helping reliability.

        I wish drive makers would also continue to make much older (more reliable) drive designs. I'd like the option. I don't need 22TB, I need reliable.

        BTW, I read so

        • The drive in my Kaypro 10 still booted the last time I used it 5 years ago, so that would have made it over 30 years old too.
          I have not used it since I shifted, but I must haul it out again.
          • by bobby ( 109046 )

            That's so cool. I love things that work, no matter how old. But I have a '29 Model A Ford. Hasn't been run in many years, but it will again, hopefully soon.

            I'd love to know the brand and model of your Kaypro's HD.

            • by yanyan ( 302849 )
              Here are SMART stats of my 250 GB, 5400 RPM Hitachi Travelstar.

              Vendor Specific SMART Attributes with Thresholds:
              ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED  RAW_VALUE
                1 Raw_Read_Error_Rate     0x000b  100   100   062    Pre-fail  Always   -            0
                2 Throughput_Performance  0x0005
            • > '29 Model A

              Is that the one with the rubber band transmission?

              • by bobby ( 109046 )

                ??? Never heard of that. Not sure what that means! :o

                No, it's got a fairly "normal" in-line 4-cylinder engine, clutch, 3-speed manual trans. The trans does not have synchro-mesh rings, so you have to "double-clutch", or just kind of wait between shifts, else the gears "clash". Gentle, careful driving + shifting is key.

                It has an electric starter motor, but an optional hand crank if needed.

                Brakes are not hydraulic- but levers and pull-rods with fairly normal drum brakes.

                2 headlights + running lights, but on

        • BTW, I read somewhere, maybe here on /., that Apple-branded drives are more reliable. The few I've had over the years have never failed. That goes back 8+ years, so I'm not sure if that's still true.

          From my ex-boss, who went to work for a large semiconductor manufacturer who sold some power-control chips to Apple, I do know that Apple has a tremendously stringent Qualification and Approval process for Vendors.

          So, that probably accounts for most of the difference. In my 40-some years of owning and maintaining Apple computers with spinning-rust drives, I have only had one semi-catastrophic drive failure: on a Bondi Blue iMac back about 24 years ago.

          Anecdotes, I know; but, in my long experience with App

          • by bobby ( 109046 )

            Anecdotes count! Thanks. I doubt anyone has true full population stats on various drive reliability. My anecdotal experience: first computer I got, late '80s, had a Seagate ST225, and yup, it failed within weeks. Ever since then I've had far more Seagate failures than any other brand. Although WD has a bad reputation, I've had much better luck with them. I'm sad that so many good brands have gone out of business. It's the very ugly downside of capitalism. (cheap crap wins out)

            I'm admin for a small hos

            • by La Gris ( 531858 )

              I think there is a survivor bias.

              Those older drives you have from older generations: many of that generation are long dead already, and a good proportion of those batches probably also died prematurely.

              The few survivors still running with pride are those fortunate enough to have been built to spec and run under favorable conditions.

              Industrial fabrication allows some deviation margins. Tested units can pass validation OK while having hidden issues that will shorten their life or reduce their resilience on

            • Some drives are more than 20 years old, running 24/7 for at least the past 15 years. Hitachi, IBM, HP, I forget what else, all SCSI / SCA. One system has a Seagate 40 GB PATA that is a freak. It just won't die, and has very fast speed for its age and type.

              One interesting thing: before HGST (Hitachi) got bought by the evil Western Digital (whose Caviar Black drives are the only decent thing they have made), those Hitachi, HP and IBM drives were all designed and built by. . . HGST!!!

              So, your list of "Survivors" actually reads: HGST, HGST, HGST and the one geriatric Seagate (whose SCSI drives Apple Private-Labeled for years back in the day). So, the list actually becomes pretty easy to parse. . .

              Fortunately, even though I don't particularly like WD as a company

              • by bobby ( 109046 )

                The first time I encountered an SMR drive was about 5 years ago, at a little company I do some work for. Accounting computer that I share; the hard drive would make an r-r-r-r sound about every 10 or 20 seconds, no matter what. Yup, things pretty much freeze until it's done its dance. It's a 1TB- I can't remember the brand. Maybe Toshiba? Of course, you look up the model number and you can't find anything saying it's SMR. Time for a shingles vaccine!

                Long story short- a particular computer (dearly departed mom's) t

                • The first time I encountered an SMR drive was about 5 years ago, at a little company I do some work for. Accounting computer that I share; the hard drive would make an r-r-r-r sound about every 10 or 20 seconds, no matter what. Yup, things pretty much freeze until it's done its dance. It's a 1TB- I can't remember the brand. Maybe Toshiba? Of course, you look up the model number and you can't find anything saying it's SMR. Time for a shingles vaccine!

                  Long story short: a particular computer (dearly departed mom's) that would get used many hours, then shut down, kept killing hard drives. I recovered data from them, but it was very difficult. The computer was down near (but not on) the floor, which was a concrete slab on dirt. Mom always complained that the room always felt damp. After a few drives, I decided the rule would be to leave the thing on 24/7, and the drive never failed. I surmised that as it cooled, it sucked in damp air. Maybe a little desiccant pack taped over the vent hole would have solved the problem.

                  SMR, right, duh!!! Brain fart, Sorry!

                  Another thing I remember from BackBlaze Reports, is that HDDs H-A-T-E Humidity! Far, far more than scalding, relentless, baking heat; far, far more than continuous random seeks; is the dreaded Humidity!!!

                  So, you independently, correctly analyzed a possible cause, and proposed a simple experiment/solution. Bravo!

                  I've had a couple of drives die due to stiction: no oil, just enough humidity to cause gluing. Very carefully bumping the drive rotationally will sometimes free the platters. On one drive the head broke off the arm. I swear I was gentle.

                  Yikes!

                  I didn't know WD still made HGST-designed drives. Any clue which ones they are? Or all out of production now?

                  I haven't had the need to look for a couple of years (setting up a friend's huge Synology NAS); so my info is out of date. Helium drives are almost def. HGS

      • Yep, same thing here. Newer drives in my case have not lasted nearly as long as my old drives. I have 10+ year old Western Digital Blacks that simply refuse to die. I have newer Western Digital Reds (in my NASes) die quite regularly within 3 years. My NAS drives are the only place I still use spinning rust, until SSDs (if this ever happens) become more affordable.
      • I can relate to the modern drives seeming to have higher failure rates.
    • by aardvarkjoe ( 156801 ) on Thursday May 04, 2023 @04:53PM (#63497774)

      Secure Data Recovery's March 8 post broke down the HDDs it received for data recovery by engineer-verified "power-on hours," or the total amount of time the drive was functional, starting from when its owner began using it and ending when the device arrived at Secure Data Recovery.

      They're pretty obviously not examining a subset of drives that are representative of all hard drives. They're only looking at drives that failed (so leaving out all drives that are retired without failing) and also only examining ones that are sent to this company for data recovery.

      There might be some useful data to gather from this, but the conclusion from the Ars Technica headline "HDD average life span misses 3-year mark" is obvious nonsense.

      • Thanks. This is, at best, how long an average drive lasts before failure if it fails during normal lifespan

        Not sure why it is useful except to let server farm managers breathe easier after 3 years go by.

        • Thanks. This is, at best, how long an average drive lasts before failure if it fails during normal lifespan

          Not sure why it is useful except to let server farm managers breathe easier after 3 years go by.

          Yes, this. The slashdot title and summary are bogus. It's obvious from the Backblaze data, where even the worst drive models have annual failure rates of 7%. So the average lifetime of even the worst models is way more than 3 years.

          There's also a reason why the average lifetime of failed drives is so close to 3 years. If the warranty period, i.e., the period for a free replacement, is 3 years, then Backblaze will try its hardest to find iffy drives before the warranty period expires. So, I would expect that among

      • As best I can tell from the math, for a small failure rate before retirement, the average time to failure among failed drives is determined by the retirement time. In such a small region, the slope of exponential decay is almost constant, so if the retirement time is five years, half of the drives that fail will fail before 2.5 years and the other half between 2.5 and 5 years, for an average lifetime of 2.5 years regardless of whether the failed fraction at the end of 5 years is 1%, 2%, or 3%.

        So lifetime of failed drives sa
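
        A quick numeric check of that argument, assuming a constant-hazard (exponential) lifetime model, a 5-year retirement age, and made-up annual failure rates:

          import math

          RETIREMENT_YEARS = 5.0          # assumed retirement age

          for afr in (0.01, 0.02, 0.03):  # assumed annual failure rates
              lam = -math.log(1.0 - afr)                 # hazard rate implied by the AFR
              alive = math.exp(-lam * RETIREMENT_YEARS)  # fraction surviving to retirement
              # mean age at failure, conditional on failing before retirement
              mean_age = 1.0 / lam - RETIREMENT_YEARS * alive / (1.0 - alive)
              print(f"AFR {afr:.0%}: mean age of failed drives = {mean_age:.2f} years")

        All three cases land just under 2.5 years, i.e. roughly half the retirement age, which is the point being made above.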

      • There might be some useful data to gather from this, but the conclusion from the Ars Technica headline "HDD average life span misses 3-year mark" is obvious nonsense.

        An anecdotal statistic I heard is that most hardware component failures happen in the first 6 months.

        I think the useful takeaway from this is that even if your HDD has made it a couple of years, you're still in the "oh, it might suddenly bonk" window.

        Interestingly, this about fits my Samsung SSD which died a few months short of 3 years.

        On the bright side that was well within their 3 year warranty.

        On the down side that warranty wasn't worth the corrupted bytes it was recorded on [reddit.com]*.

        * After much pointless back an

    • by suutar ( 1860506 )

      My guess is that since they're talking about average failure age and not average lifetime, they're excluding drives that lasted long enough to get replaced without having failed (maybe the RAID had a failure but it's not cost effective to get a replacement for a drive that old, so the still-running drives get retired after the data's transferred)

    • Sounds like you need to start a cloud backup service then.

    • My first significant customer is still using the 80GB drive I gave them to backup their database onto 18 years ago. I don't remember the brand, but I don't think it matters. Aren't all hard drives manufactured by the same company that just slaps different labels on them depending on the customer company?

      • How often do they perform those backups? If daily, these drives sit idle for >90% of the power-on time. That assumes an hour-long backup daily, and that's being generous with the actual percentage.

        If less frequent, they are idle even more. Datacenter drives might be active much more.

        And if your server examples, out there, are business apps, they might see most of their work during business hours. More details would allow us to either make better assessments, or tuck tail and scurry away.

    • by Anonymous Coward
      Note that they're reporting the average age of failed drives without actually revealing the average age of their entire fleet. Their AFR (Annual Failure Rate) is 1.4%, so there are a ton of drives still running quite happily.
    • by SirSpanksALot ( 7630868 ) on Thursday May 04, 2023 @05:11PM (#63497826)
      The metric here isn't described well. What the metric actually says is this: of the drives that have failed, the average time to failure is 2 years 7 months. The vast majority of their drives last a lot longer than this, and are retired before they fail, so they don't affect the metric.
      • The metric here isn't described well.

        I disagree. The only thing that isn't running well here is people's reading comprehension. There's only so many times you can use the word "Failed Drives" in a title and summary before you need to stop accusing the wording and start accusing the idiots who can't read.

    • by edwdig ( 47888 )

      This is just stats on the drives that failed. 2% of their drives fail. They usually replace drives after 6-7 years because they're no longer efficient to use. Generally drives fail either in the first few years, or they last a long time.

    • by Luckyo ( 1726890 )

      Survivorship bias. They're talking about hard drives that died. You're talking about ones that made it.

      P.S. I have similar experience with personal hard drives. My parents, who get my hand-me-down gaming rig when I get a new one, still run two old 100GB Seagates. Power-on times on those are over a decade. And that's with way more power cycling than a server hard drive. In my experience, if a hard drive fails, it's usually within the first three years. After that, it just keeps on trucking almost forever in most case

      • The summary talks like they only cared about the drives that failed. The actual article is entirely different and makes it absolutely clear that it is about "drive days", from which the annualized failure rates are calculated, and those are quite low.

        Only true journalists possess the skill to process an article that loudly says "1% of most drive types fail per year" into a summary of "drives typically fail after two and a half years".

        Yes, we are missing data on how long Backblaze is operating their drives on average,

    • by aergern ( 127031 )

      I'm sure you read/write as much data to your drives regardless of brand. ;)

    • What they are doing wrong is selection bias: they do not see the hard drives that never fail and are simply thrown out because they are too old.

    • by vlad30 ( 44644 )
      In my sample of about 500 drives, I found that if they survived 2-3 years they tended to work until I had no use for them, 10 years and up depending on size and interface. Since they are talking about failed drives, that makes sense. I'd like to know the brands, types, and ages of the ones that survived, and the reasons they were eventually removed
    • The summary leaves out the most critical info from the actual article: the annualized failure rates of most of their drives are below 1%. A failure risk of 1% and lower per drive can easily mean operating a smaller batch for years without encountering a single failure.
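
      As a rough sketch of that last claim (hypothetical batch sizes, independent drives, constant AFR):

        def p_zero_failures(afr: float, drives: int, years: int) -> float:
            # probability that no drive in the batch fails over the whole period
            return (1.0 - afr) ** (drives * years)

        for drives in (4, 12, 50):
            p = p_zero_failures(afr=0.01, drives=drives, years=3)
            print(f"{drives} drives, 1% AFR, 3 years: {p:.0%} chance of zero failures")

      That works out to about 89%, 70%, and 22% respectively, so a home NAS or a small rack can easily go years without a single failure even though the fleet-wide rate is not zero.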

    • No, you are doing something wrong, specifically with reading comprehension. You're confusing the average age of failed drives with the average age of drives before failure. They are two VERY different metrics.

      Now go get all the drives you have which did fail and average how long they ran for.

    • All the drives in the sample were failed drives sent in for data recovery; that has to be related to the number of drives sold to give useful data.

      All we have is the average time to failure when there is a failure.

      A lot of disks are decommissioned before failing due to storage limitations or modernization, but the hours of use are rarely logged then.

    • by gweihir ( 88907 )

      I have a 3-way RAID1 that is always on. The current drives are 2 x 2.5" notebook drives (!) and one SSD that have been working fine for about 5 years. But before that, I had to throw out 4 other drives that all developed various problems within less than 2 years. It really is a case of YMMV: some drives live forever when you really do not expect them to, while others you had high hopes for may develop problems early.

      Yes, I am well aware that 2.5" notebook drives are _not_ what you want to use for this application, but th

  • by Retired Chemist ( 5039029 ) on Thursday May 04, 2023 @04:55PM (#63497782)
    What percentage of their drives failed in the study period? All I can conclude from the data is that drives that fail do so, on average, in under three years. It tells me nothing about the actual average lifespan of a drive. Maybe a percentage fail quickly and the rest happily run on forever. Also, it is worth noting that their drives are probably working a much higher percentage of the time than one in a typical laptop. All in all, this does not really tell me anything useful.
    • by edwdig ( 47888 )

      They release very detailed drive stats regularly - I think quarterly. They break down failure rates by model, but most drives are around 1% failure rate, +/- 0.5% or so. This particular report is just about the drives that failed.

    • it is worth noting that their drives are probably working a much higher percentage of the time than one in a typical laptop

      That's probably not true. Backblaze is a cloud backup service. They write the data to the drive, and then it sits there untouched for long periods. Your laptop, on the other hand, is constantly reading/writing/deleting stuff, incurring more read/write cycles than a Backblaze drive would likely get.

    • by AmiMoJo ( 196126 )

      That information may be more valuable than you think.

      Having a single drive with no backup is a great way to lose your data. You should expect drives to fail and plan for it. Therefore, what you really want to know is

      1) How long you want the warranty to be.
      2) If warranty replacement drives have high failure rates.
      3) How long you can expect your investment to last.
      4) Are used drives a good buy?
      5) What is the probability of two drives in a batch failing at the same time?
      etc.

      That last one is important when deci

    • There's no critical missing data point. You are after a completely different metric, subject to a completely different report. Go to Backblaze's website and get the HDD reliability report for that. Don't try to turn this into something it is not.

      It tells me nothing about the actual average lifespan of a drive.

      It's not trying to.

  • by GotNoRice ( 7207988 ) on Thursday May 04, 2023 @04:58PM (#63497790)

    The study only evaluated drives that *already* failed. It's not taking into account drives that never failed. This doesn't mean that "under 3 years" is the average lifespan of a hard drive. It means that, of the drives that failed, they lasted on average a bit under 3 years before they failed.

    If they were simply trying to determine the average lifespan of a hard drive, and took into account drives that had not failed yet, it would be a LOT longer than 3 years.

    • See also, bathtub curve.

      https://en.wikipedia.org/wiki/... [wikipedia.org]
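
      For anyone who would rather see the shape than read the article, here is a small illustrative sketch of a bathtub-style hazard rate (made-up parameters, not fitted to any real drive data): an infant-mortality term that tapers off, a flat background rate, and a wear-out term that grows late in life.

        def hazard(t_years: float) -> float:
            infant   = 0.10 / (t_years + 0.1) ** 0.5   # early failures, taper off
            baseline = 0.01                            # constant random failures
            wear_out = 0.0005 * t_years ** 3           # wear-out, grows late in life
            return infant + baseline + wear_out

        for t in (0.25, 1, 3, 5, 7, 10):
            print(f"age {t:>5} y: failure rate ~ {hazard(t):.3f} per year")

      The rate bottoms out around the middle years and climbs again toward the end, which is why "it made it past three years" and "it will run forever" can both describe the same curve.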

    • by AmiMoJo ( 196126 )

      What I want to know is if I have a drive replaced under warranty, how long will the replacement one last?

      Often the replacement drive is a refurbished one. Manufacturers won't say what the refurbishment process entails, but I imagine they mostly reduce the drive capacity and allocate the now unused parts as spare blocks.

      Do such drives have accelerated failure rates? Maybe they don't completely die, but will some of my data become unreadable from a block that went bad and was reallocated?

      • At some point in time I had a Windows Server 2003 machine running an Oracle DB server. S.M.A.R.T. kept telling me the drive was 100% healthy, yet it would occasionally fail in very vague ways. Then I used the tool MHDD on that drive, and it showed me that there were a lot of slow and bad sectors. These occurred only in the first 80 GByte of the drive. So the data partition was reduced by 100 GByte, the boot partition was moved into that freed-up storage space, and the server worked flawlessly for 5 more y

        • by AmiMoJo ( 196126 )

          I too find SMART to be inadequate. I have yet to find anything good that runs under Windows.

  • Remember that averages are misleading. Check the table: https://www.backblaze.com/blog... [backblaze.com]

    Some drive models lasted for over 6 years with some of the lowest failure rates.

    • It's also skewed by decommission age. If all HDDs are replaced after 5 years (made-up number), you will get a different average failure age than if you run them until they all fail.
      • I don't have a sample size nearly as large as Backblaze, but over the past 15 years, I've owned 14 hard drives... 2 failed (~2y, ~5y, the 5y being a boot drive), and the remaining 12 are between 1y and 15y old, with most being 8-10. My uses are similar to Backblaze; apart from the boot drive and the download drive, data just gets written once then read every so often. It's a decent enough sample to get an impression IMO. Hard drives from earlier in my life weren't kept around much more than 5y, but the ones
  • Honestly surprised that even people in the comments here seem to not quite grasp this.

    It's basically saying that if a drive fails, it's most likely to be within three years. The little dig at the end of TFS about most of the company's drives being older than 3 years just shows how badly this study's results are being misunderstood.

    To put it another way, it's like a study that shows that students who have shit their pants during school typically do so by 5th grade. It doesn't logically follow that we should

    • by Tablizer ( 95088 )

      > To put it another way, it's like a study that shows that students who have shit their pants during school typically do so by 5th grade. It doesn't logically follow that we should be concerned about a class of 10th graders being in imminent danger of pantsshitting

      In my personal experience, it spiked back up during my dating years.

  • Putting consumer-grade SD cards in anything is a mistake, I found. So many Raspberry Pis fail to boot after 2 years. I have a few SD cards that are not even recognized anymore. Some that report bizarre capacity. A few that get super hot if you plug them in, and are unreadable. Most of these are pretty common brands (SanDisk, Samsung) and not fakes/clones. Basically, an SD card is meant to last to the 1-year warranty. If you are doing things like keeping your daily work journal on it, where you write a tex

    • by King_TJ ( 85913 )

      Oh, without a doubt. Honestly, I view SD cards and USB thumb drives as equivalent to the old 3.5" floppy disks when it comes to reliability. You might have surprisingly good luck storing some data on one, putting it away for years, and finding you can get the data back off of it after that. But use it for many reads and writes, and it'll probably wear out on you after a little while.

      I think as capacities increased but prices per MB dropped, they all got to be pretty poor quality. I've run a FreeNAS (well, n

      • by jabuzz ( 182671 )

        It's the writing that does for them. You can read from them without affecting their lifespan. I have used USB drives for /boot on Linux when doing software RAID for over a decade, and a quality flash drive will last the lifetime of the hardware. So buy new hardware, install RHEL 6, then 10 years later decommission it, and it's still fine despite installing all the updates in that time.

    • by kriston ( 7886 )

      Once I upgraded my SBC power supplies to 3.5 amps, this problem went away completely.

      Failed SD cards in SBCs are all about the quality of the power supplied to them.

      • What also helps is to have high endurance or industrial SD cards. Both have more pages set aside to handle write cycles of cameras and other items. So far, I have had good luck with these. "Plain" SD cards just don't have the spare write cycles.

        I also underprovision, ensuring that the OS takes a fraction of the SD card space, and I also ensure that the OS does a TRIM on the card on a periodic basis. This has kept the amount of dead SD cards to a thankfully low number.

        Of course, the best solution is to b

      • I have rock-solid power supplies for mine; I check the ripple with a scope because I don't have a lot of faith in a switching power supply that weighs almost nothing. Mostly the RPi2 has done its job well as an authoritative DNS server, but the SD cards would blow up in it about every 2 to 4 years.

        • by kriston ( 7886 )

          I used to run a lab with CompactFlash cards. We did everything we could to reduce the writes on these cards. It's fairly easy to do in UNIX. Configure syslog to write to /dev/null or /dev/console. Disable logrotate. Mount most volumes as read-only except those that absolutely must be read/write. Put /tmp and /var/tmp onto a ramdisk.

          It was fun and I learned how much unnecessary logging is done even on embedded systems whose logs would never be read by anyone.

  • It's true reliability has gone down, but not by much. Most 8+ TB drives show a 2% or lower failure rate; the HGST 12TB has only a 0.27% failure rate, and they have 2,600 of those drives, so a fair sample size. https://www.backblaze.com/blog... [backblaze.com]
  • by laughingskeptic ( 1004414 ) on Thursday May 04, 2023 @05:44PM (#63497914)
    For multiple sets of ~1,000 drives that ran 24x7, we experienced roughly 1 drive failure a year for the first three years. At 7 years, we were experiencing a failure every week. If you have a lot of drives, you don't want them to get to be 7 years old -- even a 5% annual failure rate becomes a lot of work.
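
    The back-of-the-envelope arithmetic behind that last sentence, using the poster's own numbers:

      drives = 1000   # size of each drive set, per the comment above
      afr = 0.05      # 5% annual failure rate
      failures_per_year = drives * afr   # = 50
      print(f"{failures_per_year:.0f} failures/year, one every {365 / failures_per_year:.1f} days")

    Fifty failures a year is roughly one per week, which matches the experience described above.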
  • Maybe they aren't keeping the drives at a constant and reasonable temperature?

  • Use a new drive only for unimportant data for its first three years. If it survives, start using it for important data.

  • ....If it lasts more than 3 years it will last a lot longer

  • It's almost like the manufacturers know something that we don't. There is a nonzero probability that manufacturers have conducted accelerated aging studies and know how durable drives are.
  • The way I'm reading this, the entire sample is made up of *drives that have failed*. So what they are saying is that "out of the drives that failed, the average age at failure was about 3 years". But surely that says absolutely nothing about drives in general, as the sample doesn't include *drives that haven't failed*. It's kind of like the opposite of survivor bias, right? Or have I misunderstood the survey?

    • That's what I was thinking, too, but if the data covers enough years, the one statistic will approach the other once all the drives put into service have failed (give it 100 years).

      Note that the overall failure rate is only a couple of percent a year.

  • BackBlaze notes that the drives that failed sooner were also cheaper, so they may nevertheless be cost-effective. Probably the right statistic to look at is dollars per byte lost to failure per year. That still excludes the overhead of dealing with failed drives.

  • I have 2 mechanical hard drives that are now 11 years old. One is a Seagate, the other a Western Digital (I think).
  • I run a small data center that uses about 3 dozen hard drives. The average age of those drives is about 8 years. I seldom see a failure in a year's time (it's been about a year since the last failure).

    Perhaps the drives they see are ones with some kind of manufacturing defect. I think looking only at failed drives is giving a biased view.

"It's the best thing since professional golfers on 'ludes." -- Rick Obidiah

Working...