Data Storage IT

Backblaze's Geriatric Hard Drives Kicked the Bucket More in 2023 (theregister.com)

Backblaze has published a report on hard drive failures for 2023, finding that rates increased during the year due to aging drives that it plans to upgrade. From a report: Backblaze, which focuses on cloud-based storage services, claims to have more than three exabytes of data storage under its management. As of the end of last year, the company monitored 270,222 hard drives used for data storage, some of which are excluded from the statistics because they are still being evaluated. That still left a collection of 269,756 hard drives comprising 35 drive models. Statistics on SSDs used as boot drives are reported separately.

Backblaze found one drive model exhibited zero failures for all of 2023, the Seagate 8 TB ST8000NM000A. However, this came with the caveat that there are only 204 examples in service, and these were deployed only since Q3 2022, so have accumulated a limited number of drive days (total time operational). Nevertheless, as Backblaze's principal cloud storage evangelist Andy Klein pointed out: "Zero failures over 18 months is a nice start."


Comments Filter:
  • by Anonymous Coward

    Let's be realistic here. Backblaze has had fantastical luck with the 204 examples of the Seagate 8TB drives. And, somewhere, there is a very unlucky customer that had all 204 of their drives fail during the preceding 18 months.

    • 14 months is also not that impressive considering some drives are going strong after 9 years
      • by drinkypoo ( 153816 ) <drink@hyperlogos.org> on Tuesday February 13, 2024 @01:30PM (#64236980) Homepage Journal

        14 months is also not that impressive considering some drives are going strong after 9 years

        Drives are rated in MTBF with good reason. What's interesting about Backblaze's studies, besides that they give the results away, is that they have enough disks to produce pretty good statistics on their own. What they say they are doing [backblaze.com] is logging SMART data and some other basic statistics daily, what they offer is CSV files.

        Unfortunately they don't separately provide a data dictionary or a sample CSV, so three-ish minutes later I've got the 964MB zip file of Q4 2023 logs, and looking at the files I find I still have questions. What I want to know is stuff that you can't find out by looking at the SMART logs. They mention in their results discussion that there were drives that were over their specified temperature limits; I want to know what the enclosure temperature was like. I filtered out all the empty columns from a small sample of one of the log files, and came up with the following fields that are not about the disk itself:

        column 5 name datacenter length 4 alpha 4
        column 6 name cluster_id length 3 int 4
        column 7 name vault_id length 4 int 4
        column 8 name pod_id length 2 int 4
        column 9 name pod_slot_num length 4 int 4

        All of these tell you where the disk was located, but none of them tell you anything about what the environment was like in that location at the time. You only get to know what the disk said about its own temperature. I'd also like to know, for example, what the system thinks its 12V and 5V voltages were at the time. I have yet to look at the actual SMART stats, so maybe some disks are reporting both internal and PCB temperatures, but I'd like to know basically the output of lm-sensors. Hmm: upon review, lm-sensors on my system doesn't show the 5V or 12V, just the 3V3 and the current core voltages. Bugger.
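        For anyone else poking at the dataset, here's a minimal sketch of that column filtering in Python. The filename and the smart_ column-name prefix are assumptions based on the published daily drive_stats CSVs, so adjust to taste:

        import csv
        from itertools import islice

        # Illustrative filename; the daily files in the Backblaze zip are named by date.
        PATH = "2023-12-31.csv"

        with open(PATH, newline="") as f:
            reader = csv.DictReader(f)
            sample = list(islice(reader, 1000))  # a small sample is enough for this

        # Keep only columns that have at least one non-empty value in the sample.
        non_empty = [col for col in sample[0]
                     if any(row[col] not in ("", None) for row in sample)]

        # Split out the fields that describe fleet/location rather than the disk's own SMART data.
        location_fields = [c for c in non_empty if not c.startswith("smart_")]
        print("non-empty columns:", len(non_empty))
        print("non-SMART fields:", location_fields)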

        • Yep, all the stuff that drinkypoo said about the impact of environmental factors. And add to that the impact of different manufacturing batches that are greatly amplified for small drive populations. That's why the models with less than 1000 drives can show AFRs like 0.00% or 24.29%.

          The idea of AFR is only useful at the bottom of the bathtub reliability curve when failure rates are more stable, so the early-life failures due to manufacturing problems need to be omitted.
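          To make the small-population point concrete, here's the rough AFR arithmetic (failures per drive-day, annualized); the populations below are illustrative, not Backblaze's actual counts:

          def afr(failures, drive_days):
              # Annualized failure rate in percent: failures per drive-day scaled to a year.
              return failures / drive_days * 365 * 100

          # ~204 drives running ~18 months, zero failures (the Seagate 8TB case):
          print(f"{afr(0, 204 * 548):.2f}%")       # 0.00%, regardless of fleet size

          # A hypothetical small model: 30 drives in service for 90 days, 2 failures.
          print(f"{afr(2, 30 * 90):.2f}%")         # ~27% -- two failures swing the AFR wildly

          # The same 2 failures spread across 10,000 drives over a full year:
          print(f"{afr(2, 10_000 * 365):.4f}%")    # ~0.02%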

        • by AmiMoJo ( 196126 )

          I find that SMART is pretty useless in most cases. Drives develop bad blocks and remap them. SMART decreases the count of spare blocks, but my data is lost.

          Now do I toss the drive because it might be developing a bad spot, or hope it was just one small part of the surface? Do I run a full surface scan and verify the integrity of every file?

          It's got to the point where you basically need two drives to stand any chance of not losing data, and ideally a proper backup. Or yeah, RAID5 or ZFS or something. It's a lot of hassle just to avoid randomly losing data.
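          For what it's worth, a minimal sketch of polling the usual reallocation counters with smartctl, so a remap at least doesn't go unnoticed. The device path is a placeholder, it needs smartmontools and root, and the attribute names are the common ATA ones:

          import subprocess

          DEVICE = "/dev/sda"   # placeholder; adjust for your drive
          WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

          out = subprocess.run(["smartctl", "-A", DEVICE],
                               capture_output=True, text=True).stdout

          for line in out.splitlines():
              parts = line.split()
              # Attribute rows look like: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
              if len(parts) >= 10 and parts[1] in WATCHED:
                  name, raw = parts[1], parts[-1]
                  print(name, raw)
                  if raw.isdigit() and int(raw) > 0:
                      print("  non-zero:", name, "-- time for a full scrub or a replacement")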

          • I use ZFS with mirrors. My system SSD is mirrored to another SSD, my HDD is mirrored to an external HDD, and I have a mirrored SLOG for the HDD mirror. Now I just need to buy another external HDD for offsite backup... I used to have one, but then I got bigger HDDs.

            • by AmiMoJo ( 196126 )

              I don't bother with mirrors, I'm not too worried about getting back up and running instantly. I can re-install and have all my settings backed up. It's stuff like photos and videos I'm more concerned about. Other stuff is small enough that I have duplicate copies on disk and in cloud storage.

              My issue with ZFS is that you can't really expand it easily, at least not in a good way. I can't just throw another drive in. UnRAID lets you do that, but I'd prefer something open source.

              For now I use MultiPAR to create parity data for files and store it separately, so that a few bad blocks won't be fatal.

              • My issue with ZFS is that you can't really expand it easily, at least not is a good way. I can't just throw another drive in.

                Yes it's true, raidz expansion has been in the works for literally years and still isn't there. On the other hand disk capacity tends to go up so much that there is relatively little need for it.

                For now I use MultiPAR to create parity data for files and store it separately, so that a few bad blocks won't be fatal.

                That makes sense, but since I'm using mirror and not raidz all I need to do is add another disk to the mirror, then remove it after resilvering completes, and I get a new checksummed copy of my data for offline archive. I think next time I'm out I'll stop by Costco and pick up another disk, so I can do that. Then I'll just take the disk to work and keep it there...
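                A rough sketch of that attach/resilver/detach dance, shelling out to the standard zpool commands; the pool and device names are made up, and zpool split is the alternative if you want the detached half importable as its own pool:

                import subprocess, time

                POOL = "tank"              # hypothetical pool name
                EXISTING = "ata-DISK_OLD"  # a device already in the mirror (placeholder)
                NEW = "ata-DISK_NEW"       # the freshly bought disk (placeholder)

                def zpool(*args):
                    subprocess.run(["zpool", *args], check=True)

                # 1. Add the new disk as another side of the mirror; resilvering starts immediately.
                zpool("attach", POOL, EXISTING, NEW)

                # 2. Crude poll of `zpool status` until the resilver finishes.
                while True:
                    status = subprocess.run(["zpool", "status", POOL],
                                            capture_output=True, text=True).stdout
                    if "resilver in progress" not in status:
                        break
                    time.sleep(60)

                # 3. Detach the new disk; it now carries a checksummed copy of the data.
                zpool("detach", POOL, NEW)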

                • by AmiMoJo ( 196126 )

                  On the other hand disk capacity tends to go up so much that there is relatively little need for it.

                  It's nice to be able to add one drive at a time, instead of having to do it in batches and then ending up with a bunch of smaller drives that are spare.

                  Then I'll just take the disk to work and keep it there...

                  One of the downsides to working from home.

                  I keep an eye out for external LTO6 tape drives. The tapes are relatively cheap, but all the drives seem to be SAS only. I'm not sure a USB to SAS converter would work, and they aren't cheap anyway. My current NAS only has one PCIe slot and it's in use...

          • Or yeah, RAID5 or ZFS or something. It's a lot of hassle just to avoid randomly losing data.

            Remind me again what the "I" in "RAID" stands for?

            Oh yeah, "inexpensive".

            • by AmiMoJo ( 196126 )

              I always thought it was "independent", but Wikipedia gives both definitions.

              I'm thinking I might try SnapRAID, since it makes everything very easy and platform agnostic. It's a shame you can't ask for a certain percentage of parity data like you can with PAR files, but my understanding is that that would require massive amounts of memory.

  • Old hard drives are failing!
    Film [youtube.com] at 11!
    • Sunspots [superuser.com]

    • That's hugely understating the value of this data. The linked article lists failure rates broken down by make, model, and age. That is valuable data. So much so that I wonder why they give it away.
      • by necro81 ( 917438 )

        That's hugely understating the value of this data. The linked article lists failure rates broken down by make, model, and age.

        I do not disparage the dataset, nor the openness that Backblaze champions by sharing it. Instead, I quibble with the lame headline generated by The Register, unoriginally copied verbatim by the submitter, and blindly accepted by the editors.

  • Binning? (Score:5, Insightful)

    by Diss Champ ( 934796 ) on Tuesday February 13, 2024 @12:36PM (#64236864)

    I'm wondering if we have reached the point (or will reach a point) where vendors selling to Backblaze and others that post these kinds of numbers are incentivized, if they do any sort of production binning, to ship the best stuff from a forward reliability-indicator perspective, regardless of how it's labeled to them.

    • No, there's no way they're binning. What they're more likely doing is manufacturing drives specifically for BackBlaze that use more reliable designs and components and slapping the same label on them.

    • Depends on how Backblaze sources their drives. It would be easiest if Backblaze purchased directly from a manufacturer like Seagate. But if they get them from a supplier or middleman, it would be harder for Seagate to bin them. Also, some of the drives are not specifically made for enterprise server use. For example, the Seagate ST4000DM000 was a desktop drive, so Backblaze could have purchased them in bulk for desktops or servers.
    • No, the majority of people who buy drives do not care one iota about these numbers, whether or not they buy pallets of drives. In China they ship them by the containerful.

      I buy drives in bulk as well; I look at the specs and the price, they just do it more frequently. The failures are an issue of warranty and service; I don't care if I lose a drive, only people without redundancy and backups do.

  • Older, better? (Score:5, Interesting)

    by kackle ( 910159 ) on Tuesday February 13, 2024 @01:06PM (#64236930)
    My first PC's hard disk drives turn 30 years old this year and they still work perfectly. More recently, I noticed drives dying younger, so I scheduled to replace them at the 10-year mark. But now I've had 2 laptop hard disk drives fail at work at less than 9 years old. I replaced those with SSDs, but read someone's post about the same drive model failing him at 2 and 4 years. Sigh.

    I work in manufacturing and see that the first version(s) of something are often built to work reliably, and then they get redesigned over time to be cheaper. Buyer (and planet [jump to 2:00] [cbsnews.com]) beware.
    • Re:Older, better? (Score:5, Informative)

      by jacks smirking reven ( 909048 ) on Tuesday February 13, 2024 @01:30PM (#64236982)

      Ehh, this is similar to when people post their grandparents' refrigerator from the 1950s and remark about how it's still running. It's survivorship bias combined with plain technical advancement. That refrigerator may still be running, but it wastes way more power, puts more heat into the house, doesn't get as cold, has to run longer due to less insulation, and is generally worse in every way than a modern refrigerator. Oh, and when you adjust for inflation it cost around 3x the price.

      What we don't see is the number of similar hard drives that ended up in the bin. Growing up in the '90s, hard drives were by far the most "fragile" aspect of computers; every odd little click or noise and we would get paranoid that it was about to head crash, and plenty of people I knew experienced that very issue.

      30 years ago is 1994, the same year IBM launched the infamous DeskStar "Deathstar" line of drives.

      • That refrigerator may still be running but it wastes way more power,

        The point the OP was making had nothing to do with power consumption. It was specifically about reliability. Something made 30 years ago obviously won't have the same efficiency as something made today. No one is arguing that point. It's about how 30 year old machinery is still around and perfectly functional whereas something made five years ago might be on its last legs.

        It's like someone whining a new car has more features than one made

        • And my point is that reliability is a wildly inconsistent metric over the years, subject to wild survivorship bias. A sample size of 2-4 gives us no actual indication of whether those old drives were in fact more reliable, and it also pegs reliability as the sole metric we can judge modern technology by. That also has to be balanced with the improvements made to such things; maybe building a HDD that lasts for 30 years in 2024 means that drive costs 3x as much, and nobody actually needs a HDD to last for 30 years b

          • by kackle ( 910159 )
            Geez, I'm not writing a research paper; I'm just pointing out what is clearly, only, my observation. Pushing that observation further, I didn't have any drives fail until about a dozen years ago...

            That said, I AM in manufacturing, and that is what often happens with product lifecycles.

            And to poke the bear once more, yes, "inefficient" devices may be drawing more power than their modern equivalents, but how much power does it cost, how much pollution is created, to replace that item: from material acqu
            • My observation is the opposite. Growing up with computers, I was taught that the #1 reason to do backups was potential hard drive failure; maybe that reputation wasn't deserved, but also how far back do we judge this curve? Maybe core memory was more reliable than the first solid-state RAM; what does that mean exactly? Observation by itself doesn't mean anything, and out of context it can actually lead to wrong conclusions many times.

              I rarely see this considered as everyone is too busy ooh-ing and aah-ing over the shiny new widget. "Newer is better!"

              I mean, this is just human nature in a lot of cases, and while I like to shop f

              • by kackle ( 910159 )

                Just in cars alone, older ones have their charm and they are more repairable, but no doubt newer cars are way nicer than old cars in almost every way, and I could probably make a case that they are more reliable than older cars even with the fancy new tech. Doesn't hold true for everything, but even in computers, power supplies are so, so, so much better than years ago.

                It's funny you mention cars, because that is an example I sometimes use. I've wrenched on cars for decades and know them well, in and out. Yes, modern vehicles have their advantages, no doubt. However, I find their disadvantages are also ignored under the assumption that 'it's all better because it's new.'

                Because of their technical complication, they have a greater cost during creation and are more expensive to repair, sometimes outside of the capability of average automobile mechanics (creating owner

          • Cars from the 90s WERE built better, especially Japanese cars. This is not even in dispute. Work on some of them and then work on the cars made now, the difference is striking. Stuff that used to screw together now snaps together. Stuff that used to be secured with a bolt is now secured with a plastic fastener. The plastic fasteners themselves are more fragile than they were in the 90s, so you are more likely to have to replace them. Electrical connectors have been cost reduced so there are more failures. I

        • If refrigerators etc. were as reliable as the OP makes out, we would all still be running those fridges. The entire point is a sample size of 1. I have IDE drives that still work, but the vast majority are garbage. And especially with drives, most people don't even know when they failed. I have MP3s that got partially corrupted from the time before I learned about ZFS, I regularly deal with the question of corruption and lack of backups on desktop, my previous laptop lasted for 7 years withou

        • With fridges these days, it's absolutely true that they aren't as durable, even accounting for survivor bias.

      • by tlhIngan ( 30335 )

        Ehh, this is similar to when people post their grandparents' refrigerator from the 1950s and remark about how it's still running. It's survivorship bias combined with plain technical advancement. That refrigerator may still be running, but it wastes way more power, puts more heat into the house, doesn't get as cold, has to run longer due to less insulation, and is generally worse in every way than a modern refrigerator. Oh, and when you adjust for inflation it cost around 3x the price.

        Nevermind the other question - w

      • Some of these stats are statistically weak due to low population counts, and more subtly due to low runtime. One of the Seagate 14TB models notably has an AFR that is way out of line with the rest of the population because of the two drives that failed.

        I gotta wonder at what point they find that QLC or PLC is a net win due to density.

    • by taustin ( 171655 )

      My first PC's hard disk drives turn 30 years old this year and they still work perfectly.

      On the other hand, you can exceed their storage capacity with a stack of Post-It Notes these days.

      • by kackle ( 910159 )
        On the other, other hand, the OS/data it's running/storing is tiny and is arguably as useful. :)
    • My first PC's hard disk drives turn 30 years old this year and they still work perfectly. More recently, I noticed drives dying younger so I scheduled to replace them at the 10-year mark.

      What was your sample size? Because the 3 you quoted is not statistically significant. Unless you work as a reliability engineer at a large data recovery house or a storage vendor you aren't noticing anything.

      Heck, by my own account of a sample of about 12 drives, one of which was an IBM Deathstar that failed and was replaced 5 times, with no other failures, I can conclude that modern WD and modern Seagate drives are 100% reliable and that an IBM drive will with certainty destroy your data. I also have an O

      • by dgatwood ( 11270 )

        The only thing you should be noticing is that your sample size is too small to draw any conclusion.

        What I've noticed is actually the opposite. Laptop drives built before... maybe 2005 or so would fail every two or three years, either because of a head crash, stiction, or bearing failure (a fully functioning drive that sounds like a radial arm saw). Since then, I've replaced zero (though at some point around 2014, I switched to an SSD, so that only covers a narrow range of years).

        Desktop drives recently have also failed at a lower rate. Despite having about a dozen drives spinning now, my annual failure

        • by Mal-2 ( 675116 )

          I stopped buying Seagate, only to have significantly bad luck with Western Digital Caviar Green drives. I also went through four DeathStars back around 2000. The first one lasted about four months. The second one didn't last long enough for me to even reinstall everything, so I started stress testing before I bothered filling the third one. It was also dead in a few days. I drag it back to the store for an exchange, and they hand me another one from the same lot. So I go home and kill that one too with stre

    • I also had, until recently, 30-year-old drives, and they worked perfectly. As far as I could tell, not a single sector failure. I don't think a magnetic disk has ever failed on me. Perhaps they somehow knew I have backups?

  • My FAR smaller collection of hard drives has three that are well over 100,000 hours. All are under 1TB.

    Hitachi HDT721064SLA380 122286 hours
    WDC WD7501AALS-00E3A0 112822 hours
    WDC WD800AAJS-22L7A0 111600 hours

    I just took a Seagate out of service at about 60,000 hours due to a redirected block. It ran like that for nearly a year with no symptoms in the OS. Over the years, I have had more dead Seagates than any other brand.
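    For scale, converting those power-on hours into years of continuous spinning (quick Python arithmetic):

    for model, hours in [("Hitachi HDT721064SLA380", 122286),
                         ("WDC WD7501AALS-00E3A0", 112822),
                         ("WDC WD800AAJS-22L7A0", 111600)]:
        print(f"{model}: {hours / (24 * 365.25):.1f} years powered on")
    # roughly 13.9, 12.9, and 12.7 years respectively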

    • by bobby ( 109046 )

      Not trying to top you, rather trying to say that some older stuff was made much more reliably. I try to tell people: the newer something is, the more likely it's been "cost-reduced", which means cheapened, then cheapened some more.

      In one server I have several Fujitsu MAJ3182MC running: 10K RPM, SCA, year-2000 date codes, running continuously, so maybe 200,000 hours. No errors at all. I have some spares on standby and have never needed one. Many other great hard drive brands, now defunct of course, because people always bu

      • So much misinformation in your post. Overall, Seagate is no worse and no better than any other brand; look at the overall numbers from Backblaze and Google. You had a bad experience, which trumps any good experience; that's typical human behavior, but it is not borne out in the statistics.

        Low level formatting had nothing to do with rewriting magnetic domains or whatever, the heads on a drive are not powerful enough to either strip or replace the magnetic field (unless they crash and physically strip the magnetic

        • Although you are correct that low-level formatting did not have anything to do with magnetic domains, it also did not do what you describe.
          Low-level formatting set up the sectors [with sector IDs, checksums, etc.]. It was possible, with low-level formatting, to take a drive designed for MFM and low-level format it RLL [and get a capacity increase to boot]. Back in those days, all of the understanding of the disc structure & bad sectors was held in the controller, not in the HDD itself.

          There was a tool, part of No
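          As a rough illustration of that MFM-to-RLL capacity bump, using the classic figures of 17 vs. 26 sectors per track at 512 bytes each (assuming the drive and controller could handle the denser encoding at all):

          SECTOR_BYTES = 512
          MFM_SECTORS_PER_TRACK = 17
          RLL_SECTORS_PER_TRACK = 26   # RLL 2,7 encoding

          gain = RLL_SECTORS_PER_TRACK / MFM_SECTORS_PER_TRACK - 1
          print(f"capacity gain from MFM to RLL: {gain:.0%}")           # roughly 53%

          # e.g. a 20 MB MFM-era drive reformatted RLL:
          print(f"20 MB MFM becomes about {20 * (1 + gain):.0f} MB RLL")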

          • yes, I'm old. My original slashdot ID was less than 70000.
          • by guruevi ( 827432 )

            And that is what I said, it sets up the structure, it did not re-align/physically move the structure around. Yes, with the really old drives, but few people remember the MFM/RLL days, I barely do, I mostly had SCSI, I eschewed IDE for reliability issues but PATA eventually won on cost.

            And the tricks that Norton did were just that, tricks, it could refresh your data, perhaps, but if it was damaged, it was damaged and although many tools 'lied' about it, the reliability just wasn't there, it was mostly let's

            • the claim that "magnetic domains" were somehow involved was bullshit b/c it wasn't even possible. Not without reforming the platter or at least the metal oxide coating.
              But low-level formatting DID move the structure [sectors and tracks are structure] around. The real problem with your statement is that you mentioned the partition tables. That's not low-level formatting any more than setting up the FATs is.
              The really old disks came with a list of bad sectors from the factory, and then you had to

  • I use Backblaze B2 with S3 compatibility as a secondary backup provider because of facts like this.

    I don't want my data backed up onto "shucked" consumer USB hard drives, so I use another provider for my primary backup and use Backblaze B2 S3 as secondary as a rule.

    I wouldn't recommend Backblaze as a primary online backup for any reason. Part of the reason is that we had to *beg* Backblaze to add S3 compatibility to their platform and it took them almost three years to agree [backblaze.com], and they then posted a later blog a

"The vast majority of successful major crimes against property are perpetrated by individuals abusing positions of trust." -- Lawrence Dalzell

Working...