Data Storage Hardware

Analyzing Long-Term SSD Failure Rates 149

wintertargeter writes "It looks like Tom's Hardware has posted the first long-term study of SSD failure rates. The chart on the last page is interesting — based on numbers, it seems SSDs aren't more reliable than hard drives. "
This discussion has been archived. No new comments can be posted.

  • Uh, yes they are (Score:4, Informative)

    by TheRaven64 ( 641858 ) on Friday July 29, 2011 @08:37AM (#36920496) Journal
    Did the poster even look at the chart he linked to? Those big lines that shoot up to the top after 1-3 years? They're the failure rates for hard disks. The ones near the bottom? They're the failure rates for SSDs. Now, some of the SSD figures are projected and look quite optimistic, but the number of hard disks failing after 3 years looks higher than the number of SSDs failing after three years in all of the studies. For most workloads, the SSDs fail less often, and the SSD failures only exceed HD failures very early on in their lifetimes.
    • While the chart isn't a very good chart, I also can't figure out how anyone could write

      based on numbers, it seems SSDs aren't more reliable than hard drives

      about it. Even if the projections are thrown out there are wild differences between all the SSD plots and all the HDD plots.

      • Reply to un-do accidental moderation. Apologies.

      • by geogob ( 569250 )

        It would not be a surprise if the long-term failure rate of SSDs is drastically different from that of HDDs. Although the extrapolation may turn out to be wrong (it's an extrapolation, after all), I do not believe it is that far-fetched. From another point of view, fitting HDD failure rate curves to SSDs would be plain wrong.

    • by msauve ( 701917 )
      Well, it depends on the application. Assuming the chart is accurate, disks are more reliable for the first year. So, if you have a short term application/need, or replace your hardware every year, then disks are more reliable.
    • by Geoffrey.landis ( 926948 ) on Friday July 29, 2011 @08:58AM (#36920764) Homepage

      Did the poster even look at the chart he linked to?

      Did you? Apparently not.

      Ignore the dashed lines-- those curves are not data, they are "projection." The chart has no data on SSD failures late in the lifetime. So, when you say "...SSD failures only exceed HD failures very early on in their lifetimes," that is equivalent to saying "SSD failures only exceed HD failures in the region of the graph for which there is data."

      • by geekoid ( 135745 )

        But that's not true. Every SSD on the chart has a lower failure rate in the small section preceding the 6 - 12 month mark.

        • But that's not true. Every SSD on the chart has a lower failure rate in the small section preceding the 6 - 12 month mark.

          ??

          Apparently we are looking at different graphs. The graph I'm looking at is the one linked in the summary above, here: http://media.bestofmicro.com/4/A/302122/original/ssdfailurerates_1024.png [bestofmicro.com]
          In the "small section proceeding the 6 - 12 month mark" that you refer to, the highest failure rate is the light green curve, labelled "SLC SSD (Ku 2011)", while the lowest failure rate is the red curve, labeled "HDD (Schroeder 2007)".

          The red HDD curve remains the lowest out to 2.5 years, which is farther out than an

          • The purple Google HDD curve is by far the worst: it looks like after 3 years 20% of those drives will have failed. It is easily the worst curve on there, and it shoots above the SSDs quite quickly.

    • by Baloroth ( 2370816 ) on Friday July 29, 2011 @09:08AM (#36920872)
      Look closer. At any points where they have actual data, the failure rate for SSDs is higher than that of HDDs, except for the Google study, which I bet puts the drives under massive load or something else funky (given its massive difference from all the other HDD curves). Only in the projections for the SSDs do the HDDs begin to curve upwards, throwing off the graph. And from what I know of flash memory, especially MLC (which most SSDs are), I'd bet that SSDs will curve upwards too. Sure, wear leveling will help, but if a cell fails with data in it, which can still happen, then that data is lost. So yeah, for any section where they have actual data, SSDs do have a higher failure rate than hard drives. Incidentally, that's a really terrible and deceptive chart.
      • Re: (Score:2, Insightful)

        by Anonymous Coward

        You do know that HDDs also require wear leveling, right? (Well, not really, but defective blocks were pretty much part of life when HDDs were in the 10-100MB range.)
        So yes, both SSDs and HDDs are likely to wear out over time. What wear leveling does is make sure that the entire disk is pretty much worn out by the time you start encountering bad blocks.
        With SSDs there is however one slight improvement. Since flash memory has been used for so long without wear leveling and in applications where it's damn im
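
        To make the wear-leveling idea above concrete, here is a toy sketch in Python: every rewrite of a logical block is steered to the least-worn free physical block, so erase wear spreads across the whole device instead of burning out a few hot blocks. This is purely illustrative; real flash translation layers are far more involved.

          class ToyWearLeveler:
              # Minimal bookkeeping only: a wear counter per physical block, a
              # logical-to-physical map, and a pool of free physical blocks.
              def __init__(self, physical_blocks):
                  self.wear = [0] * physical_blocks
                  self.map = {}
                  self.free = set(range(physical_blocks))

              def write(self, logical):
                  # Place the new copy of the data on the least-worn free block.
                  target = min(self.free, key=lambda p: self.wear[p])
                  self.free.remove(target)
                  old = self.map.get(logical)
                  if old is not None:
                      # The stale copy is erased (wearing that block) and freed.
                      self.wear[old] += 1
                      self.free.add(old)
                  self.map[logical] = target
                  return target

          ftl = ToyWearLeveler(physical_blocks=8)
          for _ in range(40):
              ftl.write(logical=0)      # the same "hot" logical block, rewritten
          print(ftl.wear)               # wear is spread over all physical blocks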

      • Totally agree - it's just a horrible chart. The notion of giving arithmetic projections based on the small amount of data available for SSDs is ridiculous, especially since the failure rates for all the drives for which there is data over an extended period are geometric.

        As Niels Bohr might have remarked, any conclusions drawn from data on that chart "would not even be wrong."

    • Did the poster even look at the chart he linked to? Those big lines that shoot up to the top after 1-3 years? They're the failure rates for hard disks. The ones near the bottom? They're the failure rates for SSDs.

      The poster probably saw the chart, as they seem to have actually read the article in addition to merely glancing at a picture on the last page. Right below that graph:

      But under the best of conditions, hard drives typically top out at 3% by the fifth year. Suffice it to say, the researchers at CMRR are adamant that today's SSDs aren't an order of magnitude more reliable than hard drives.

      • by Hyppy ( 74366 )

        The poster probably saw the chart, as they seem to have actually read the article in addition to merely glancing at a picture on the last page. Right below that graph:

        But under the best of conditions, hard drives typically top out at 3% by the fifth year. Suffice it to say, the researchers at CMRR are adamant that today's SSDs aren't an order of magnitude more reliable than hard drives.

        So you're quoting that SSDs are not 10x more reliable than HDDs. That doesn't exactly prove a point that HDDs are more reliable.

        • They do not need to be.

          What I mean is:

          SSDs are way more expensive per gigabyte than hard drives. However, they are faster, use less power* and are more reliable, or so it was said. So, if you do not care about speed, SSDs are probably not worth the high prices, since they are not more reliable than HDDs.

          * It seems to me that the power consumption is not much lower than that of hard drives either.

        • by oGMo ( 379 )

          So you're quoting that SSDs are not 10x more reliable than HDDs. That doesn't exactly prove a point that HDDs are more reliable.

          The original poster said "it seems SSDs aren't more reliable than hard drives." Do not create a straw man. The article indicates that while marketing and simpletons may point out select statistics as "more reliable," there's a lot more to the story, and it's difficult to impossible to get meaningful data at this point. That is, based on their analysis, SSDs are not provably more reliable than hard drives.

    • Did the poster even look at the chart he linked to? Those big lines that shoot up to the top after 1-3 years? They're the failure rates for hard disks. The ones near the bottom? They're the failure rates for SSDs. Now, some of the SSD figures are projected and look quite optimistic, but the number of hard disks failing after 3 years looks high than the number of SSDs failing after three years by all of the studies. For most workloads, the SSDs fail less often, and the SSD failures only exceed HD failures very early on in their lifetimes.

      Not only does the data point to better reliability for SSD, look at the application!

      All of the HDD data is from datacenters - rack mounted, cooled, well cared for drives. Now imagine what happens to a drive in a laptop. I think it would be interesting to see that comparison.

    • What I don't get is that companies like Dell have been shipping SSDs for much more than 5 years now. Surely Dell has some good statistics about failure rates, since their customers want refunds and shit when things die quickly. Is it that Dell won't release the data? Has anyone even asked?

      I understand that the latest crop of SSDs from companies like OCZ have been a real nightmare. I suspect the OCZ issue has to do with powering down the device, with the capacitor responsible for ensuring this happens correctly not supplying enough power for long enough to let all the buffers write out correctly.
      • by dgatwood ( 11270 )

        I understand that the latest crop of SSDs from companies like OCZ have been a real nightmare. I suspect the OCZ issue has to do with powering down the device, with the capacitor responsible for ensuring this happens correctly not supplying enough power for long enough to let all the buffers write out correctly... most of the failure posts you see on Newegg begin "I put the machine to sleep...." In other words, several gigs were written out right before the device lost its primary supply of power. So it cou

        • This means that multiple metadata writes to the same block return much more quickly, but it also means that the data isn't really committed to stable storage when the OS thinks that it is. If the machine shuts down and that data still hasn't been flushed, it goes away.

          The issue isn't file system corruption. When the issue happens, the machine simply can no longer detect the drives no matter what you do. It's not simply data loss; it's complete loss of any access to the device.

          This means either an actual hardware failure, or that during power-on the firmware is failing to initialize. My theory is the second alternative: that its block system (not the same as the file system! the block system maps logical sectors to subsets of physical blocks in an arbitrary way) has become

          • by dgatwood ( 11270 )

            The issue isn't file system corruption. When the issue happens, the machine simply can no longer detect the drives no matter what you do. It's not simply data loss; it's complete loss of any access to the device.

            You're talking about a completely different failure than the "time warp". The "time warp" bug that I was referring to causes all the data on your disk to suddenly revert to a previous state.

            Now it is possible that the two failures have the same root cause—I wouldn't begin to speculate on that

  • The chart linked is not terribly useful, since the legend doesn't really explain what the curves are (three completely different curves with the same label, HDD Schroeder 2007).

  • Hasn't this always been the assumption? I've always been told by everyone in every discussion about SSD vs HDD that SSDs have a shorter lifetime.
    • Hasn't this always been the assumption? I've always been told by everyone in every discussion about SSD vs HDD that SSDs have a shorter lifetime.

      Doesn't make them less reliable, necessarily. An SSD may have a generally shorter lifespan than an HDD... but if the failure rates are known to be lower, then they are more reliable over that time period. For me, it's a moot point, since the rate at which I upgrade is quicker than either, so I go for both - an SSD in my laptop for speed, a small USB hard drive that I throw in the bag to store photos, media, etc., and two big RAID arrays at home for long-term storage and short-term backups. I'm more concerned about

      • by HiThere ( 15173 )

        What do you call long term? I've had CDs (burned, not pressed) fail after sitting on the shelf for a decade. DVDs are reputed to be less stable, and Blu-Ray to be even less so.

        For medium-term storage I consider HDs to be optimal. DVDs are probably OK for 5 years. Can't even speculate on the lifetime of a burned Blu-Ray... but I'd guess 2.5 years. (Guess is the word; my only reasoning is that denser storage is usually more fragile.) *IF* you have a thermostatically controlled vault, and *IF* you have either

        • What do you call long term?

          In the case of photos, long term = forever. Right now, I use a combination of online backups & multiple hard drives. Online because if my house burns down, I doubt I'll feel good about having had multiple copies go down in flames. Multiple drives, because I have trust issues with online cloud backups as a LONG TERM service.

  • Huh? (Score:5, Insightful)

    by adamjcoon ( 1583361 ) on Friday July 29, 2011 @08:41AM (#36920548)
    I didn't read TFA, but the chart doesn't tell me that "SSDs aren't more reliable than hard drives": the SSDs were generally 6% or under (assuming the linear projection), whereas regular HDDs approached 14%+ after five years. And "long-term" in the title? The SSD data in the chart only goes out 1 year. Not exactly long term when the chart spans 1-5 years of use. The actual data for the SSDs covers only 20% of the time span.
  • Whoever wrote that article might know a lot about drives. But they don't know a lot about how to write an interesting and readable article.
    • by TheLink ( 130905 )

      IMO whoever wrote that article is a shill, full of shit or an idiot. The article is not analysis, it's far closer to "anal-related" stuff...

      Example: http://www.tomshardware.com/reviews/ssd-reliability-failure-rate,2923-3.html [tomshardware.com]

      Ultimately, the French-English language barrier was responsible for how hyped-up this information became. Sites like Mac Observer and ZDNet incorrectly reported these figures as "failure rates" based on a Google Translation.

      A drive failure implies the device is no longer functioning. However, returns can occur for a multitude of reasons. This presents a challenge because we don't have any additional information on the returned drives: were they dead-on-arrival, did they stop working over time, or was there simply an incompatibility that prevented the customer from using the SSD

      But from the French retailer's stats:
      Released in April 2011
      http://news.softpedia.com/newsImage/French-Website-Publishes-HDD-SSD-and-Motherboard-RMA-Statistics-4.png/ [softpedia.com]
      Released in December 2010
      http://www.behardware.com/articles/810-6/components-returns-rates.html [behardware.com]

      You will see that Intel ha

  • I don't think it's really fair to say at this stage that SSDs aren't more reliable than hard drives.

    For one, SSDs are still rather new. Yes, they've been around for a few years but compared to hard drives they are still at the beginning of their development cycle, and it shows: firmware issues and recalls, as stated in the article, may be a heavy contributing factor to the SSD failure rates. We can expect this to drop as manufacturers continually revise their firmware and manufacturing techniques for the better.

    • by arth1 ( 260657 )

      For one, SSDs are still rather new. Yes, they've been around for a few years but compared to hard drives they are still at the beginning of their development cycle, and it shows: firmware issues and recalls, as stated in the article, may be a heavy contributing factor to the SSD failure rates. We can expect this to drop as manufacturers continually revise their firmware and manufacturing techniques for the better.

      I wouldn't bet on it.
      Price is also a major factor - probably the major factor.
      SLCs are almost impossible to find because of just that; even though they have an order of magnitude higher write count and much faster worst-case write times, they don't sell because of price.
      If the customer has a choice between paying more to get a more reliable drive, or paying less to get a less reliable drive, guess what he will choose? That's also why computers that used to last ten years now last two.

      Quality just isn't a major concern i

      • by h4rr4r ( 612664 )

        Because we replace them every couple of years.
        Besides, my homebuilt rigs last fine. Sure, my power supply is a lot nicer than what Dell provides, but that is what cheap gets you. I save in the long run, though, by just replacing parts of the machine to upgrade.

      • My Intel X25 SLC (32 GB, I think) is still going strong; it is about 4 years old, and I don't even have TRIM support. But it is all in how you use it. I load my video games on it so that I get fast load times. The data doesn't change much, and I get that speed where I need it the most.

  • Worst. Ever. (Score:5, Insightful)

    by DarthVain ( 724186 ) on Friday July 29, 2011 @09:12AM (#36920924)

    Let me summarize:

    A) The chart is worthless. I have never seen a more ambiguous, meaningless chart in my life. They might as well not have bothered to label things.
    B) Let's do a reliability study on SSDs, for which there is no long-term data past 2 years, yet compare them to HDDs that typically come with at least a 3-year warranty. By that I only mean I'll go out on a limb and guess that the average HDD lasts more than 3 years before failing, if only for the manufacturer's economic self-preservation.
    C) Results in either case depend highly on specific device model and configuration.

    • But you can make projections from limited data. Disclaimer: all I looked at was the chart, and I think you can assume linear failure rates for SSDs and exponential for HDDs (probably because of more components and different failure points). The chart is pretty clear if I'm interpreting it correctly.

      Just like sampling a population in statistics, you're working with limited data but you can hypothesize based on a small sample. What you can't tell is if there's some failure bomb (unlikely) outside the data
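
      A minimal sketch of that kind of fit, in Python with made-up failure-rate numbers (nothing here is taken from the article's chart): a straight line for the sparse SSD data and an exponential for the HDD data, both extrapolated out to five years.

        import numpy as np

        # Hypothetical cumulative failure rates (%) by years in service.
        ssd_years = np.array([0.5, 1.0, 2.0])
        ssd_fail = np.array([0.8, 1.3, 2.1])                 # roughly linear so far
        hdd_years = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
        hdd_fail = np.array([1.5, 2.5, 4.5, 8.0, 14.0])      # grows faster than linear

        # Linear fit for SSDs: fail ~ a * years + b
        a, b = np.polyfit(ssd_years, ssd_fail, 1)

        # Exponential fit for HDDs: fail ~ exp(k * years + c), via a line in log space
        k, c = np.polyfit(hdd_years, np.log(hdd_fail), 1)

        for year in range(1, 6):
            ssd_proj = a * year + b
            hdd_proj = np.exp(k * year + c)
            print(f"year {year}: SSD ~{ssd_proj:.1f}%  HDD ~{hdd_proj:.1f}%")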

      • http://xkcd.com/605/ [xkcd.com]

        I agree. However, it only makes sense if you assume failure in 2 years or less.

        If you are comparing them to the HDDs on the graph, you can see that HDDs have a failure rate of about 1-2% in the first two years (depending on whatever the fsck the different colors mean), and beyond that it can go as high as 20% after 5 years.

        So what is it you are testing for? What are you doing comparisons against? Not to mention why they assume one is linear as opposed to all the other long term data being e

    • We don't buy SSDs because they are more reliable (they don't seem to be in our large RAID arrays), but because they are faster than HDDs.

  • Based on numbers, the study shows SSDs to be more reliable than HDDs. The best data I have seen in that article is the following:

    SSDs: 1.28-2.19% over 2 years

    HDDs: >=5% over 2 years

    The HDD data comes from: http://media.bestofmicro.com/2/N/289103/original/google_afrtemputilization_475.png [bestofmicro.com] The SSD data comes from the table on Page #6.

    I don't think any of this data is particularly surprising; HDDs are mechanical, so the curves for failure would not be linear. The most interesting part of the article for consideration with SSDs is that SMART is going to be near useless for them. Since most failures are random occurrences in electronics, which SMART isn't good at detecting, we may need better technology for detecting SSD failures.
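
    One way to put those two-year figures on a common annual footing is to convert each cumulative failure fraction into an annualized failure rate, assuming a constant per-year risk. The percentages are the ones quoted above; the constant-risk assumption is mine, as a rough sketch.

      def annualized_failure_rate(cumulative_fraction, years):
          # AFR implied by a cumulative failure fraction over `years` of service,
          # assuming the same probability of failing in each year.
          return 1 - (1 - cumulative_fraction) ** (1 / years)

      for label, frac in [("SSD low", 0.0128), ("SSD high", 0.0219), ("HDD", 0.05)]:
          print(f"{label}: ~{annualized_failure_rate(frac, 2) * 100:.2f}% per year")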

    • Re: (Score:3, Interesting)

      by rayd75 ( 258138 )

      The most interesting part of the article for consideration with SSDs is that SMART is going to be near useless for them. Since most failures are random occurrences in electronics which SMART isn't good at detecting, we may need better technology for detecting SSD failures.

      Have you ever seen SMART perform in a useful way on a mechanical disk? At work and at home, I've gone through a crap-ton of hard disks in the last decade or so that SMART's been prevalent and never have I seen SMART flag a drive as problematic before I already knew I had a serious problem. More often than not, I've had systems slow to a crawl due to massive numbers of read errors and sector reallocations while the drive firmware actively lied to me about the drive's condition. Only looking at the raw SMART

      • The raw SMART stats are part of SMART. That's all I look at. With some brands of drive, certain counters increase wildly because those stats aren't supported, and you're seeing random data. I always go by Reallocated Sector Count, Pending Reallocations, and uncorrectable sectors - among others if they are available.
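
        A rough sketch of pulling those raw counters out of smartmontools from a script. It assumes smartctl is installed, the disk is /dev/sda, and you have permission to query it (usually root); the attribute names are the usual smartmontools labels.

          import subprocess

          WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

          # `smartctl -A` prints the attribute table; the raw value is the 10th column.
          out = subprocess.run(["smartctl", "-A", "/dev/sda"],
                               capture_output=True, text=True, check=False).stdout

          for line in out.splitlines():
              fields = line.split()
              if len(fields) >= 10 and fields[1] in WATCH:
                  # A non-zero, and especially a growing, raw value is the warning sign.
                  print(f"{fields[1]}: raw value {fields[9]}")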

        • by MPAB ( 1074440 )

          How many uncorrectable/reallocated sectors do you consider hazardous?

          My main HD is a 1TB Seagate 7200. I first noticed a SMART alert in Ubuntu with a number of reallocated sectors of around 45. A year or so afterwards, it reads 52.

          Do you think I should worry?

          • The drive has hundreds of spares. I start worrying when it's greater than 0, but it's really more about how quickly it's happening (e.g., if a large area of the drive has failed and it's constantly finding more bad spots). At your stage, I would run the manufacturer's scan, which will force it to check every last sector, and see where you're at. Then, back up regularly and keep using it.

      • by 0123456 ( 636235 )

        Have you ever seen SMART perform in a useful way on a mechanical disk?

        Yes. When my laptop drive failed the problem was very obvious from a perpetually increasing reallocated sector count; that gave me long enough to copy off my data files to a new disk and replace the old one.

        I had a similar experience with the only other hard drive I've had fail; they both went gracefully with plenty of warning and plenty of time to get the data off.

        • by guruevi ( 827432 )

          Usually those failures result in significant performance problems, and THAT is why people notice and investigate. In most systems there is no reporting back to the end user or the administrator, and the SMART status has to be read with a special tool anyway.

    • by afidel ( 530433 )
      Over the last 5 years we've seen an AFR of ~1.5% across our datacenter with interesting nonlinear points for infant mortality and sibling death.
  • It is also a lot easier to retrieve data from a disk than from an SSD, which most of the time goes without warning.
  • According to a perfectly baseless linear interpolation on several charts, SSDs have failure rates similar to HDDs. Just great... Call me back when we have 5 years of solid data, not just conjecture and inference.
  • I want to know if SSDs are more reliable than HDDs in an environment full of cat hair. I've never had a SCSI HDD outlast its warranty.

  • That's my take. And unlike a hard drive, firmware is something which can be continuously improved. SSD manufacturers are starting to understand and deal with the failure modes.

    One thing they don't mention is off-line storage. If I take a hard drive out of service and store it on a shelf for a year, it's virtually guaranteed to fail when I power it up. That is, every single HDD I've taken off the shelf will tend to work for a short while, long enough for me to get the data off of it usually, but every si

  • Any thread on SSD failures should include a link to Jeff Atwood's blog entry on the topic:

    I feel ethically and morally obligated to let you in on a dirty little secret I've discovered in the last two years of full time SSD ownership. Solid state hard drives fail. A lot. And not just any fail. I'm talking about catastrophic, oh-my-God-what-just-happened-to-all-my-data instant gigafail. It's not pretty.

    Full post here: http://www.codinghorror.com/blog/2011/05/the-hot-crazy-solid-state-drive-scale.html [codinghorror.com]

  • I think a good question is how HDs fail vs. how SSDs fail.
    There are two distinct ways that an HD can fail: either the circuitry on the PC board goes bad, or what's inside the sealed chamber goes bad.
    In the former case the data SHOULD be recoverable. In the latter case there are three possible failure points: the platter motor, the head stepper motor, or the heads themselves. In the first two cases the drive could be repaired and the data salvaged, but it will take more effort and money to do so. If the

  • What I care about is the MTTDL (Mean Time To Data Loss) of a complete system. Hard drives are unreliable, as is any component in your computer. That's why you should make, at the very least, every individual component in your data path double or triple redundant. This is easily accomplished these days: ZFS, RAID6, etc. Make sure you don't just rely on the mechanical parts of your system to keep your data safe.

    What else you want to know is undetected data corruption. Hard disks are very bad at keeping data, SSD
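
    For the MTTDL point, here is a back-of-the-envelope sketch using the classic textbook approximations for parity-protected arrays (independent failures, no unrecoverable read errors). The drive MTTF, rebuild time, and array size below are assumptions, not measurements.

      HOURS_PER_YEAR = 24 * 365

      def mttdl_raid5(mttf, mttr, n):
          # Single parity: data is lost if a second drive dies during a rebuild.
          return mttf ** 2 / (n * (n - 1) * mttr)

      def mttdl_raid6(mttf, mttr, n):
          # Double parity: data is lost if a third drive dies during a rebuild.
          return mttf ** 3 / (n * (n - 1) * (n - 2) * mttr ** 2)

      MTTF = 500_000   # hours per drive (assumed)
      MTTR = 24        # hours to rebuild onto a spare (assumed)
      N = 8            # drives in the array (assumed)

      print(f"RAID5: ~{mttdl_raid5(MTTF, MTTR, N) / HOURS_PER_YEAR:,.0f} years to data loss")
      print(f"RAID6: ~{mttdl_raid6(MTTF, MTTR, N) / HOURS_PER_YEAR:,.0f} years to data loss")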

    • by jon3k ( 691256 )

      Besides that, SSD's give way more IOPS than any hard drive available (even the 15k RPM ones)

      Orders of magnitude higher. There are SSDs producing as many as 50,000 random 4k IOPS vs. about 250 from a 15k RPM Fibre Channel hard disk.
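
      Rough arithmetic behind figures like that, as a sketch; the seek and rotation numbers are assumptions typical of a 15k RPM drive, not values from the post.

        RPM = 15_000
        avg_rotational_ms = 0.5 * 60_000 / RPM   # half a revolution = 2.0 ms
        avg_seek_ms = 3.5                        # assumed average seek time
        service_ms = avg_rotational_ms + avg_seek_ms

        hdd_iops = 1000 / service_ms             # ~180 random IOPS before queueing tricks
        ssd_iops = 50_000                        # figure quoted above

        print(f"15k RPM HDD: ~{hdd_iops:.0f} IOPS; SSD advantage: ~{ssd_iops / hdd_iops:.0f}x")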

  • You might find parts in an SSD which you can use for pinning stuff to your fridge. But anyone who has ever put a hard drive magnet on his fridge knows that nothing can hold your data as reliably as a hard drive magnet.

    By definition anything that contains powerful magnets is cool.
    (Oh, and a fast spinning BLDC motor.)

    Shiny platters!

  • The only data they could possibly have that is that old is data on some of the earliest mass-produced consumer SSDs. These were first-generation products, and logic would dictate that the drives being made today are far more reliable. I think it's too early to try to draw any conclusions; everyone knows there were lots of problems with the first generation of drives, like most first-gen products (e.g., lack of TRIM, stuttering with the early Indilinx controllers, etc.).
