Data Storage

Endurance Experiment Writes One Petabyte To Six Consumer SSDs

Posted by Unknown Lamer
from the ditch-your-hard-drive dept.
crookedvulture (1866146) writes "Last year, we kicked off an SSD endurance experiment to see how much data could be written to six consumer drives. One petabyte later, half of them are still going. Their performance hasn't really suffered, either. The casualties slowed down a little toward the very end, and they died in different ways. The Intel 335 Series and Kingston HyperX 3K provided plenty of warning of their imminent demise, though both still ended up completely unresponsive at the very end. The Samsung 840 Series, which uses more fragile TLC NAND, perished unexpectedly. It also suffered a rash of cell failures and multiple bouts of uncorrectable errors during its life. While the sample size is far too small to draw any definitive conclusions, all six SSDs exceeded their rated lifespans by hundreds of terabytes. The fact that all of them wrote over 700TB is a testament to the endurance of modern SSDs."


  • has anyone tried this with platter drives? would it simply take too long?

    it's hard for me to judge whether this is more or less data than a platter drive will typically write in its lifespan. I feel like it's probably a lot more than the average drive processes in its lifetime. And anyway, platter drive failure might be more a function of total time spent spinning or seeking, or simply time spent existing, for all I know.

    • I am sure someone has done it with platter drives, but it would take substantially longer to reach the same transfer quantities, since SSDs have much higher transfer rates than spinning drives.
      • Re:context (Score:4, Interesting)

        by afidel (530433) on Monday June 16, 2014 @07:36PM (#47250253)

        Not that much higher for streaming reads and writes, the new Seagate 6TB can do 220MB/s @128KB [storagereview.com] streaming reads or writes. That works out to ~19TB/day so it would only take around 2 months to hit 1PB.
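The arithmetic behind this estimate can be checked with a short Python sketch. It assumes the quoted 220 MB/s is sustained for the whole run and uses decimal units (1 TB = 10^6 MB, 1 PB = 1000 TB); real drives slow down toward the inner tracks, so treat the result as a best case.

```python
# Back-of-envelope: how long would 1 PB of sequential writes take on an HDD?
# Assumption: the 220 MB/s streaming rate holds for the entire run.

SECONDS_PER_DAY = 86_400


def tb_per_day(mb_per_s: float) -> float:
    """Convert a sustained MB/s rate to decimal TB per day."""
    return mb_per_s * SECONDS_PER_DAY / 1_000_000


def days_to_write(total_tb: float, mb_per_s: float) -> float:
    """Days needed to write total_tb at a constant mb_per_s."""
    return total_tb / tb_per_day(mb_per_s)


if __name__ == "__main__":
    rate = 220  # MB/s, the figure quoted for the Seagate 6TB
    print(f"{tb_per_day(rate):.1f} TB/day")                  # 19.0 TB/day
    print(f"{days_to_write(1000, rate):.0f} days for 1 PB")  # 53 days
```

About 53 days, which matches the "around 2 months" figure.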

        • Re:context (Score:5, Informative)

          by timeOday (582209) on Monday June 16, 2014 @11:52PM (#47251819)
          But contiguous writes are the absolute (and unrealistic) best case in terms of MB transferred before failure for an HDD, because they minimize the number of revolutions and seeks per megabyte written. For whatever it's worth, it used to be said that "enterprise grade" drives were designed to withstand constant seeking associated with accesses from multiple processes, instead of fewer seeks associated with sporadic, single-user access.

          If seeking does wear a drive, then using an SSD for files that generate lots of seeks will not only greatly speed up the computer, but also extend the life of HDDs relegated to storing big files.

          • But contiguous writes are the absolute (and unrealistic) best case in terms of MB transferred before failure for an HDD, because they minimize the number of revolutions and seeks per megabyte written. For whatever it's worth, it used to be said that "enterprise grade" drives were designed to withstand constant seeking associated with accesses from multiple processes, instead of fewer seeks associated with sporadic, single-user access.

            If seeking does wear a drive, then using an SSD for files that generate lots of seeks will not only greatly speed up the computer, but also extend the life of HDDs relegated to storing big files.

            In regard to mixing SSDs and HDDs, there are some great caching programs that allow a single SSD to act as a front end to several HDDs. The driver looks at the traffic coming through, and if it is "mainly sequential writes", bypasses the SSD and writes directly to the disk. For random I/O across the HDDs, the SSD acts as a cache. The percentage of sequential to random is selectable.

            Very recently I purchased my first SSD at about $0.48 per gigabyte (128gig for $59.00) I expect that next yea
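The caching programs aren't named, so the following is only a hypothetical Python sketch of the policy described: classify recent writes as sequential (each one starting where the previous one ended) or random, and bypass the SSD when the sequential share reaches a user-selectable threshold. The class name, the 32-write window and the 80% default are assumptions, not any product's actual behavior.

```python
from collections import deque


class SequentialBypass:
    """Hypothetical SSD-cache write policy.

    Tracks the last `window` writes. If at least `seq_threshold` of them were
    sequential, new writes go straight to the HDD; otherwise they are staged
    on the SSD cache.
    """

    def __init__(self, seq_threshold: float = 0.8, window: int = 32):
        self.seq_threshold = seq_threshold   # the user-selectable percentage
        self.recent = deque(maxlen=window)   # True = that write was sequential
        self.next_expected = None            # byte offset where the last write ended

    def route(self, offset: int, length: int) -> str:
        is_seq = offset == self.next_expected
        self.recent.append(is_seq)
        self.next_expected = offset + length
        seq_fraction = sum(self.recent) / len(self.recent)
        return "hdd" if seq_fraction >= self.seq_threshold else "ssd"


# A streaming copy in 128 KiB chunks soon switches to bypass mode.
policy = SequentialBypass()
print([policy.route(i * 131072, 131072) for i in range(6)])
```

A sliding window rather than a per-request decision keeps a single stray random write from flipping a large streaming copy back onto the SSD.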

    • Why? The failure modes are completely different (and yes there are quite a few reports around on this subject..)

      SSDs have a write capacity limitation due to write/erase cycle limitations (they also have serious long term data retention issues).
      Mechanical drives tend to be more limited by seek actuations, head reloads, etc. The surfaces don't really have a problem with erase/write cycles.

      Neither is particularly good for long-term storage at today's densities. Tape is MUCH better.
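A rough way to relate the SSD write/erase-cycle limit to total data written is host TBW ≈ NAND capacity × rated P/E cycles ÷ write amplification. The Python sketch below assumes perfect wear leveling, and the capacity, cycle count and write-amplification figures in the example are hypothetical, not specs for any drive in the article. Since all six drives in the experiment exceeded their rated lifespans, estimates like this look conservative.

```python
def estimated_host_tbw(capacity_gb: float, pe_cycles: int, write_amp: float) -> float:
    """Rough host-side terabytes written (decimal TB) before the NAND reaches
    its rated program/erase (P/E) cycles.

    Assumes perfect wear leveling, i.e. every block is erased equally often.
    """
    nand_tb = capacity_gb * pe_cycles / 1000  # total writes the cells can absorb, in TB
    return nand_tb / write_amp               # host writes left after amplification


# Hypothetical example: 250 GB of NAND, 1,000 P/E cycles, write amplification 1.5
print(f"{estimated_host_tbw(250, 1000, 1.5):.0f} TB")  # 167 TB
```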

      • by pezpunk (205653)

        the problem with tape is by the time you can retrieve the data you're interested in, it no longer matters.

        • Re:context (Score:4, Informative)

          by LordLimecat (1103839) on Monday June 16, 2014 @09:19PM (#47251049)

          Tape actually has pretty high transfer rates. Its seek times are what suck, but if you're dumping a whole tape you aren't doing any seeking at all.

        • by dshk (838175)
          I regularly do restores from an LTO-3 drive, and the whole process takes no more than 5 minutes. If your data is useless after 5 minutes, then it is indeed unnecessary to back it up, not to mention archive it.
    • Re:context (Score:4, Informative)

      by ShanghaiBill (739463) on Monday June 16, 2014 @07:29PM (#47250205)

      has anyone tried this with platter drives?

      A few years ago, Google published a study [googleusercontent.com] of hard disk failures. Failures were not correlated with how much data was written or read. Failures were correlated with the amount of time the disk was spun up, so you should idle a drive not in active use. Failures were negatively correlated with temperature: drives kept cooler were MORE likely to fail.

      • by fnj (64210)

        Failures were correlated with the amount of time the disk was spun up, so you should idle a drive not in active use.

        That makes no logical sense unless the statement is missing a "not" somewhere, or unless you WANT failures.

        • Re:context (Score:4, Informative)

          by viperidaenz (2515578) on Monday June 16, 2014 @08:26PM (#47250641)

          While ShanghaiBill apparently struggles with the English language, the phrase "you should idle a drive not in active use" means the drive will spin up fewer times. You should disable spin-down and leave the drive idling, not on standby.
          You'll reduce the number of head load/unloads.
          You'll reduce peak current consumption of the spindle motor.
          The drive will stay at a more stable temperature.
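To see why disabling spin-down cuts the number of spin-ups (and with them head load/unload cycles), here is a minimal Python sketch that counts spin-ups for a given access pattern. The function name, the simple idle-timeout model and the example schedule (one access every 20 minutes for 8 hours) are hypothetical; real drive firmware and OS power management are more complicated.

```python
def count_spinups(access_times, spindown_after=None):
    """Count spin-ups for a sorted list of access times in seconds.

    spindown_after: idle seconds before the drive drops to standby, or None
    if spin-down is disabled (the drive idles but keeps spinning).
    The initial power-on spin-up counts as 1.
    """
    if not access_times:
        return 0
    spinups = 1
    for prev, cur in zip(access_times, access_times[1:]):
        if spindown_after is not None and cur - prev > spindown_after:
            spinups += 1  # drive went to standby during the gap and must spin up again
    return spinups


# Hypothetical workload: one access every 20 minutes over an 8-hour day.
accesses = list(range(0, 8 * 3600, 1200))
print(count_spinups(accesses, spindown_after=600))  # 24 with a 10-minute standby timer
print(count_spinups(accesses))                      # 1 with spin-down disabled
```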

        • by compro01 (777531)

          Failures were correlated with the amount of time the disk was spun up, so you should idle a drive not in active use.

          That makes no logical sense unless the statement is missing a "not" somewhere, or unless you WANT failures.

          You're reading the sentence wrong. You're reading it as "Times the disk was spun up".

          What they mean is the total amount of time the disk has spent spinning over its lifetime.

          • What they mean is the total amount of time the disk has spent spinning over its lifetime.

            Yes, this is correct. It is the total amount of time spent spinning that you want to minimize, not the number of "spin-up/spin-down" cycles. The longer the disk spins, the more wear on the bearings.

            • Re:context (Score:5, Interesting)

              by dgatwood (11270) on Monday June 16, 2014 @10:21PM (#47251445) Journal

              That's curious. Almost all of the drive failures I've seen can be attributed to head damage from repeated parking prior to spin-down, whereas all the drives that I've kept spinning continuously have kept working essentially forever. And drives left spun down too long had a tendency to refuse to spin up.

              I've had exactly one drive that had problems from spinning too much, and that was just an acoustic failure (I had the drive replaced because it was too darn noisy). With that said, that was an older, pre-fluid-bearing drive. I've never experienced even a partial bearing failure with newer drives.

              It seems odd that their conclusions recommended precisely the opposite of what I've seen work in practice. I realize that the plural of anecdote is not data, and that my sample size is much smaller than Google's sample size, so it is possible that the failures I've seen are a fluke, but the differences are so striking that it leads me to suspect other differences. For example, Google might be using enterprise-class drives that lack a park ramp....

              • by tlhIngan (30335)

                That's curious. Almost all of the drive failures I've seen can be attributed to head damage from repeated parking prior to spin-down, whereas all the drives that I've kept spinning continuously have kept working essentially forever. And drives left spun down too long had a tendency to refuse to spin up.

                The problem is that there are two ways for the drive to park the heads. (FYI - ALL spinning rust drives these days park the heads on power down). One of them is more violent than the other.

                There is the normal

                • by dgatwood (11270)

                  I've never had a drive that did emergency parking until my HD-based MacBook Pro. All my dead drives were too dumb to have the needed sensors, as were the machines that they were in.

                  With that said, I'm terrified at the aggressiveness with which that MacBook Pro parks its heads. I literally can't pick the thing up and place it gently on my bed without the heads doing an emergency park. I don't have a lot of faith in that drive lasting very long. Non-emergency parking is hard enough on the heads. Emergen

      • Stopping and starting a drive is also a moment where you can break or wear down a drive. This can be explained by the fact that the heads rest on the platters (unless in the parked position) when the platters are not spinning at the right speed. Also, a drive that is spun down will cool down and then warm up again when it is spun up, and these temperature fluctuations affect drive reliability. The most plausible explanation I can come up with is that temperature shifts will make parts inside the

      • by ttsai (135075)

        A few years ago, Google published a study [googleusercontent.com] of hard disk failures. Failures were not correlated with how much data was written or read. Failures were correlated with the amount of time the disk was spun up, so you should idle a drive not in active use. Failures were negatively correlated with temperature: drives kept cooler were MORE likely to fail.

        Actually, the paper says that the Google guys approximated power-on hours with a notion of age, which I assume was derived from either the manufacture date or the delivery date. From the paper, annualized failure rate (AFR) is somewhat correlated with age, but not necessarily strongly enough to predict probability of failure. Even with their large drive population, the paper points out that the drive model mix is not consistent over time and therefore, not much can be made of the apparent
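For reference, AFR is commonly computed as failures per drive-year of operation. The Python sketch below uses that generic definition, with power-on hours (or age as a proxy, as described above) summed across the fleet; it is not necessarily the paper's exact method, and the example numbers are made up.

```python
HOURS_PER_YEAR = 8760  # 365-day year; some sources use 8766 (365.25 days)


def annualized_failure_rate(failures: int, total_power_on_hours: float) -> float:
    """Failures per drive-year, summed over a fleet of drives."""
    drive_years = total_power_on_hours / HOURS_PER_YEAR
    return failures / drive_years


# Hypothetical fleet: 100 drives running a full year each, 3 failures.
print(f"{annualized_failure_rate(3, 100 * 8760):.1%}")  # 3.0%
```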

    • I suspect that direct comparisons are tricky: magnetic platter surfaces should, at least in theory, have virtually infinite read and erase capability; but every mechanical part dies a little when called on to move (and if the lubricants are a problem, when not called on to move for too long).

      With SSDs, we know that the NAND dies a bit every time it is erased and rewritten; sometimes after surprisingly few cycles with contemporary high density MLC NAND; but the supporting solid state stuff should last long
  • I have around 30 ranging from 40G to 512G, all of them are still intact including the original Intel 40G SSDs I bought way at the beginning of the SSD era. Nominal linux/bsd use cases, workstation-level paging, some modest-but-well-managed SSD-as-a-HDD-cache use cases. So far wearout rate is far lower than originally anticipated.

    I'm not surprised that some people complain about wear-out problems, it depends heavily on the environment and use cases and people who are heavy users who are not cognizant of ho

    • by BitZtream (692029)

      a bit unclear why, but any HDD I've ever put on the shelf (for 6-12 months) that I try to put back into a machine will typically spin-up, but then fail within a few months after that.

      The lubrication in the bearings of the platters and head arms gets thicker over time after being heated a few times. It needs to stay warm to keep a lower, workable viscosity. After even a few months of initial use followed by storage on the shelf, the drag quickly becomes too great.

  • Considering 90% of my storage is write once, read many (email, mp3, dvds, programs, etc), this is good for me as long as the drive has a good, errr, brain fart, scheme so when I write a byte it chooses one I haven't written to in a while. My SSD should last forever, or until the electron holes break free of their silicon bonds.
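The "brain fart scheme" described above is wear leveling. A toy Python sketch of the dynamic variant, which always hands out the least-erased free block, looks like this; the class name and block count are hypothetical, and real flash translation layers are far more involved.

```python
import heapq


class ToyWearLeveler:
    """Toy dynamic wear leveling: always write to the free block with the fewest erases.

    Real controllers also relocate cold, rarely rewritten data (static wear
    leveling) so blocks holding write-once files take their share of erases;
    this sketch does not.
    """

    def __init__(self, num_blocks: int):
        self.erase_counts = [0] * num_blocks
        self.free = [(0, b) for b in range(num_blocks)]  # (erase count, block id)
        heapq.heapify(self.free)

    def allocate(self) -> int:
        """Take the least-worn free block for new data."""
        _, block = heapq.heappop(self.free)
        return block

    def release(self, block: int) -> None:
        """Erase a block whose data was superseded and return it to the free pool."""
        self.erase_counts[block] += 1
        heapq.heappush(self.free, (self.erase_counts[block], block))


# Rewriting the same small file 100 times spreads erases evenly over 4 blocks.
wl = ToyWearLeveler(4)
for _ in range(100):
    wl.release(wl.allocate())
print(wl.erase_counts)  # [25, 25, 25, 25]
```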
  • Ability to write hundreds of terabytes more is nice. But it's reading them back that I am really worried about. Great news for someone deploying a short-term cache.

  • extremesystems test (Score:4, Informative)

    by 0111 1110 (518466) on Monday June 16, 2014 @09:11PM (#47250993)

    There was also a very interesting endurance test [xtremesystems.org] done on xtremesystems.org. Very impressive stuff. I don't yet own an SSD, but I'll continue to consider buying one! Maybe next Black Friday. Just waiting for the right deal.

    • I bought two a few years back, and both are working like champs. The only problem I encountered is that my laptop now boots too fast. The keyboard becomes unresponsive for about 30 seconds (both Win7 and Linux), so I have to twiddle my thumbs at the login prompt. Before, this was hidden by the slow turning of the platters.
    • You really don't know what you're missing. For business laptops, we've been 100% SSD for 2-3 years now (ever since they dropped to $1.50-$1.75 per GB). Granted, these are all users who can function with only a 128GB SSD, which holds true for probably 90% of office workers who have access to a file server (instead of storing business-critical data on their HD).

      Now, instead of waiting on their HD to seek around and find information (a boot process measured in minutes, program loading t
    • by Threni (635302)

      How does this compare to hard drives, though? That's the key metric. I don't mind my pc booting up in 30 rather than 10 seconds if I don't have to do disaster recovery and pay far more per gig.

  • IO pattern (Score:4, Insightful)

    by ThePhilips (752041) on Monday June 16, 2014 @09:26PM (#47251097) Homepage Journal

    That's a heck of a lot of data, and certainly more than most folks will write in the lifetimes of their drives.

    Continued write cycling [...]

    That's just ridiculous. Since when is reliability measured in how many petabytes can be written?

    Spinning disks can be forced into inefficient patterns, speeding up the wear on mechanics.

    SSDs can be easily forced to do a whole erase/write cycle just by writing single bytes into the wrong sector.

    There is no need to waste bus bandwidth with a petabyte of data.

    The problem was never the amount of the information.

    The problem was always the IO pattern, which might accelerate the wear of the media.
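The point about small writes can be quantified as write amplification: bytes physically written to NAND divided by bytes the host asked to write. The Python sketch below is a deliberately naive worst-case model. It assumes every host write forces a rewrite of each whole erase block it touches and that writes start on an erase-block boundary; the 512 KiB erase-block size is an assumption, and real controllers remap pages to avoid most of this.

```python
def write_amplification(nand_bytes: int, host_bytes: int) -> float:
    """Bytes physically written to NAND per byte the host asked to write."""
    return nand_bytes / host_bytes


def naive_rmw_nand_bytes(host_write_sizes, erase_block: int = 512 * 1024) -> int:
    """Worst case: each host write rewrites every erase block it touches.

    Assumes each write starts on an erase-block boundary (hypothetical model).
    """
    total = 0
    for size in host_write_sizes:
        blocks = -(-size // erase_block)  # ceiling division
        total += blocks * erase_block
    return total


# Many single-byte updates vs. erase-block-sized sequential writes
tiny = [1] * 1000
print(write_amplification(naive_rmw_nand_bytes(tiny), sum(tiny)))  # 524288.0
big = [512 * 1024] * 10
print(write_amplification(naive_rmw_nand_bytes(big), sum(big)))    # 1.0
```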

    • by m.dillon (147925)

      Yes, but it's a well-known problem. Pretty much the only thing that will write inefficiently to an SSD (i.e. cause a huge amount of write amplification) is a database whose records are updated (effectively) randomly. And that's pretty much it. Nearly all other access patterns through a modern filesystem will be relatively SSD-efficient. (keyword: modern filesystem).

      In the past various issues could cause excessive write amplification. For example, filesystems in partitions that weren't 4K-alig
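The 4K-alignment problem can be illustrated by counting how many 4 KiB flash pages a single 4 KiB filesystem write touches. This Python sketch assumes 4096-byte pages and compares the legacy partition start at sector 63 with the 1 MiB (sector 2048) start that modern partitioning tools default to. A misaligned write straddles two pages, so the drive has to program both.

```python
PAGE_SIZE = 4096  # assumed flash page / physical sector size in bytes
SECTOR = 512      # logical sector size used by partition tables


def pages_touched(start_byte: int, length: int, page: int = PAGE_SIZE) -> int:
    """Number of physical pages covered by a write of `length` bytes at `start_byte`."""
    end_byte = start_byte + length - 1  # last byte written (inclusive)
    return end_byte // page - start_byte // page + 1


legacy_start = 63 * SECTOR    # old DOS-style first partition at sector 63
modern_start = 2048 * SECTOR  # 1 MiB alignment

# A 4 KiB filesystem block at the very start of each partition:
print(pages_touched(legacy_start, 4096))  # 2 -> straddles two pages
print(pages_touched(modern_start, 4096))  # 1 -> fits in one page
```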

    • I agree, measuring reliability like this is strange.

      Even more disturbing is the number of drives being tested. What is the statistical significance of their results?
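One way to put a number on the sample-size concern: treat "died before 1 PB" as a binomial outcome (3 of 6 drives, per the summary) and compute a 95% Wilson score interval. This assumes the six drives are interchangeable, independent trials, which they are not (they are different models), so the result only illustrates how little six drives can tell us.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score confidence interval for a binomial proportion (z=1.96 -> ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


lo, hi = wilson_interval(3, 6)
print(f"{lo:.2f} to {hi:.2f}")  # roughly 0.19 to 0.81
```

An interval spanning roughly 19% to 81% is consistent with almost any conclusion about the failure rate before 1 PB.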

  • Even Intel, behemoth of reliable server hardware, wasn't able to fix the SandForce problems.
    According to an Intel representative, "graceful failover" of an SSD means you _kill_ the drive in software during a reboot :DDD instead of switching it to read-only mode (as the documentation promises).

    Kiss your perfectly readable data goodbye.

    • That was also my question when I RTFA. It says that the Intel drive entered some sort of "read-only" mode, and that at that point the drive was still OK. Then a new write cycle was forced (how?), and the drive committed seppuku and became unreadable.

      Which is it? Can I be confident that my SSD will fail into a graceful read-only mode? All my ~ is in RAID1 and backed up so I'm not worried, but it'd be nice to be able to just copy the / from a read-only SSD to a new one when the time comes.

  • Endurance Experiment Writes One Petabyte To Three Consumer SSDs
    "how much data could be written to six consumer drives. One petabyte later, half of them are still going."

