Data Storage

Endurance Experiment Kills Six SSDs Over 18 Months, 2.4 Petabytes (204 comments)

crookedvulture writes: Slashdot has previously covered The Tech Report's SSD Endurance Experiment, and the final chapter in that series has now been published. The site spent the last 18 months writing data to six consumer-grade SSDs to see how much it would take to burn out their flash. All the drives absorbed hundreds of terabytes without issue, far exceeding the needs of typical PC users. The first one failed after 700TB, while the last survived an astounding 2.4 petabytes. Performance was reasonably consistent throughout the experiment, but failure behavior wasn't. Four of the six provided warning messages before their eventual deaths, but two expired unexpectedly. A couple also suffered uncorrectable errors that could compromise data integrity. They all ended up in a bricked, lifeless state. While the sample size isn't large enough to draw definitive conclusions about specific makes or models, the results suggest the NAND in modern SSDs has more than enough endurance for consumers. They also demonstrate that very ordinary drives can write mind-boggling amounts of data.
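To put the 700TB and 2.4PB figures in perspective, here is a back-of-the-envelope calculation. The daily write volumes in the loop are hypothetical illustrations, not measurements from the experiment; substitute your own workload.

```python
# Back-of-the-envelope: how many years would it take to write a given
# amount of data at a steady daily rate? Decimal units (1 TB = 1000 GB),
# matching how drive makers quote capacity and endurance.

GB_PER_TB = 1000
GB_PER_PB = 1000 * GB_PER_TB
DAYS_PER_YEAR = 365.25


def years_to_write(total_gb: float, gb_per_day: float) -> float:
    """Return the number of years needed to write total_gb at gb_per_day."""
    if gb_per_day <= 0:
        raise ValueError("gb_per_day must be positive")
    return total_gb / gb_per_day / DAYS_PER_YEAR


# Endurance figures reported in the summary above.
first_failure_gb = 700 * GB_PER_TB   # first drive died after ~700TB
last_failure_gb = 2.4 * GB_PER_PB    # last drive survived ~2.4PB

# Hypothetical daily write volumes, chosen only for illustration.
for daily in (10, 50, 100):
    print(f"{daily:>4} GB/day: "
          f"{years_to_write(first_failure_gb, daily):6.1f} years to 700TB, "
          f"{years_to_write(last_failure_gb, daily):7.1f} years to 2.4PB")
```

At a hypothetical 50 GB per day, for example, reaching the 700TB mark would take roughly 38 years.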
  • No warning ? (Score:5, Interesting)

    by itzly ( 3699663 ) on Thursday March 12, 2015 @01:19PM (#49243311)

    The fact that 2 of them died without warning is disappointing. I would rather have a shorter lifetime, but a clear indication that the drive is going to die.

    • Re:No warning ? (Score:5, Insightful)

      by harrkev ( 623093 ) on Thursday March 12, 2015 @01:24PM (#49243349)

      What pisses me off is that the Intel drive suicided. OK, I can understand that they track writes and shut it down once confidence goes down. I get that. However, the drive should be read-only after that!

      If I had a drive that still held my perfect, pristine data, but I could not actually get to it, I would be pissed. What is wrong with going into a read-only mode?

      • by tibit ( 1762298 )

        I presume that the drive treats its firmware as just a special range of blocks/sectors, subject to the same management as everything else. Eventually, you power cycle it, and the bootloader can't find any viable firmware blocks. It then appears bricked. That's the only explanation I see.

        • by sjames ( 1099 )

          But those sectors shouldn't have seen a write since the factory, so why should they fail?

          • Sectors are logical, not physical.
            • Sectors are logical, not physical.

              Which means that if GP were correct that the firmware is also stored in main NAND, it's terribly bad design.

              No doubt about it, firmware should be on its own physical chip.

            • by sjames ( 1099 )

              NAND also comes in sectors that are physical. Several logical drive sectors are mapped onto a physical NAND sector.

              The physical NAND sectors the firmware is stored on shouldn't be altered once the drive leaves the factory.

              Consider, does your BIOS in flash wear out?

              • Several logical drive sectors are mapped onto a physical NAND sector.

                umm... no...

                First, I'm going to assume you are just terminology-ignorant, because NAND comes in BLOCKS composed of PAGES. So I am going to translate your poor use of "sector" with regard to flash as what you probably really meant: "page" (if you meant something else, you are even more ignorant... so take this kindness).

                A logical drive sector can be mapped to literally any physical page on the flash, and which page a specific logical sector maps to changes over time.

                Now why the hell did you open

                • by sjames ( 1099 )

                  You are apparently only aware of one convention. Other documentation speaks of sectors and considers blocks a hard drive thing.

                  Sounds more like you're butthurt that I called you on saying something silly.

                  The mapping only changes when a logical block is written OR a sufficient number of logical blocks are invalidated in the sector.

                  Nobody who wasn't recently kicked in the head by a horse is going to put the drive's firmware somewhere where it will be moved around and re-written. Beyond the many other problems
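The exchange above is about how a flash translation layer (FTL) maps logical sectors onto physical flash pages. Below is a toy, page-mapped FTL, a simplified sketch and not any vendor's firmware: every write of a logical sector goes out-of-place to a fresh page, the old page is marked stale, and a block becomes a garbage-collection candidate once enough of its pages are stale. The class name, the block geometry (PAGES_PER_BLOCK), and the GC threshold are hypothetical.

```python
class ToyFTL:
    """Hypothetical page-mapped flash translation layer (illustration only)."""

    PAGES_PER_BLOCK = 4        # assumed geometry; real NAND blocks hold far more pages
    GC_INVALID_THRESHOLD = 3   # assumed: a block is a GC candidate at this many stale pages

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        self.mapping = {}                    # logical sector -> (block, page)
        self.valid = {}                      # (block, page) -> logical sector
        self.invalid_count = [0] * num_blocks
        self.next_free = [0] * num_blocks    # next unwritten page in each block
        self.erase_count = [0] * num_blocks

    def _allocate(self, exclude=None):
        # Crude wear leveling: among blocks with free pages, pick the least-erased.
        candidates = [b for b in range(self.num_blocks)
                      if b != exclude and self.next_free[b] < self.PAGES_PER_BLOCK]
        if not candidates:
            raise RuntimeError("no free pages; garbage collection needed")
        block = min(candidates, key=lambda b: self.erase_count[b])
        page = self.next_free[block]
        self.next_free[block] += 1
        return block, page

    def write(self, lsn: int):
        """Out-of-place write: a new physical page every time, the old page goes stale."""
        old = self.mapping.get(lsn)
        if old is not None:
            del self.valid[old]
            self.invalid_count[old[0]] += 1
        loc = self._allocate()
        self.mapping[lsn] = loc
        self.valid[loc] = lsn
        return loc

    def gc_candidates(self):
        return [b for b in range(self.num_blocks)
                if self.invalid_count[b] >= self.GC_INVALID_THRESHOLD]

    def collect(self, block: int) -> None:
        """Relocate still-valid pages out of `block`, then erase it."""
        survivors = [lsn for (b, _), lsn in self.valid.items() if b == block]
        for lsn in survivors:
            del self.valid[self.mapping[lsn]]
            loc = self._allocate(exclude=block)
            self.mapping[lsn] = loc
            self.valid[loc] = lsn
        self.invalid_count[block] = 0
        self.next_free[block] = 0
        self.erase_count[block] += 1
```

In this model, writing logical sector 7 four times lands it on four different pages of block 0; block 0 then qualifies for garbage collection, and collecting it moves sector 7 to block 1. So the mapping changes both on every write and whenever garbage collection relocates valid data.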

      • IIRC that's what it was supposed to do, but it must have had a firmware bug and didn't quite manage it.

      • I can only imagine it's so that Intel can sell you a special "SSD Data Recovery Pro SUPER-DUPER XD3 Turbo" enclosure that will flip some bits and allow you to obtain read access for the low low price of $199.00
        • by mlts ( 1038732 )

          Long term, what is really needed are more sophisticated backup programs than the stuff we have now, since once an SSD fails, it fails for good: programs that not only recover files but can also handle bare-metal restores, and that are initiated by the backup device (so malware on the backed-up client can't trash the backup data).

          For desktops, this isn't too bad, because one can buy a NAS, or an external drive at minimum. For laptops, it becomes harder, especially if one factors in robust security measures whil

    • Re: (Score:2, Informative)

      by Anonymous Coward

      The fact that 2 of them died without warning is disappointing. I would rather have a shorter lifetime, but a clear indication that the drive is going to die.

      The fact that it is a drive means it is going to fail.

      You have been warned.

    • The fact that 2 of them died without warning is disappointing. I would rather have a shorter lifetime, but a clear indication that the drive is going to die.

      What I found disturbing is that TFA claims that the drives intentionally bricked themselves, rather than going into read-only mode. Why would they be designed to do that? I always assumed that even if an SSD died, I would still be able to recover the data. Apparently that isn't true.

    • That test found that Samsung SSDs are the best, though in our experience they are not good.

      Had a couple of 840 Pros (512GB) in a RAID1 array. After some time the array would slow down: one drive became slower than the other (fio reports 7k random write IOPS on one drive, and the number is constant, but the other drive gets 6k IOPS and the number sometimes drops to 400).
      OK, what about the 850 Pro (1TB)? Well, after some time the array became very slow because one drive slowed down.

      No more Samsung SSDs for us. Obvio

    • by Bengie ( 1121981 )
      They did fail with a warning, if you consider that the wear-leveling count reached zero 2 petabytes earlier.
    • by sjames ( 1099 )

      My concern is that they brick themselves. I understand that a newly written sector may fail miserably, and that if the drive cannot find a functional empty sector it may lose that sector entirely, but why can't it allow the existing, successfully written sectors to be read off in a read-only mode?

    • Re:No warning ? (Score:4, Insightful)

      by Ravaldy ( 2621787 ) on Thursday March 12, 2015 @03:20PM (#49244405)

      I think better backup strategies apply here. If someone steals your computer, you get just as much warning as the SSD gave. Just saying.

  • by Immerman ( 2627577 ) on Thursday March 12, 2015 @01:23PM (#49243343)

    Talk about planned obsolescence: only a single reallocated sector registered, but the firmware counter says its write tolerance has been reached, so it kills itself. I suppose it's nice that it switches to read-only mode when it dies, except for the fact that it bricks itself entirely after a power cycle. I mean, come on: if it's my OS and/or paging drive, then switching to read-only mode is going to kill the OS almost immediately, and there goes my one chance at data recovery. Why not just leave it in permanent read-only mode instead? Sure, it's useless for most applications, but at least I can recover my data at my leisure.

  • by GGardner ( 97375 ) on Thursday March 12, 2015 @01:25PM (#49243355)

    The drive's media wear indicator ran out shortly after 700TB, signaling that the NAND's write tolerance had been exceeded. Intel doesn't have confidence in the drive at that point, so the 335 Series is designed to shift into read-only mode and then to brick itself when the power is cycled. Despite suffering just one reallocated sector, our sample dutifully followed the script. Data was accessible until a reboot prompted the drive to swallow its virtual cyanide pill.

    Who thought this was a good idea? If the drive thinks future writes are unstable, good for it to go into read only mode. But to then commit suicide on the next reboot? What if I want to take one final backup, and I lose power?
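The end-of-life behavior quoted above, and the permanent read-only mode several commenters ask for, can be modeled as a small state machine. This is an illustrative sketch only: the state and policy names, the write counter standing in for the media wear indicator, and the idea that secure erase is the one extra command a read-only drive accepts are assumptions, not Intel's firmware logic.

```python
from enum import Enum, auto


class State(Enum):
    NORMAL = auto()
    READ_ONLY = auto()  # "logical disable": reads allowed, writes refused
    BRICKED = auto()    # drive no longer responds at all


class Policy(Enum):
    BRICK_ON_POWER_CYCLE = auto()  # client-drive behavior described in the quote
    STAY_READ_ONLY = auto()        # behavior commenters are asking for


class ToyDrive:
    """Hypothetical model of an SSD's end-of-life handling."""

    def __init__(self, policy: Policy, rated_writes: int):
        self.policy = policy
        self.rated_writes = rated_writes  # stand-in for the media wear indicator
        self.writes = 0
        self.state = State.NORMAL
        self.data = {}

    def write(self, lba: int, value: bytes) -> None:
        if self.state is not State.NORMAL:
            raise OSError(f"write refused in state {self.state.name}")
        self.data[lba] = value
        self.writes += 1
        if self.writes >= self.rated_writes:
            # Wear indicator has bottomed out: stop accepting writes.
            self.state = State.READ_ONLY

    def read(self, lba: int) -> bytes:
        if self.state is State.BRICKED:
            raise OSError("drive unresponsive")
        return self.data[lba]

    def power_cycle(self) -> None:
        if (self.state is State.READ_ONLY
                and self.policy is Policy.BRICK_ON_POWER_CYCLE):
            self.state = State.BRICKED

    def secure_erase(self) -> None:
        # Hypothetical: the one extra command a read-only drive would accept.
        if self.state is State.BRICKED:
            raise OSError("drive unresponsive")
        self.data.clear()
```

Under BRICK_ON_POWER_CYCLE, data written before the wear limit stays readable only until the next power cycle; under STAY_READ_ONLY it remains readable indefinitely and can still be wiped afterwards.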

    • Some additional info from an earlier article [techreport.com]:

      According to Intel, this end-of-life behavior generally matches what's supposed to happen. The write errors suggest the 335 Series had entered read-only mode. When the power is cycled in this state, a sort of self-destruct mechanism is triggered, rendering the drive unresponsive. Intel really doesn't want its client SSDs to be used after the flash has exceeded its lifetime spec. The firm's enterprise drives are designed to remain in logical disable mode after the MWI bottoms out, regardless of whether the power is cycled. Those server-focused SSDs will still brick themselves if data integrity can't be verified, though.

      SMART functionality is supposed to persist in logical disable mode, so it's unclear what happened to our test subject there. Intel says attempting writes in the read-only state could cause problems, so the fact that Anvil kept trying to push data onto the drive may have been a factor.

      All things considered, the 335 Series died in a reasonably graceful, predictable manner. SMART warnings popped up long before write errors occurred, providing plenty of time—and additional write headroom—for users to prepare.

      So, it sounds like staying in read-only (logical disable) mode is the intended behavior for Intel's *enterprise* drives, while *client* drives like the 335 Series are designed to brick themselves after a power cycle.

      While it may make you feel better if consumer SSD drives would go into a permanent read-only mode, it seems extremely unlikely that a typical consumer would ever actually reach this point in an SSD's life at all. So, I'm not really losing sleep that my own Intel SSD drives are going to brick themselves, when at a typic

    • This reminds me of the systems they tried to impose on US cars in the early '70s: if the seat belts weren't latched, the car would not start. Not just an annoying bong for a brief time; the freaking engine would not even turn over. And this being completely analog, system failures happened quite a bit. But fear not! If there was a failure, there was an under-hood button you could push to bypass the system. Once. It was one-time use, and then you had to tow the car to a dealership for a new box. As most every
  • by mcrbids ( 148650 ) on Thursday March 12, 2015 @01:25PM (#49243367) Journal

    As SSD cells wear, the problem is that they hold their charge for less time. When new, a cell can hold its charge for years, but as the SSD wears, that retention time declines.

    Consequently, continuous write tests will keep reporting "all good" on a drive that is useless in practice: the continuous writes rewrite a particular cell every few hours, but the cell might only hold a charge for a few days, meaning if you turned the drive off for even a day or so, you'd suffer serious data loss.

    SSDs are amazing but you definitely can't carry conventional wisdom from HDDs over.
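The retention argument can be made concrete with a toy model. The sketch below assumes, purely for illustration, that retention time falls exponentially with program/erase (P/E) cycles; the starting retention, decay rate, and rewrite interval are made-up parameters, not measured NAND characteristics. It shows why a test that rewrites every cell every few hours never notices the decline, while a drive left powered off does.

```python
import math

# Hypothetical parameters, chosen only to illustrate the shape of the problem.
FRESH_RETENTION_HOURS = 10 * 365 * 24  # assume ~10 years when new
DECAY_PER_CYCLE = 0.002                # assumed exponential decay rate per P/E cycle


def retention_hours(pe_cycles: int) -> float:
    """Toy model: how long a cell holds its charge after pe_cycles of wear."""
    return FRESH_RETENTION_HOURS * math.exp(-DECAY_PER_CYCLE * pe_cycles)


def data_survives(pe_cycles: int, hours_unrefreshed: float) -> bool:
    """True if the data is rewritten before the charge leaks away."""
    return hours_unrefreshed < retention_hours(pe_cycles)


# A continuous write test rewrites each cell every few hours...
REWRITE_INTERVAL_HOURS = 4
# ...but a user might leave the drive unpowered for a week.
POWERED_OFF_HOURS = 7 * 24

for cycles in (0, 2000, 4000):
    print(f"{cycles:>5} P/E cycles: retention ~{retention_hours(cycles):8.1f} h, "
          f"endurance test OK={data_survives(cycles, REWRITE_INTERVAL_HOURS)}, "
          f"week unpowered OK={data_survives(cycles, POWERED_OFF_HOURS)}")
```

With these invented numbers, the worn cell still passes the every-four-hours rewrite check but loses its data after a week unpowered, which is the blind spot described above.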

  • the results suggest the NAND in modern SSDs has more than enough endurance for consumers

    "Challenge accepted." - some guy trying to invent octo-level-cell flash

  • Wanna know how to kill a spinning disk? Put it in a DVR. My DVR (with the "pause and rewind live TV" feature) was rewriting the disk 100% of the time. It died in a few months. I replaced it with a larger one, turned off the live-TV buffer, and it's lasted years. But it's all anecdotal, so I'd expect tests like this to give us some level of comparison.
    • Many DVRs last for years. Mine is still going strong after 3 years. It's a central DVR, which means that any time one TV's receiver is turned on, the DVR starts recording for rewind purposes.
    • Figuring out the resonant frequency of the platters and moving the heads back and forth at that frequency is much more fun.

      I know it worked back in the 5.25-inch days. The resonant frequency of the platters should be higher with smaller drives, but the heads should be able to move fast enough.

    • by swb ( 14022 )

      I've owned Tivos since 2002 and I've only had one blow a drive, a series 3 I bought from WeakKnees with an upgraded disk in it. The drive didn't fail spectacularly or even completely, we just had a ton of playback problems and recordings that grew increasingly unreliable. That Tivo was bought in 2007 and the drive was replaced last fall.

      The original Tivo I bought in 2002 finally got tossed without a drive failure when Comcast gave up on analog SD channels a couple of years ago. I think this was after bro

    • My Tivo Series 2 was retired after 9.5 YEARS of continuous operation and use. It still works; it's just that my cable went digital and the Tivo only has an analog tuner, so it is now useless. But it's still functional with the original hard drive.
    • by sjames ( 1099 )

      That's just crappy design. It should be able to hold an hour of programming in RAM and so never touch the disk unless you actually use the rewind feature.
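The RAM-buffer idea can be sketched as follows: live TV is buffered entirely in memory and the oldest chunk is simply dropped, and the disk is only written once the viewer pauses or rewinds and falls behind live. This is a hypothetical sketch; the class name, the chunk granularity, and the `spill_to_disk` hook are assumptions, not how any particular DVR works.

```python
from collections import deque
from typing import Callable


class LiveTVBuffer:
    """Hypothetical RAM-first pause/rewind buffer for a DVR."""

    def __init__(self, max_chunks: int, spill_to_disk: Callable[[bytes], None]):
        self.max_chunks = max_chunks        # e.g. sized to hold about an hour of video
        self.chunks = deque()               # live-TV buffer held entirely in RAM
        self.spill_to_disk = spill_to_disk  # hypothetical disk-write hook
        self.time_shifted = False           # True once the viewer pauses or rewinds

    def on_broadcast_chunk(self, chunk: bytes) -> None:
        self.chunks.append(chunk)
        if len(self.chunks) > self.max_chunks:
            oldest = self.chunks.popleft()
            if self.time_shifted:
                # The viewer is behind live TV, so the evicted chunk may still be
                # needed: only now does the DVR write to disk.
                self.spill_to_disk(oldest)
            # While watching live, the oldest chunk is simply dropped from RAM.

    def rewind(self) -> None:
        self.time_shifted = True

    def return_to_live(self) -> None:
        self.time_shifted = False
```

In steady-state live viewing this model never calls the disk hook at all, so a disk would see writes only when the rewind feature is actually used.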

  • I'd like a mix of drives on my next box. A moderate "traditional" spinning oxide 1TB drive with a lot of cache for the primary boot, swap, and home directories, and an SSD mounted as my project workspace under my home directory. The work directory is where I do 99% of my writes, producing roughly 3GB for one particular project in about an hour's time.

    My existing drive on my main box has survived a god-awful number of writes, despite its 1TB size. My work has been emphatically I/O bound for the past month o

    • by swb ( 14022 )

      I think what you'd really want is something where the SSD takes all writes, mirrors them to HDDs and caches all reads to SSD, but can read AND write to the HDDs if there is a loss of SSDs.

      Bonus points for actual SAN-like behavior, where the total system capacity is actually measured by the HDD capacity and the system is capable of sane behavior, like redirecting writes to HDD if the SSD write cache overflows and preserving some portion of high-count read cache blocks so that unusually large reads don't dest
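The tiering described above (the SSD absorbs writes and caches reads, the HDDs hold the authoritative copy and define total capacity, and the system falls back to the HDDs if the SSD is lost) can be sketched as a hypothetical model with dicts standing in for devices. For simplicity it mirrors each write to the HDD synchronously and handles SSD overflow by evicting the least recently used block; a real implementation would destage asynchronously and use a smarter eviction policy.

```python
from collections import OrderedDict
from typing import Optional


class TieredStore:
    """Hypothetical SSD-in-front-of-HDD store; dicts stand in for devices."""

    def __init__(self, ssd_capacity: int):
        self.ssd_capacity = ssd_capacity     # max blocks cached on the SSD
        self.ssd: Optional[OrderedDict] = OrderedDict()  # None once the SSD fails
        self.hdd: dict = {}                  # authoritative copy, sets capacity

    def _cache(self, key, value) -> None:
        if self.ssd is None:
            return                           # SSD lost: HDD-only operation
        self.ssd[key] = value
        self.ssd.move_to_end(key)
        while len(self.ssd) > self.ssd_capacity:
            self.ssd.popitem(last=False)     # evict the least recently used block

    def write(self, key, value) -> None:
        # Mirror every write to the HDD so losing the SSD loses nothing.
        self.hdd[key] = value
        self._cache(key, value)

    def read(self, key):
        if self.ssd is not None and key in self.ssd:
            self.ssd.move_to_end(key)        # SSD hit
            return self.ssd[key]
        value = self.hdd[key]                # miss (or no SSD): go to the HDD
        self._cache(key, value)              # promote into the SSD read cache
        return value

    def fail_ssd(self) -> None:
        self.ssd = None
```

After `fail_ssd()`, reads and writes keep working against the HDD copy, which is the "can read AND write to the HDDs if there is a loss of SSDs" property the comment asks for.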

      • by msobkow ( 48369 )

        Yeah, but all the critical project data gets imaged to GitHub and to another machine, so there is no need to back it up from the SSD to a platter. When it only takes 10 minutes to restore data from offsite, there isn't much point backing it up to multiple devices locally (unless they're on different machines, of course.)

        • by swb ( 14022 )

          I still think there's so much performance advantage to be gained from the OS and apps on the SSD that the only real purpose of spinning rust is capacity and whatever reliability it provides over SSDs. The torture test seems to indicate that the reliability factor isn't that much to worry about.

  • Why Intel, why? We can all debate whether the device should prematurely fail at some arbitrary software limit, but why BRICK it, when that can cause complete data loss?!

    Instead, just set the drive to always boot in read-only mode, with secure erase being the only other allowed command. Then someone can recover their data and wipe the drive for good.

    Intel doesn't have confidence in the drive at that point, so the 335 Series is designed to shift into read-only mode and then to brick itself when the power
  • This experiment only documents the survivability of the NAND Flash itself, really. I've had two consumer SSDs and at least one SD card fail completely for other reasons; they became completely unusable, not just unwritable. In the case of the SSDs at least, I was told it was due to internal controller failure, meaning the NAND itself was fine but the circuits to control and access it were trashed. I suppose a platter-drive analog to that would be having the platters in mint condition with all data intact bu


  • Suddenly a bunch of SSD drives of all types and manufacturers have shown up on eBay. Coincidence? I think not.
  • Buy a Corsair Neutron SSD
