
 




Flash Destroyer Tests Limit of Solid State Storage (229 comments)

An anonymous reader writes "We all know that flash and other types of solid state storage can only endure a limited number of write cycles. The open source Flash Destroyer prototype explores that limit by writing and verifying a solid state storage chip until it dies. The total write-verify cycle count is shown on a display — watch a live video feed and guess when the first chip will die. This project was inspired by the inevitable comments about flash longevity on every Slashdot SSD story. Design files and source are available at Google Code."
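The write-verify loop described in the summary can be sketched in a few lines. This is a minimal Python sketch, not the Flash Destroyer source: `SimulatedEEPROM`, its toy wear-out model (one flipped bit per write once a made-up endurance figure is passed) and the alternating 0x55/0xAA test pattern are all assumptions for illustration.

```python
import random


class SimulatedEEPROM:
    """Hypothetical stand-in for a small EEPROM; NOT the Flash Destroyer firmware."""

    def __init__(self, size: int, endurance: int, seed: int = 0):
        self.cells = bytearray(size)
        self.writes = 0
        self.endurance = endurance  # made-up cycle count after which writes go bad
        self.rng = random.Random(seed)

    def write(self, data: bytes) -> None:
        self.writes += 1
        self.cells[:] = data
        if self.writes > self.endurance:
            # Toy wear-out model: a worn-out device corrupts one random bit per write.
            i = self.rng.randrange(len(self.cells))
            self.cells[i] ^= 1 << self.rng.randrange(8)

    def read(self) -> bytes:
        return bytes(self.cells)


def run_until_failure(device, size: int, max_cycles: int):
    """Write, read back, compare; return the first failing cycle number, or None."""
    # Alternating 0x55 / 0xAA toggles every bit on every cycle (assumed pattern).
    patterns = (bytes([0x55]) * size, bytes([0xAA]) * size)
    for cycle in range(1, max_cycles + 1):
        pattern = patterns[cycle % 2]
        device.write(pattern)
        if device.read() != pattern:
            return cycle
    return None


chip = SimulatedEEPROM(size=128, endurance=10_000)
print(run_until_failure(chip, size=128, max_cycles=20_000))  # 10001
```

Because the toy device keeps accepting writes after wear-out, only the read-back comparison detects the failure.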


  • Interesting! (Score:4, Interesting)

    by exasperation ( 1378979 ) on Thursday May 27, 2010 @04:04PM (#32367354)
    It'll be nice to get some third-party data on exactly how long these things last on average.
    • Re:Interesting! (Score:5, Informative)

      by mantis2009 ( 1557343 ) on Thursday May 27, 2010 @04:16PM (#32367538)
      Just checked out the video feed. The chip already lasted longer than 1 million writes, which is the number of writes the chip is supposed to last over its lifetime. As of this writing, the chip has survived more than 1,600,000 write cycles and counting.

      Still, since this test isn't on an actual, shipping solid state drive (SSD) product, the results will be discounted by a lot of critics.
      • The rated write cycles may be at the extreme of the rated operating temperature. He needs to bake the test rig in an oven for the duration of the test.
      • Re:Interesting! (Score:4, Insightful)

        by msauve ( 701917 ) on Thursday May 27, 2010 @04:38PM (#32367918)

        since this test isn't on an actual, shipping solid state drive (SSD) product, the results will be discounted by a lot of critics.

        Assuming that the flash is of equivalent technology (e.g. SLC NAND, cell size, etc) to that used for SSD, then this would present a best case test, since it is exercising all cells equally.

        An SSD tries to do wear leveling (distributing writes evenly), but it can't do so as perfectly as this test does.
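A toy comparison of the two cases msauve describes: rewriting one logical block in place versus spreading writes across physical blocks. This Python sketch assumes a simplistic dynamic wear-leveling policy (always write to the least-erased block); real SSD controllers are more complex and must also relocate static data, which is ignored here.

```python
import heapq


def in_place_erase_counts(num_blocks: int, writes: int, hot_block: int = 0) -> list:
    """No wear leveling: every rewrite of one logical block erases the same physical block."""
    counts = [0] * num_blocks
    counts[hot_block] = writes
    return counts


def leveled_erase_counts(num_blocks: int, writes: int) -> list:
    """Toy dynamic wear leveling: each write goes to the physical block with the fewest erases."""
    heap = [(0, block) for block in range(num_blocks)]  # (erase count, physical block)
    heapq.heapify(heap)
    for _ in range(writes):
        count, block = heapq.heappop(heap)
        heapq.heappush(heap, (count + 1, block))
    counts = [0] * num_blocks
    for count, block in heap:
        counts[block] = count
    return counts


print(max(in_place_erase_counts(8, 1000)))  # 1000 erases on one block
print(max(leveled_erase_counts(8, 1000)))   # 125 erases on every block
```

The leveled case resembles what the test rig does by construction (every cell cycled equally), which is why msauve calls it a best case.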

      • Re:Interesting! (Score:5, Insightful)

        by Dancindan84 ( 1056246 ) on Thursday May 27, 2010 @04:40PM (#32367954)
        And honestly it's a pretty valid argument. This is definitely going to be informative, but I'm just as interested in how a particular SSD handles the flash blocks failing as in when they fail. An SSD with flash that averages 1,000,000 writes before blocks start to fail but does it gracefully with little/no data loss could be better than one that averages 2,000,000 but goes out in a blaze of glory as soon as the first block fails.
        • by Chris Burke ( 6130 ) on Thursday May 27, 2010 @05:20PM (#32368528) Homepage

          An SSD with flash that averages 1,000,000 writes before blocks start to fail but does it gracefully with little/no data loss could be better than one that averages 2,000,000 but goes out in a blaze of glory as soon as the first block fails.

          That depends on how you define "better", and for my personal definition, it depends on exactly how glorious a blaze it is. :)

          • by gyrogeerloose ( 849181 ) on Thursday May 27, 2010 @06:03PM (#32369084) Journal

            That depends on how you define "better", and for my personal definition, it depends on exactly how glorious a blaze it is. :)

            Really. Don't all of us Slashdotters love a good explosion? Sure, we mostly prefer them to be scheduled explosions but, still, an explosion is an explosion.

            • by Bing Tsher E ( 943915 ) on Thursday May 27, 2010 @10:45PM (#32371500) Journal

              That brings to mind an old favorite of mine: the Light Emitting EPROM. The power pins on EPROM chips are in opposite corners. Plug in the EPROM chip backwards and you've hooked the power up backwards. Result: A light emitting EPROM, though one with a very limited service life.

        • Re:Interesting! (Score:4, Interesting)

          by Jah-Wren Ryel ( 80510 ) on Thursday May 27, 2010 @05:26PM (#32368634)

          And honestly it's a pretty valid argument. This is definitely going to be informative, but I'm just as interested in how a particular SSD handles the flash blocks failing as in when they fail. An SSD with flash that averages 1,000,000 writes before blocks start to fail but does it gracefully with little/no data loss could be better than one that averages 2,000,000 but goes out in a blaze of glory as soon as the first block fails.

          Flash fails on write: if the write succeeds, you will be able to read it, barring catastrophic events like ESD exposure.

      • Re:Interesting! (Score:5, Informative)

        by Kindgott ( 165758 ) <soulwound AT godisdead DOT com> on Thursday May 27, 2010 @04:42PM (#32367986) Journal

        Yeah, the title seems misleading, since they're writing and verifying data on an EEPROM, which is not used in solid state drives last time I checked.

      • Re:Interesting! (Score:4, Interesting)

        by blahplusplus ( 757119 ) on Thursday May 27, 2010 @07:30PM (#32370018)

        One should not forget that companies might have "chip lotteries", i.e. use chips that are less robust and cheaper to manufacture without the majority of consumers knowing the difference.

        They do this in the LCD monitor industry, where "panel lotteries" mean some units ship with cheaper parts than advertised, relying on consumer ignorance. See this AnandTech forum thread about panel lotteries:

        http://forums.anandtech.com/showthread.php?t=39226 [anandtech.com]

    • Re:Interesting! (Score:5, Interesting)

      by jellomizer ( 103300 ) on Thursday May 27, 2010 @04:36PM (#32367882)

      I would like to see a comparison with a mechanical drive doing the same thing in parallel.

      While a solid state drive has a theoretically limited number of writes compared with a mechanical drive, it would be interesting to see what the real world has to offer.

    • I vaguely recall reading that the more writes flash has, the less likely it is to remember what is written to it over time - kind of like volatile storage, but with the length of time the data lasts being inversely related to the number of writes.

      Given what I know about flash, I'm not quite sure how this could happen physically. I believe this was mentioned when I was looking into SSD caches for ZFS, where this type of failure would be insignificant. It could be completely incorrect, too.

      If it is correct,

    • Re:Interesting! (Score:5, Informative)

      by lauragrupp ( 1820940 ) on Thursday May 27, 2010 @05:33PM (#32368722)
      Here is work from the academic community exploring error rates, latencies and some other factors. It compares 11 NAND flash chips (both SLC and MLC) from 5 manufacturers: http://nvsl.ucsd.edu/ftest.html [ucsd.edu]
  • live stream (Score:5, Funny)

    by Anonymous Coward on Thursday May 27, 2010 @04:07PM (#32367392)

    a live stream linked on slashdot.. ouch..

    • Re: (Score:3, Insightful)

      by biryokumaru ( 822262 )

      They should have a BitTorrent-like system for streams: you just connect to the swarm and request a fairly recent image. Everyone keeps the past minute or so cached to send to new people in the swarm. Maybe a tiered system, so that the people who have been connected longest are closest to the original stream.

      Let's say I connect to Joe and Mary, who're connected to the original server. They send me frames two or three frames behind the server. Jack connects, and he's getting slightly lagged images too, rig
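The tiered relay idea can be sketched roughly as follows. This is a hypothetical Python sketch, simplified to one parent per peer (the comment has each viewer connecting to two): each peer caches recent frames in a bounded buffer and relays them, and a newcomer attaches under the shallowest peer with spare capacity, so the longest-connected viewers stay closest to the original server. The `Peer` class, `join` function, fanout of 2 and 60-frame cache are all assumptions.

```python
from collections import deque


class Peer:
    """Hypothetical swarm member that caches recent frames and relays them downstream."""

    def __init__(self, name, parent=None, cache_frames=60):
        self.name = name
        self.parent = parent
        self.children = []
        self.cache = deque(maxlen=cache_frames)  # oldest frames fall off automatically
        self.tier = 0 if parent is None else parent.tier + 1  # hops from the server

    def receive(self, frame):
        self.cache.append(frame)
        for child in self.children:
            child.receive(frame)


def join(swarm, name, fanout=2):
    """Attach a newcomer under the shallowest peer that still has room (earliest joiner wins ties)."""
    for candidate in sorted(swarm, key=lambda p: p.tier):  # stable sort keeps join order
        if len(candidate.children) < fanout:
            peer = Peer(name, parent=candidate)
            candidate.children.append(peer)
            swarm.append(peer)
            return peer
    raise ValueError("swarm is empty or fanout is zero")


server = Peer("server")
swarm = [server]
for viewer in ["Joe", "Mary", "me", "Jack"]:
    join(swarm, viewer)
server.receive("frame-1")
print([(p.name, p.tier) for p in swarm])
```

Relaying is instantaneous here; in a real network each tier would add the couple of frames of lag the comment mentions.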

    • Which ironically uses Flash.
    • Re: (Score:3, Interesting)

      by game kid ( 805301 )

      Doesn't multicast help any? Given a bunch of people who want to view the same exact stream, the server should be sending the same packets and letting the viewers' players deal with sync, starting at a key frame (and not in the middle of some crumbly diff frames), et cetera. With that, the server could just concentrate on the list of viewers' IPs, send packets far less often, and the /. arson fails.

      Live streams, to me, seem easier than webpages because the viewer always wants the current frames of a live v
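The "start at a key frame" part can be shown with a tiny sketch. A hypothetical Python generator, assuming frames arrive as `(kind, payload)` pairs where kind is `"key"` or `"delta"`: a viewer tuning in mid-stream drops delta frames until the first key frame, because a delta frame only makes sense relative to a frame the viewer already has.

```python
def frames_for_late_joiner(stream):
    """Yield frames starting at the first key frame; earlier deltas have no base to apply to."""
    started = False
    for kind, payload in stream:
        if kind == "key":
            started = True
        if started:
            yield kind, payload


joined_midstream = [("delta", 7), ("delta", 8), ("key", 9), ("delta", 10)]
print(list(frames_for_late_joiner(joined_midstream)))  # [('key', 9), ('delta', 10)]
```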

    • by SEWilco ( 27983 )
      The webcam destroyer has only been running for 3 days so far. Keep an eye on it.
  • You would think after the write cycles were exceeded the chips would be more or less read-only instead of 'dead.'

    Am I mistaken on this presumption?

    • You would think after the write cycles were exceeded the chips would be more or less read-only instead of 'dead.'

      Am I mistaken on this presumption?

      Yep. When it dies, you can still write. It is just what you write won't be right. :) Hence the verify part of the test.

      • Re: (Score:3, Informative)

        by ledow ( 319597 )

        Depends - if the chips are using some sort of error correction, they may well just fail. I have USB-based Flash die all the time and it DIES, as in not even presenting a usable device to the OS despite being "detected". The theory is that they fail nicely but the chances are that any non-premium flash will just die a death. Why bother making the device fail gracefully if it's failed anyway?

        Literally - I've never seen a flash device in such a "read-only" mode, even for a single bit, but I can't even begin

        • Re: (Score:3, Insightful)

          by fbjon ( 692006 )
          It may be that the controller on the device just doesn't know what to do when something goes pear-shaped. To be sure, you should be accessing the raw NAND chip itself.
    • by Entrope ( 68843 )

      It depends on the architecture of the flash cells, but yes, I would expect that the chips would fail into some mode where erase and program operations have no effect. (Being a software guy rather than a Flash memory guy, I wouldn't want to guess whether over-erased cells would be at logic 1, logic 0 or a mix of the two.)

      • Re:Die? (Score:4, Informative)

        by Chris Burke ( 6130 ) on Thursday May 27, 2010 @05:05PM (#32368294) Homepage

        (Being a software guy rather than a Flash memory guy, I wouldn't want to guess whether over-erased cells would be at logic 1, logic 0 or a mix of the two.)

        Well I'm not an expert on flash, but I know a little about how they work. In NOR flash the data line is pulled up to one, so that's the default state for any bit. There's a transistor connected to ground, and if the floating gate has a charge in it and the transistor is on, then it pulls the data line down to 0. "Erasing" a NOR flash sets all the bits to 1, and programming it sets certain bits to 0.

        The most common failure mode as I understand it is that electrons get trapped in the floating gate even after erase cycles such that it's very close to or over Vt for the transistor, so that bit would be stuck in the "programmed" state of logic 0.

        NAND memory is the opposite, the erased state is 0 and the programmed state is 1, so a permanently charged floating gate should result in a stuck-at-1 fault.

        Which, relating to the OP's question, means either way the memory wouldn't be good for much of anything. Your NAND SSD is going to fail during an erase-program (aka "write") cycle, and except in the extremely unlikely case that the pattern you were writing did not involve changing any previously stored 1s to 0s on stuck bits, then the result is going to be wrong. You could read it, but you'd be reading the wrong data.
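Chris Burke's stuck-bit argument can be checked with a one-byte model. A minimal Python sketch, assuming a worn byte is fully described by two hypothetical bit masks (bits stuck at 0 and bits stuck at 1) and ignoring which logic level counts as "erased" on a given part: the read-back is only correct when the data being written happens to agree with every stuck bit.

```python
def stored_value(intended: int, stuck_at_0: int = 0, stuck_at_1: int = 0) -> int:
    """What a worn byte actually holds after writing `intended` (masks are hypothetical)."""
    return (intended & ~stuck_at_0 & 0xFF) | stuck_at_1


def write_verifies(intended: int, stuck_at_0: int = 0, stuck_at_1: int = 0) -> bool:
    """True only if the stuck bits happen to match the data being written."""
    return stored_value(intended, stuck_at_0, stuck_at_1) == intended


print(write_verifies(0b1010_1010, stuck_at_1=0b0000_0001))  # False: bit 0 reads back as 1
print(write_verifies(0b1111_1111, stuck_at_1=0b0000_0001))  # True: the data already had a 1 there
```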

        • Re: (Score:3, Informative)

          by AdamHaun ( 43173 )

          Your description is a bit backwards, at least for the NOR flash I work on. When the floating gate has charge (electrons), it turns the transistor off. The negative charge on the FG cancels out the positive voltage on the control gate. The bit is read via a current sense -- no current is a zero, lots of current is a one.

          The main failure mechanism (that I know of) is oxide damage due to high energy electrons. Program and erase (technically, Fowler-Nordheim tunneling) take high voltages, which gives electrons
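AdamHaun's description of the read path (stored electrons raise the cell's threshold so it stops conducting, and a current sense turns "no current" into 0 and "lots of current" into 1) can be sketched as a toy comparator. Every number here (threshold voltages, read voltage, the linear current model, the reference current) is an illustrative assumption, not data for any real part.

```python
def cell_current(v_read: float, v_threshold: float, gain: float = 1e-4) -> float:
    """Toy cell model: no current below threshold, current rising linearly above it."""
    return max(0.0, v_read - v_threshold) * gain


def sense(v_threshold: float, v_read: float = 3.0, i_ref: float = 5e-5) -> int:
    """Current sense: more current than the reference reads as 1, less reads as 0."""
    return 1 if cell_current(v_read, v_threshold) > i_ref else 0


ERASED_VT = 1.0      # assumed: little charge on the floating gate, cell conducts
PROGRAMMED_VT = 5.0  # assumed: electrons on the floating gate, cell stays off
print(sense(ERASED_VT), sense(PROGRAMMED_VT))  # 1 0
print(sense(2.8))  # 0: a cell whose erase fell short misreads as programmed
```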

    • by msauve ( 701917 )
      If it fails on a write, then the data written is useless (because some random bit/s will be wrong), so the storage is "dead" in that it is no longer useful. IOW, it can no longer be used for its intended purpose.

      "Read-only" refers to storage which contains useful information, in that it was written once with the desired data, even if it can't be again (ROM or PROM). So even though it's read-only, it still fulfills its intended purpose.

      In any case, read-only = useful, dead = not useful; worn out flash
      • Right, but failing gracefully into a "no more writes" state is far better than an "I'm dead and I took your data with me" scenario.

        I honestly don't know which is more common or if it varies amongst various flash storage devices, hence why I raised the question.

        People in general should have backups of their data but don't, so this distinction is pretty important.

        • by msauve ( 701917 )
          Since you don't know it's failed until you have an unsuccessful write, what "graceful" mechanism are you proposing?
          • Re:Huh? (Score:5, Insightful)

            by Denis Lemire ( 27713 ) on Thursday May 27, 2010 @05:17PM (#32368480) Homepage

            Graceful as in data not related to your recent failed writes are still readable so they can be backed up and migrated to a new drive. Not sure why that concept is so difficult. I consider something dead as "completely unreadable, ALL your data has been destroyed - have a nice day."

            No longer reliable but still semi-recoverable isn't quite "dead."

            Maybe I'm just using a stricter interpretation of the word dead than you are?

            Let's use a marker on a white board analogy. If I was storing all my data on a suitably large white board using a marker and I completely exhausted my marker's supply of ink, I'd be pissed if this resulted in a blank whiteboard, wouldn't you? On that same note, if I wiped a small section of my whiteboard with the intent of writing something new in that area and only then realized that my marker was no longer suitably supplied with ink and my write failed, I would find the blank void in that section alone acceptable.

            Does that clarify things?

  • by Anonymous Coward on Thursday May 27, 2010 @04:10PM (#32367446)

    Flash! Aa-aaahhh!!

  • for the guy (Score:3, Insightful)

    by phantomfive ( 622387 ) on Thursday May 27, 2010 @04:10PM (#32367452) Journal
    For the guy a couple days back who asked what kind of project can he do that would be useful to the world, here is a great example. Try something like this.
    • by houstonbofh ( 602064 ) on Thursday May 27, 2010 @04:19PM (#32367588)
      The fact that you said this shows you spend way too much time on Slashdot. The fact that I recognized it, and was one of the first posters in the thread you refer to, says the same about me. I wonder if I can find a life for sale on Craigslist?

      Link to thread in question...
      http://ask.slashdot.org/story/10/05/23/1547202/Scientific-RampD-At-Home
      • Here, take mine, I don't need it anymore, apparently. I recognized the reference too.


        Just kidding. I don't have one either.

  • by Anonymous Coward on Thursday May 27, 2010 @04:11PM (#32367466)

    Wait, which flash are we talking about here?

  • Excellent work! Given that the chance that the manufacturers will provide this data approaches zero, this is the only way we're going to get realistic figures for the longevity of flash chips. Hopefully, this will encourage more independent hardware testing in other fields

  • dull (Score:5, Funny)

    by Threni ( 635302 ) on Thursday May 27, 2010 @04:14PM (#32367502)

    I was expecting something cool, like storing a picture, displaying it, and then constantly XORing each pixel with some random number twice, repeatedly, and watching the image decay over time. Although it would appear that it'd need quite a lot of time.

  • Ha! (Score:3, Funny)

    by BJ_Covert_Action ( 1499847 ) on Thursday May 27, 2010 @04:14PM (#32367510) Homepage Journal

    This project was inspired by the inevitable comments about flash longevity on every Slashdot SSD story.

    Take that, every 'dotter who says bitching on this website doesn't get anything done!

    /removestonguefromcheek

  • SSD's? no. (Score:5, Informative)

    by hypethetica ( 739528 ) on Thursday May 27, 2010 @04:16PM (#32367534) Homepage

    article says: We used a Microchip 24AA01-I/P 128-byte I2C EEPROM (IC2), rated for 1 million write cycles.

    Um, SSDs don't use anything like this part as their storage.

    • by Kjella ( 173770 )

      Yeah, that's what I was wondering too the moment I saw the 1 million cycles... what I heard was that SLC is usually rated for ~100k writes and MLC for ~10k writes, so completely different type of chip. So I'm not sure what this data will be relevant for, but it's not SSDs... what's this for, BIOS chips or something?

      • Re:SSD's? no. (Score:5, Informative)

        by Anonymous Coward on Thursday May 27, 2010 @04:34PM (#32367868)

        More importantly, the test pattern does not resemble normal SSD usage. Complete writes are very unusual for SSDs, and a full cycle on an SSD takes far longer than a cycle on this EEPROM (400 cycles per minute). When an SSD is written to in normal usage, a wear leveling algorithm distributes the data and avoids writing to the same physical blocks again and again. The German computer magazine c't has run continuous write tests with USB sticks and never managed to destroy even a single visible block on a stick that way. The first test (4 years ago) wrote the same block more than 16 million times before they gave up. The second test (2 years ago) wrote the full capacity over and over again. The 2GB stick did not show any signs of wear after more than 23TB written to it.
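For scale, the magazine's 23TB figure can be turned into full-capacity passes over the 2GB stick. A quick Python check; the comment does not say whether decimal or binary units were meant, so both are shown.

```python
def full_capacity_passes(bytes_written: float, capacity_bytes: float) -> float:
    """How many times the whole device was overwritten, assuming writes were spread evenly."""
    return bytes_written / capacity_bytes


decimal = full_capacity_passes(23e12, 2e9)            # TB = 10**12, GB = 10**9
binary = full_capacity_passes(23 * 2**40, 2 * 2**30)  # TiB and GiB
print(decimal, binary)  # 11500.0 11776.0
```

If the controller spread writes perfectly evenly, that is roughly 11,500 erase cycles per block.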

      • by tlhIngan ( 30335 )

        Yeah, that's what I was wondering too the moment I saw the 1 million cycles... what I heard was that SLC is usually rated for ~100k writes and MLC for ~10k writes, so completely different type of chip. So I'm not sure what this data will be relevant for, but it's not SSDs... what's this for, BIOS chips or something?

        Oddly, the NAND I deal with (MLC and SLC) tend to have ~1M writes for SLC, and at least 100k writes for MLC. The 10k flash chips I used were high capacity Intel Strataflash (MLC, but NOR), which

    • I'd like to know what universe you get your SSDs from that don't use EEPROMs. Oh, you think the size is a big deal? Let me introduce you to a wild concept known as 'scaling'.
      • by Yvan256 ( 722131 )

        You mean, like reptiles?

      • Re: (Score:3, Informative)

        Actually, I rescind my post, as I realize I was confusing EEPROM with NOR/NAND. Your point is actually quite valid.
      • Re:SSD's? no. (Score:5, Informative)

        by robot256 ( 1635039 ) on Thursday May 27, 2010 @04:39PM (#32367936)

        Okay, I'll bite. Let me introduce you to this thing called "functional equivalence". You do realize that even though they are all "nonvolatile storage," there is a difference between EEPROM and Flash, and that there are many different kinds of low- and high-density Flash and they all have different proprietary silicon designs with different characteristics?

        Microchip EEPROMs are specifically designed for low-density, high-reliability applications, and are totally different at the transistor level from high-density MLC Flash used in solid state disks.

  • by bluestar ( 17362 ) on Thursday May 27, 2010 @04:16PM (#32367552) Homepage

    I bet the server's IP address is untraceable.

  • by Rick Richardson ( 87058 ) on Thursday May 27, 2010 @04:21PM (#32367636) Homepage

    The display only goes to 9,999,999! I think that won't be enuf... should be 100M or 1G.
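How long until the counter would run out of digits? A back-of-the-envelope Python check, assuming the rate of roughly 400 write-verify cycles per minute that another commenter gives for this EEPROM, and assuming that rate stays constant.

```python
def days_until_overflow(max_count: int = 9_999_999, cycles_per_minute: float = 400) -> float:
    """Days of continuous cycling before a 7-digit counter runs out of digits."""
    minutes = max_count / cycles_per_minute
    return minutes / (60 * 24)


print(round(days_until_overflow(), 1))  # 17.4
```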

  • by PSaltyDS ( 467134 ) on Thursday May 27, 2010 @04:23PM (#32367664) Journal

    Now, to see how much explosives it takes to MAKE it fail!

    This is my favorite part! :-)

  • That Castrol commercial with 50 engines running on engine stands with no oil in them?

  • by Edmund Blackadder ( 559735 ) on Thursday May 27, 2010 @04:32PM (#32367828)

    Most modern flash memories have their controllers check which blocks are dying or dead and re-route write and read requests to good blocks. So while your flash may seem to be working perfectly well, various blocks inside it may be dying and its storage size may be progressively decreasing.

    So I hope they are rewriting the entire flash in their test. Otherwise it is not representative.
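The remapping Edmund Blackadder describes can be sketched as a tiny translation table. A hypothetical Python sketch (the class and its spare-pool policy are assumptions, not any vendor's controller): logical blocks map to physical ones, a block that fails verify is retired, and its logical address moves to a spare, so the device keeps working while its pool of good blocks quietly shrinks.

```python
class RemappingController:
    """Toy flash translation layer that retires failed blocks and remaps them to spares."""

    def __init__(self, num_physical: int, num_logical: int):
        if num_logical > num_physical:
            raise ValueError("need at least as many physical blocks as logical ones")
        self.mapping = {lb: lb for lb in range(num_logical)}  # logical -> physical
        self.spares = list(range(num_logical, num_physical))  # hidden extra blocks
        self.retired = set()

    def handle_write_failure(self, logical_block: int) -> int:
        """Retire the block behind `logical_block` and remap it to a spare; return the spare."""
        if not self.spares:
            raise RuntimeError("no spare blocks left: the device can no longer hide failures")
        self.retired.add(self.mapping[logical_block])
        self.mapping[logical_block] = self.spares.pop()
        return self.mapping[logical_block]


ctl = RemappingController(num_physical=10, num_logical=8)
print(ctl.handle_write_failure(3), len(ctl.spares))  # 9 1
```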

    • Rewriting an entire SSD is time-prohibitive. It would take several minutes to write 32GB to the fastest 32GB SSD; now multiply that by the 10,000 write cycles claimed by the manufacturers... 20,000 minutes or so at a minimum... and that's without verifying the write, so add in another 15,000 minutes or so...

      In other words, it will probably take about a month to intentionally brute-force a full-bandwidth kill of a 32GB SSD. Larger SSDs would take proportionally longer.
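The month-long estimate can be reproduced with explicit assumptions. A Python sketch; the 250 MB/s write and 300 MB/s read speeds are hypothetical round numbers (the comment only says "several minutes" per pass), and decimal gigabytes are assumed.

```python
def days_to_wear_out(capacity_gb: float, rated_cycles: int,
                     write_mb_s: float, read_mb_s: float = 0.0) -> float:
    """Days to write (and, if read_mb_s is given, read back) the whole drive rated_cycles times."""
    capacity_mb = capacity_gb * 1000
    seconds_per_pass = capacity_mb / write_mb_s
    if read_mb_s:
        seconds_per_pass += capacity_mb / read_mb_s  # verify pass
    return seconds_per_pass * rated_cycles / 86_400


print(round(days_to_wear_out(32, 10_000, write_mb_s=250), 1))                 # 14.8
print(round(days_to_wear_out(32, 10_000, write_mb_s=250, read_mb_s=300), 1))  # 27.2
```

With the verify pass included, the result lands close to the commenter's "about a month".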
    • Re: (Score:3, Insightful)

      by fnj ( 64210 )

      Nonsense, it's completely representative of normal use. That's exactly the point. Until data loss occurs, or there are no more free blocks to use, the flash memory is objectively perfectly good.

    • by blind biker ( 1066130 ) on Thursday May 27, 2010 @04:57PM (#32368190) Journal

      They're testing an EEPROM: it is bit addressable and it does not contain any wear leveling algorithm.

  • Apples and hippos (Score:5, Informative)

    by blind biker ( 1066130 ) on Thursday May 27, 2010 @04:53PM (#32368148) Journal

    They're testing an EEPROM: while the underlying physics of storing data in an EEPROM and in Flash RAM is the same (floating gate transistors), EEPROMs use best-of-breed implementations with single-bit addressable floating gates, while the Flash RAM found in SSDs is the cheapest, least enduring MLC NAND. MLC NAND is the cheapest per bit and has a write cycle endurance two to three orders of magnitude lower than EEPROMs.

    SSDs do not contain EEPROMs. They don't even contain SLC (NOR or NAND). In fact, SSDs don't even contain NOR MLCs. Only the cheapest will do, for SSDs.

  • Never heard of him. Is he a Marvel Universe character?

  • I wonder... (Score:4, Funny)

    by bynary ( 827120 ) on Thursday May 27, 2010 @07:30PM (#32370024) Homepage
    ...will the Flash Destroyer hold up under this load?
  • by gweihir ( 88907 ) on Thursday May 27, 2010 @09:11PM (#32370926)

    This is what I got from a 2GB Kingston Flash Key. After that there were errors in almost all overwrites. However, the real kicker is that while the key read back wrong data, there was never any error reported. Since doing that at the beginning of 2009, I do not trust USB Flash anymore.

    Set-up: Linux, 1MB random data replicated to fill the chip, then read back to compare. Repeat with new random data. I had one isolated faulty read-back at around 3500 cycles, and then from around 3700 cycles 90% (and pretty soon 100%) faulty read-backs. Language was Python; no errors for the device on STDERR or in the system logs. And I looked carefully.
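gweihir's procedure (fill the target with one random 1MB block repeated, read everything back, compare, repeat with fresh data) might look roughly like this in Python. This is a reconstruction under assumptions, not the commenter's script: the function names are hypothetical, and it is shown running against a temporary file rather than a real device, since pointing it at a USB stick's device node would destroy the stick's contents. On a real system the read-back may also be served from the OS page cache unless caching is bypassed, which this sketch does not attempt.

```python
import os
import tempfile

CHUNK = 1024 * 1024  # 1 MB of random data per cycle, replicated to fill the target


def one_cycle(path: str, size: int, chunk: int = CHUNK) -> bool:
    """Fill `path` with one random chunk repeated, read it back, return True if it matches."""
    block = os.urandom(chunk)
    with open(path, "r+b") as f:
        for offset in range(0, size, chunk):
            f.write(block[: min(chunk, size - offset)])
        f.flush()             # empty Python's buffer
        os.fsync(f.fileno())  # ask the OS to write the data out to the device
    with open(path, "rb") as f:
        for offset in range(0, size, chunk):
            n = min(chunk, size - offset)
            if f.read(n) != block[:n]:
                return False
    return True


def run_cycles(path: str, size: int, cycles: int) -> list:
    """Return the (1-based) cycle numbers whose read-back did not match."""
    return [c for c in range(1, cycles + 1) if not one_cycle(path, size)]


def demo_on_temp_file(size: int = 3 * CHUNK + 123, cycles: int = 3) -> list:
    """Run the test against a throwaway file instead of real flash."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * size)
        path = tmp.name
    try:
        return run_cycles(path, size, cycles)
    finally:
        os.remove(path)


print(demo_on_temp_file())  # [] on healthy storage
```

The check relies purely on comparing data, which is how the silent corruption described above was caught even though the device never reported an error.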

  • This is a bad test (Score:5, Informative)

    by AdamHaun ( 43173 ) on Friday May 28, 2010 @12:49AM (#32372214) Journal

    I am working on flash write/erase cycling right now in my day job and I can tell you that this is not a very good test. Temperature affects cycling endurance (and this is reflected in the spec), so if your SSD is 20-30C higher than room temp it's going to make a difference. Fowler-Nordheim tunneling (which NAND flash uses for program and erase) is hardest at cold temperatures, so the first operation after powerup might be the worst case in a PC. (Yes, I know they're not using an SSD here, but they are doing their cycling at room temp.)

    Another thing to keep in mind is that continuous cycling is not realistic. The wear-out mechanism here is charge trap-up, where electrons get stuck in the floating gate oxide and repel other electrons, slowing down program and erase. Over time, thermal energy lets the electrons detrap. So irregular usage in a hot PC should actually be a nicer environment for endurance.

    A final factor is process variation, which can only be covered by using a large sample size (>100) and/or using units from separate lots with known characteristics, none of which an end user will likely have access to. Even that doesn't tell you anything about the defect rate.

    There are really two types of tests that people are talking about here. The first is a spec compliance test, which uses the extreme conditions I mentioned above to guarantee that all units will have the spec endurance under all spec conditions. This should be done by the manufacturer. The second is a real world usage test, which will only give realistic results if done under actual use conditions. The number you get from the article's test probably won't tell you much.

    [Disclaimer: I work on embedded NOR flash, not NAND, but the bits are the same and the article's talking about EEPROM so I figure I can butt in.]
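AdamHaun's point that continuous cycling is pessimistic (trapped charge builds up during cycling and detraps between uses) can be illustrated with a toy model. Everything in this Python sketch is an assumption chosen for illustration: a fixed amount of charge trapped per cycle, exponential detrapping between cycles with an arbitrary time constant, and arbitrary units. It shows the qualitative effect only, not real device physics or numbers.

```python
import math


def trapped_charge(cycle_times, trap_per_cycle: float = 1.0, detrap_tau: float = 100.0) -> float:
    """Charge left trapped after cycling at the given times (arbitrary units throughout)."""
    trapped = 0.0
    previous = None
    for t in cycle_times:
        if previous is not None:
            trapped *= math.exp(-(t - previous) / detrap_tau)  # detrapping while idle
        trapped += trap_per_cycle                              # new charge trapped by this cycle
        previous = t
    return trapped


continuous = trapped_charge(range(0, 1000))          # 1000 back-to-back cycles
intermittent = trapped_charge(range(0, 50_000, 50))  # same 1000 cycles, with idle gaps
print(round(continuous, 1), round(intermittent, 2))
```

Lowering `detrap_tau` mimics the faster detrapping he attributes to a hot PC.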

