
Flash Destroyer Tests Limit of Solid State Storage 229

An anonymous reader writes "We all know that flash and other types of solid state storage can only endure a limited number of write cycles. The open source Flash Destroyer prototype explores that limit by writing and verifying a solid state storage chip until it dies. The total write-verify cycle count is shown on a display — watch a live video feed and guess when the first chip will die. This project was inspired by the inevitable comments about flash longevity on every Slashdot SSD story. Design files and source are available at Google Code."
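
A minimal Python sketch of such a write-verify endurance loop, for readers who want a feel for what the counter is measuring. The real project is PIC microcontroller firmware (see the Google Code link); eeprom_write and eeprom_read below are hypothetical stand-ins for an I2C driver, and the 128-byte size matches the EEPROM the project uses.

    import itertools

    EEPROM_SIZE = 128  # bytes; the project's 24AA01 EEPROM is this size

    def endurance_test(eeprom_write, eeprom_read):
        """Write and verify the whole array until the first mismatch."""
        for cycle in itertools.count(1):
            pattern = bytes([cycle & 0xFF] * EEPROM_SIZE)   # change the data each pass
            eeprom_write(0, pattern)                        # write the full chip
            if eeprom_read(0, EEPROM_SIZE) != pattern:      # verify the read-back
                return cycle                                # first failing cycle = endurance
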
  • Re:Subject here (Score:1, Informative)

    by Anonymous Coward on Thursday May 27, 2010 @04:15PM (#32367514)

    He's a miracle!!!

  • SSD's? no. (Score:5, Informative)

    by hypethetica ( 739528 ) on Thursday May 27, 2010 @04:16PM (#32367534) Homepage

    The article says: We used a Microchip 24AA01-I/P 128-byte I2C EEPROM (IC2), rated for 1 million write cycles.

    Um, SSDs don't use anything like this part as their storage.

  • Re:Interesting! (Score:5, Informative)

    by mantis2009 ( 1557343 ) on Thursday May 27, 2010 @04:16PM (#32367538)
    Just checked out the video feed. The chip has already lasted longer than 1 million writes, which is its rated lifetime endurance. As of this writing, it has survived more than 1,600,000 write cycles and counting.

    Still, since this test isn't on an actual, shipping solid state drive (SSD) product, the results will be discounted by a lot of critics.
  • by Monkeedude1212 ( 1560403 ) on Thursday May 27, 2010 @04:19PM (#32367604) Journal

    Looking back on it, that was a pretty bad movie.

  • Re:Die? (Score:3, Informative)

    by ledow ( 319597 ) on Thursday May 27, 2010 @04:23PM (#32367658) Homepage

    Depends - if the chips are using some sort of error correction, they may well just fail. I have USB-based flash die on me all the time, and it DIES, as in not even presenting a usable device to the OS despite being "detected". The theory is that they fail nicely, but the chances are that any non-premium flash will just die a death. Why bother making the device fail gracefully if it's failed anyway?

    Literally - I've never seen a flash device in such a "read-only" mode, even for a single bit, but I can't even begin to count the number of flash-chips in certain devices (everything from routers to USB sticks) that just die for no reason and never recover.

  • Re:SSD's? no. (Score:3, Informative)

    by ElectricTurtle ( 1171201 ) on Thursday May 27, 2010 @04:28PM (#32367746)
    Oh bleh... AC box checked accidentally. The parent is me.
  • Re:live stream (Score:5, Informative)

    by kipin ( 981566 ) on Thursday May 27, 2010 @04:30PM (#32367790) Homepage
    http://torrentstream.org/ [torrentstream.org]

    Works pretty well actually.
  • Re:live stream (Score:5, Informative)

    by TooMuchToDo ( 882796 ) on Thursday May 27, 2010 @04:31PM (#32367810)
    You've just described what multicast was designed to solve.

    https://www.cisco.com/en/US/products/ps6552/products_ios_technology_home.html [cisco.com]

  • Re:SSD's? no. (Score:3, Informative)

    by ElectricTurtle ( 1171201 ) on Thursday May 27, 2010 @04:32PM (#32367846)
    Actually, I rescind my post, as I realize I was confusing EEPROM with NOR/NAND. Your point is actually quite valid.
  • Re:SSD's? no. (Score:5, Informative)

    by Anonymous Coward on Thursday May 27, 2010 @04:34PM (#32367868)

    More importantly, the test pattern does not resemble normal SSD usage. Complete writes are very unusual for SSD and a cycle is not completed nearly as quickly as a cycle on this EEPROM (400 cycles per minute). When an SSD is written to in normal usage, a wear leveling algorithm distributes the data and avoids writing to the same physical blocks again and again. The German computer magazine C't has run continuous write tests with USB sticks and never managed to destroy even a single visible block on a stick that way. The first test (4 years ago) wrote the same block more than 16 million times before they gave up. The second test (2 years ago) wrote the full capacity over and over again. The 2GB stick did not show any signs of wear after more than 23TB written to it.

  • Re:SSD's? no. (Score:5, Informative)

    by robot256 ( 1635039 ) on Thursday May 27, 2010 @04:39PM (#32367936)

    Okay, I'll bite. Let me introduce you to this thing called "functional equivalence". You do realize that even though they are all "nonvolatile storage," there is a difference between EEPROM and Flash, and that there are many different kinds of low- and high-density Flash and they all have different proprietary silicon designs with different characteristics?

    Microchip EEPROMs are specifically designed for low-density, high-reliability applications, and are totally different at the transistor level from high-density MLC Flash used in solid state disks.

  • Re:Interesting! (Score:5, Informative)

    by Kindgott ( 165758 ) on Thursday May 27, 2010 @04:42PM (#32367986) Journal

    Yeah, the title seems misleading, since they're writing and verifying data on an EEPROM, which is not used in solid state drives last time I checked.

  • Re:Interesting! (Score:5, Informative)

    by InsaneProcessor ( 869563 ) on Thursday May 27, 2010 @04:52PM (#32368120)
    I find this "not very interesting". RTFA: this is not a flash destroyer, it is an EEPROM destroyer. NOT THE SAME THING AND NOT USEFUL!
  • Apples and hippos (Score:5, Informative)

    by blind biker ( 1066130 ) on Thursday May 27, 2010 @04:53PM (#32368148) Journal

    They're testing an EEPROM: while the underlying physics of storing data in an EEPROM and in Flash RAM are the same - floating-gate transistors - EEPROMs use best-of-breed implementations with single-bit-addressable floating gates, while the Flash RAM found in SSDs is the cheapest, least enduring MLC NAND. MLC NAND is the cheapest per bit and has a write-cycle endurance two to three orders of magnitude lower than EEPROMs.

    SSDs do not contain EEPROMs. They don't even contain SLC (NOR or NAND). In fact, SSDs don't even contain NOR MLCs. Only the cheapest will do, for SSDs.

  • Re:Subject here (Score:3, Informative)

    by Rockoon ( 1252108 ) on Thursday May 27, 2010 @04:55PM (#32368164)
    King of the impossible!
  • Re:Interesting! (Score:1, Informative)

    by Anonymous Coward on Thursday May 27, 2010 @04:57PM (#32368184)

    You're wrong, twice. Almost all consumer SSDs use MLC (e.g. the X-25, nearly all products based on the Indilinx or Sandforce controllers). Also, SSDs ship with a few gigabytes of extra space for wear leveling; available sectors just get lined up neatly in a queue (according to whoever has had the least write cycles so far) and the controller picks the sector at the top of the list. The absolute worst-case scenario is that someone will fill up the entire drive and then write to the same spot over and over again; even in this extreme corner case, wear-leveling algorithms are "perfect" in that they will distribute the writes 100% evenly across all available spare sectors until one of them dies (at which point it *should* repeat the process with the remaining sectors, until they're all gone too).
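
    A toy model of the allocation policy described above - spare sectors kept in a queue ordered by erase count, least-worn first - might look like the following Python sketch. It is only an illustration; real controller firmware is proprietary and the names here are made up.

        import heapq

        class SpareAllocator:
            def __init__(self, num_spares):
                # min-heap of (erase_count, block_id); the least-worn spare sits on top
                self.spares = [(0, blk) for blk in range(num_spares)]
                heapq.heapify(self.spares)

            def allocate(self):
                erases, blk = heapq.heappop(self.spares)        # hand out the least-worn spare
                return blk, erases

            def retire_to_spares(self, blk, erases):
                heapq.heappush(self.spares, (erases + 1, blk))  # block freed after one more erase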

  • by blind biker ( 1066130 ) on Thursday May 27, 2010 @04:57PM (#32368190) Journal

    They're testing an EEPROM: it is bit addressable and it does not contain any wear leveling algorithm.

  • Re:Die? (Score:4, Informative)

    by Chris Burke ( 6130 ) on Thursday May 27, 2010 @05:05PM (#32368294) Homepage

    (Being a software guy rather than a Flash memory guy, I wouldn't want to guess whether over-erased cells would be at logic 1, logic 0 or a mix of the two.)

    Well I'm not an expert on flash, but I know a little about how they work. In NOR flash the data line is pulled up to one, so that's the default state for any bit. There's a transistor connected to ground, and if the floating gate has a charge in it and the transistor is on, then it pulls the data line down to 0. "Erasing" a NOR flash sets all the bits to 1, and programming it sets certain bits to 0.

    The most common failure mode as I understand it is that electrons get trapped in the floating gate even after erase cycles such that it's very close to or over Vt for the transistor, so that bit would be stuck in the "programmed" state of logic 0.

    NAND memory is the opposite, the erased state is 0 and the programmed state is 1, so a permanently charged floating gate should result in a stuck-at-1 fault.

    Which, relating to the OP's question, means either way the memory wouldn't be good for much of anything. Your NAND SSD is going to fail during an erase-program (aka "write") cycle, and except in the extremely unlikely case that the pattern you were writing did not involve changing any previously stored 1s to 0s on stuck bits, then the result is going to be wrong. You could read it, but you'd be reading the wrong data.
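
    To make the stuck-bit point concrete, here is a tiny Python model (purely illustrative): once a cell is stuck, any pattern that disagrees with the stuck value reads back wrong, so the data is corrupt even though the read itself "works".

        def read_back(written, stuck_at_0_mask, stuck_at_1_mask):
            # stuck-at-0 bits always read 0, stuck-at-1 bits always read 1
            return (written & ~stuck_at_0_mask) | stuck_at_1_mask

        data = 0b10110010
        # one cell stuck at 0 in bit position 1: the byte no longer reads back as written
        print(read_back(data, stuck_at_0_mask=0b00000010, stuck_at_1_mask=0) == data)  # False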

  • Re:Apples and hippos (Score:3, Informative)

    by Microlith ( 54737 ) on Thursday May 27, 2010 @05:15PM (#32368446)

    They don't even contain SLC (NOR or NAND).

    Some, usually the more expensive models, will use SLC NAND. No SSD uses NOR for data storage due to a total lack of density on that technology. They may for storing firmware/FPGA data, however.

  • Re:Interesting! (Score:1, Informative)

    by msauve ( 701917 ) on Thursday May 27, 2010 @05:29PM (#32368684)
    So, I'm violating my usual rule of not responding to ACs, only because you're such an idiot (which conveniently explains why you are posting AC).

    "perfect" in that they will distribute the writes 100% evenly across all available spare sectors

    See, that's the thing. Once a sector is written to, it won't be touched again, unless the data changes. You end up with some subset of sectors which are frequently modified, while others never are. That is NOT an even distribution of writes across all sectors, nor is it "perfect" in any sense of the word.

    So, fill up 75% of your SSD with files which don't change, then beat up on the remaining sectors 4 times as much as truly evenly distributed writes would cause.

    It's not clear what your "MLC" comment was about, since I specifically mentioned that as an example of flash technology.

  • Re:Interesting! (Score:5, Informative)

    by lauragrupp ( 1820940 ) on Thursday May 27, 2010 @05:33PM (#32368722)
    Here is work from the academic community exploring error rates, latencies and some other factors. It compares 11 NAND flash chips (both SLC and MLC) from 5 manufacturers: http://nvsl.ucsd.edu/ftest.html [ucsd.edu]
  • Re:Interesting! (Score:3, Informative)

    by dave420 ( 699308 ) on Thursday May 27, 2010 @06:05PM (#32369100)
    The chip in question is completely different, in tolerances, performance, and life span, from the chips used in SSDs. That's the problem.
  • Re:Interesting! (Score:4, Informative)

    by Simetrical ( 1047518 ) <Simetrical+sd@gmail.com> on Thursday May 27, 2010 @06:19PM (#32369250) Homepage

    So, I'm violating my usual rule of not responding to ACs, only because you're such an idiot (which conveniently explains why you are posting AC).

    "perfect" in that they will distribute the writes 100% evenly across all available spare sectors

    See, that's the thing. Once a sector is written to, it won't be touched again, unless the data changes. You end up with some subset of sectors which are frequently modified, while others never are. That is NOT an even distribution of writes across all sectors, nor is it "perfect" in any sense of the word. So, fill up 75% of your SSD with files which don't change, then beat up on the remaining sectors 4 times as much as truly evenly distributed writes would cause. It's not clear what your "MLC" comment was about, since I specifically mentioned that as an example of flash technology.

    So keep track of how many times each erase block has been written, and if some blocks get erased too often relative to the rest, move data from the least-erased blocks onto the most-erased blocks. You do a few extra writes this way, but a negligible number if you set the thresholds high enough. And then you'll get fully leveled writes. I'm sure the clever folks at places like Intel have figured out strategies like this (although for the cheap stuff, who knows).
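
    As a rough sketch of that strategy (assumed interfaces, not any vendor's actual algorithm): track erase counts per block and, when the spread between the most- and least-erased blocks gets too large, park the cold data on the worn block so the lightly used block rejoins the pool taking fresh writes.

        THRESHOLD = 1000  # allowed spread in erase counts before static data is migrated

        def maybe_level(erase_counts, move_data):
            """erase_counts: dict block_id -> erase count; move_data: callable(src, dst)."""
            coldest = min(erase_counts, key=erase_counts.get)   # block holding static data
            hottest = max(erase_counts, key=erase_counts.get)   # block that keeps getting erased
            if erase_counts[hottest] - erase_counts[coldest] > THRESHOLD:
                move_data(coldest, hottest)   # copy the cold data onto the worn block
                # the cold block is now free and will start absorbing new writes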

  • by Anonymous Coward on Thursday May 27, 2010 @06:36PM (#32369428)

    A bit larger scale in some cases. My ioxtreme, for example, has 99GB of flash but only 80GB visible. So roughly 25% spares.

    And it tells me the percentage of spares left.

  • Re:Interesting! (Score:2, Informative)

    by srvivn21 ( 410280 ) on Thursday May 27, 2010 @06:45PM (#32369512)

    Not the original AC, but I thought I would try to clear up a disconnect instead of downmodding...

    So, I'm violating my usual rule of not responding to ACs, only because you're such an idiot (which conveniently explains why you are posting AC).

    -1 Flamebait. As I'll show, the rest of your rant has insufficient content to balance this.

    "perfect" in that they will distribute the writes 100% evenly across all available spare sectors

    Emphasis mine.

    See, that's the thing. Once a sector is written to, it won't be touched again, unless the data changes. You end up with some subset of sectors

    The spare ones, as the AC pointed out.

    which are frequently modified, while others never are. That is NOT an even distribution of writes across all sectors,

    Not a claim made by the AC.

    nor is it "perfect" in any sense of the word.

    Strictly your opinion.

    So, fill up 75% of your SSD with files which don't change, then beat up on the remaining sectors 4 times as much as truly evenly distributed writes would cause.

    The AC actually posited a worst-case scenario, in which the whole disk was filled and only one "spot" was repeatedly changed.

    It's not clear what you "MLC" comment was about, since I specifically mentioned that as an example of flash technology.

    Sorry mate, your original comment [slashdot.org] made mention of SLC, not MLC. While it's not clear what the AC was harping about (as you didn't make a claim regarding the type of flash used by retail SSDs), calling the AC names without comprehending what was actually written is not conducive to a rational discussion. I can only hope I'm not feeding a troll.

  • Re:Interesting! (Score:5, Informative)

    by networkBoy ( 774728 ) on Thursday May 27, 2010 @07:07PM (#32369780) Journal

    And in fact, the more advanced wear-leveling algorithms do this already. There are spare blocks specifically so that data can be moved; the old block is then freed.

  • Re:Interesting! (Score:5, Informative)

    by networkBoy ( 774728 ) on Thursday May 27, 2010 @07:11PM (#32369820) Journal

    In fact, they are read back. At the flash component level.

    The flash cell is a charged gate. When a cell is programmed, the uC in the flash device compares the charge state with a reference voltage. Not enough? Add more charge. Still not enough? The cell is bad; mark it (at the block level, so you lose xx bits for one bad one) and move on.

    This is fairly high level and not exactly how it works, but close enough.
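
    In rough pseudocode (Python here, with assumed hooks - real devices do this inside the controller, not in software), that program/verify loop looks something like:

        MAX_PULSES = 8  # arbitrary retry budget for illustration

        def program_cell(apply_pulse, read_level, v_ref, mark_block_bad):
            for _ in range(MAX_PULSES):
                apply_pulse()               # add a bit more charge to the floating gate
                if read_level() >= v_ref:   # compare the stored charge with the reference
                    return True             # verified: cell is programmed
            mark_block_bad()                # never reached the reference: retire the block
            return False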

  • Re:Interesting! (Score:5, Informative)

    by Anonymous Coward on Thursday May 27, 2010 @07:39PM (#32370120)

    Actually, I believe *you* are incorrect. Different AC here, but I had to respond because your response doesn't match what I understand to be the case as an engineer working with vendors selecting NAND flash for use in consumer devices. I'll be interested to see if I'm incorrect or if this even gets read as an AC post.

    Specifically, it doesn't matter to the flash device if the host has written a sector and never touched it again, that sector *will* be moved when it's been read enough times that the ECC indicates it's likely to become unreadable soon. This is called read disturbance and it can happen surprisingly frequently with MLC cells in small process sizes (i.e. at sufficient density to make multi-GB modules). It also happens on SLC devices but to a lesser extent because they can cope with more voltage decay per bit and still be able to read the bit correctly. This is done as a function of even the simplest block-access controllers because otherwise you wouldn't be able to read your own data back more than a few hundred times. In fact, if you wish to get technical about it, it also has a massive dependency upon the temperature the module is at when the data was originally written since this directly impacts the amount of electrons which can be stored.

    In addition to moving data to counter read disturbance, most controllers (even the very simple ones in SD Cards & eMMC devices) will move sectors (actually not filesystem sectors, but individual blocks although the distinction isn't important here) around in order to optimise wear across the entire device even if the content hasn't changed. If you think about it, this has to happen at some level even without wear levelling since the sector is massively smaller than the superblock size for most of the densities we have available today - it's not unusual to see a device with an erase block size of 256KB, which is normally way larger than a sector.

    I don't know much about SSD controllers, they're far too expensive for our devices, but they can't possibly work the way you think they do - not if they use the same raw NAND that is used for other block storage abstractions.
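
    A sketch of the read-disturb handling described above (the threshold and interfaces are assumptions for illustration, not any real controller's policy): if a read needed ECC to correct too many bits, rewrite the block somewhere fresh before it becomes unreadable.

        RELOCATE_THRESHOLD = 4  # corrected bits per read that trigger a move (made-up value)

        def read_block(block, raw_read_with_ecc, relocate):
            data, corrected_bits = raw_read_with_ecc(block)   # decoded data plus ECC statistics
            if corrected_bits >= RELOCATE_THRESHOLD:
                relocate(block)   # copy the data to a fresh block; recycle this one later
            return data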

  • Re:Interesting! (Score:3, Informative)

    by Anonymous Coward on Thursday May 27, 2010 @07:40PM (#32370140)

    Informative? How about wrong!
    128 seconds * 1M operations = 128,000,000 seconds
    Seconds in a day = 86400
    128M/86400 = 1481.48 days

    Or roughly 4 years.

    For some reason, you divided 128M by the number of minutes in a day (1440) to arrive at your ludicrous 243 years.
    Hence you are out by a factor of 60.
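
    The corrected arithmetic, as a quick check:

        seconds = 128 * 1_000_000        # 128 s per operation x 1M operations
        print(seconds / 86_400)          # ~1481 days
        print(seconds / 86_400 / 365)    # ~4.06 years
        print(seconds / 1_440 / 365)     # the mistaken /1440 path: ~243 "years"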

  • Re:Apples and hippos (Score:3, Informative)

    by curunir ( 98273 ) * on Thursday May 27, 2010 @09:06PM (#32370892) Homepage Journal

    Intel's Extreme line, for one. The X25-E [intel.com] goes up to 64GB. It's a 2.5" form factor, but it's a SATA drive and you can use a 3.5" bay with mounting rails to put it in a desktop.

    GP is right about being expensive...expect to pay over $600 for the 64GB model.

  • by gweihir ( 88907 ) on Thursday May 27, 2010 @09:11PM (#32370926)

    This is what I got from a 2GB Kingston flash key. After that, there were errors in almost all overwrites. The real kicker, however, is that while the key read back wrong data, there never ever was any error reported. Since doing that at the beginning of 2009, I do not trust USB flash anymore.

    Set-up: Linux, 1MB of random data replicated to fill the chip, then read back to compare. Repeat with new random data. I had one isolated faulty read-back around 3,500 cycles, and then from around 3,700 cycles 90% (and pretty soon 100%) faulty read-backs. The language was Python; there were no errors for the device on STDERR or in the system logs. And I looked carefully.
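
    Since the poster says the original test was Python, here is a minimal sketch of that kind of write/read-back comparison. The device path is hypothetical, running it destroys whatever is on the stick, and a real test has to defeat the page cache (e.g. O_DIRECT or dropping caches) so the reads actually hit the device.

        import os

        DEV = "/dev/sdX"        # hypothetical USB flash device node - all data on it is lost
        CHUNK = 1024 * 1024     # 1 MB of random data, replicated to fill the stick

        def one_cycle(device_size):
            pattern = os.urandom(CHUNK)
            with open(DEV, "r+b") as dev:
                for offset in range(0, device_size - CHUNK + 1, CHUNK):
                    dev.seek(offset)
                    dev.write(pattern)                  # fill the device with the pattern
                dev.flush()
                os.fsync(dev.fileno())                  # push the writes out to the device
                for offset in range(0, device_size - CHUNK + 1, CHUNK):
                    dev.seek(offset)
                    if dev.read(CHUNK) != pattern:      # silent corruption shows up here
                        return False
            return True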

  • Re:Die? (Score:3, Informative)

    by AdamHaun ( 43173 ) on Friday May 28, 2010 @12:07AM (#32371994) Journal

    Your description is a bit backwards, at least for the NOR flash I work on. When the floating gate has charge (electrons), it turns the transistor off. The negative charge on the FG cancels out the positive voltage on the control gate. The bit is read via a current sense -- no current is a zero, lots of current is a one.

    The main failure mechanism (that I know of) is oxide damage due to high energy electrons. Program and erase (technically, Fowler-Nordheim tunneling) take high voltages, which gives electrons enough energy to scatter into the oxide and get trapped. This repels other electrons. So what happens is that it takes longer and longer to program and erase until eventually you exceed the set limit, at which point it shows up as a fail. The bit will be in an indeterminate state. It may read correctly but won't have enough margin to guarantee data retention.

  • Re:Interesting! (Score:5, Informative)

    by izomiac ( 815208 ) on Friday May 28, 2010 @12:24AM (#32372092) Homepage

    Because wear leveling only picks another sector to write to from among the unused sectors. Simplified, if your drive is 80% full, you write to the same sectors five times as often.

    Especially because once blocks start failing, other blocks start failing too, at an accelerating rate, and they rapidly reach a state of being completely unusable.

    That's a contradiction. If the wear-leveling algorithm was ineffective then you'd have a relatively constant rate of block failure. A good wear-leveling algorithm ensures you won't get a significant number of block failures until almost every block has been worn out. Then you get a bunch. So the behavior described is failing exactly as intended, and indicates the wear-leveling algorithm worked almost perfectly.

    But you're right in that a wear algorithm that only uses free space would be terrible. That's one reason no device uses one like that. The primary reason, though, is that the SSD has no idea which blocks hold live data and which are free, unless it is told via the TRIM command (later-generation SSDs with newer OSes). The filesystem knows, but an SSD is filesystem-agnostic. Moving data around is the cause of the performance drop-off when the drive runs out of unused/un-TRIM'd blocks.

    Personally, I have the cheapest, buggiest SSD in common knowledge (the one that can get bogged down to 4 IOPS), and it has worked beautifully for me. Just checking a diagnostic tool, in the past two years I've power cycled it 5,666 times (which probably explains why I kill HDDs so quickly), the average block has been erased 7,333 times, and no block has been erased more than 7,442 times. I've got zero ECC failures. Honestly, I'm a little surprised I've written 234 TB of data to my poor 32 GB drive, but my usage is a bit heavy (~10 complete Gentoo compiles with countless updating, ~5 DISM'd Windows 7 installs, ~5 DISM'd Vista installs, ~30 Haiku installs, ~20 SVNs of 10 GB projects, and a good amount of downloading).

    But, in my experience, the wear leveling algorithm is only ~3% away from being "perfect".
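
    As a quick sanity check on those numbers: total data written divided by drive capacity should land near the average erase count if the wear really is spread evenly.

        written_tb, capacity_gb = 234, 32
        print(written_tb * 1024 / capacity_gb)   # ~7488 full-drive passes vs. the reported ~7333 average erases
        print(7442 / 7333 - 1)                   # spread between max and average erase count, ~1.5%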

  • This is a bad test (Score:5, Informative)

    by AdamHaun ( 43173 ) on Friday May 28, 2010 @12:49AM (#32372214) Journal

    I am working on flash write/erase cycling right now in my day job and I can tell you that this is not a very good test. Temperature affects cycling endurance (and this is reflected in the spec), so if your SSD is 20-30C higher than room temp it's going to make a difference. Fowler-Nordheim tunneling (which NAND flash uses for program and erase) is hardest at cold temperatures, so the first operation after powerup might be the worst case in a PC. (Yes, I know they're not using an SSD here, but they are doing their cycling at room temp.)

    Another thing to keep in mind is that continuous cycling is not realistic. The wear-out mechanism here is charge trap-up, where electrons get stuck in the floating-gate oxide and repel other electrons, slowing down program and erase. Over time, thermal energy lets the electrons detrap. So irregular usage in a hot PC should actually be a nicer environment for endurance.

    A final factor is process variation, which can only be covered by using a large sample size (>100) and/or using units from separate lots with known characteristics, none of which an end user will likely have access to. Even that doesn't tell you anything about the defect rate.

    There are really two types of tests that people are talking about here. The first is a spec compliance test, which uses the extreme conditions I mentioned above to guarantee that all units will have the spec endurance under all spec conditions. This should be done by the manufacturer. The second is a real world usage test, which will only give realistic results if done under actual use conditions. The number you get from the article's test probably won't tell you much.

    [Disclaimer: I work on embedded NOR flash, not NAND, but the bits are the same and the article's talking about EEPROM so I figure I can butt in.]
