Data Storage Hardware

How Does Flash Media Fail? 357

bhodge writes "Aside from the obvious 'it stops working' answer, how does flash media (such as USB, SD, and CF) fail? Unlike with traditional hard drives, where anyone who's worked with computers for a while knows what a drive failure looks like, I don't know anyone who has experienced such a failure with flash. I haven't been able to find more than scant evidence of what such failures look like at the OS level. The one account I have found detailed using a small USB drive for /var/log storage; it failed very quickly, and then utterly (0 byte unformatted device), after five years of service in the role. This runs contrary to other anecdotal claims that you should still be able to read the media after you can no longer write to it. So my question is: what have you seen of the nature of flash media failure, if anything?"
This discussion has been archived. No new comments can be posted.

  • In my case (Score:4, Funny)

    by ptomblin ( 1378 ) <ptomblin@xcski.com> on Friday April 10, 2009 @09:09AM (#27530965) Homepage Journal

    It usually "fails" because it went through the washing machine in my pants too many times.

    • Re: (Score:3, Interesting)

      Comment removed based on user account deletion
      • Re: (Score:3, Informative)

        by aztracker1 ( 702135 )
        Actually, I've done that a couple of times, and haven't had one fail afterward (after waiting a couple of days).
        • Re:In my case (Score:4, Interesting)

          by roseblood ( 631824 ) on Friday April 10, 2009 @12:07PM (#27533555)

          Same here. I've washed SD and CF cards more often than I'd like to admit. Despite that I've never had one fail out of the wash.

          I've had one card fail, though. My Palm Treo uses SD cards, and once when removing the card my fingernail was in the perfect position to split the card open at the seam. When I removed the card from the phone I had 2 pieces: one plastic cover (the part w/o the label) and the remainder of the card, which stayed together. I re-attached the cover and attempted to read the card in several USB readers and the phone as well. It was dead. The devices never recognized that a card was inserted.

    • by MindlessAutomata ( 1282944 ) on Friday April 10, 2009 @09:11AM (#27530991)

      In that case, what's truly "failing" is you.

    • by Yvan256 ( 722131 ) on Friday April 10, 2009 @09:20AM (#27531151) Homepage Journal

      You have a washing machine in your pants?

    • Re: (Score:3, Interesting)

      by scorp1us ( 235526 )

      Why would a solid state device fail from multiple submergings? Especially if there is no current running through it during said submergings?

      • Comment removed (Score:5, Informative)

        by account_deleted ( 4530225 ) on Friday April 10, 2009 @09:24AM (#27531221)
        Comment removed based on user account deletion
      • Re:In my case (Score:5, Insightful)

        by AKAImBatman ( 238306 ) * <akaimbatman@ g m a i l . com> on Friday April 10, 2009 @09:27AM (#27531251) Homepage Journal

        Washing machines are pretty harsh places. You get tidal forces that will apply various physical stresses to the components. Rapid heating and cooling can cause expansion problems. Water can wear down contacts. Soaps can contaminate contacts or have negative chemical effects. So on and so forth.

        If it makes it to the dryer, your card could easily end up at temperatures outside the optimal storage temperature for the device. (Ever read those warnings, "Store between 70F and 100F?" Yeah, me neither.) These extreme temperatures, combined with the rapidity with which they're introduced, offer a cornucopia of ways your device could be damaged.

        In short, water isn't the real problem. It's all the stuff above and beyond that.

        • Re: (Score:2, Funny)

          ...Rapid heating and cooling can cause expansion problems.

          That's what she said!
        • Re:In my case (Score:5, Insightful)

          by Bakkster ( 1529253 ) <Bakkster,man&gmail,com> on Friday April 10, 2009 @09:57AM (#27531743)
          Don't forget about the extreme static charges built up in a dryer. Even though most USB devices have mechanisms to prevent static damage, a dryer could overwhelm these protections. Regardless, an SSD failure should usually be due to the failure of the support electronics, not the storage itself.
        • Re: (Score:3, Insightful)

          by klaun ( 236494 )

          Washing machines are pretty harsh places. You get tidal forces that will apply various physical stresses to the components. Rapid heating and cooling can cause expansion

          I'm sorry, tidal forces in a washing machine? Tidal forces are caused by gravity. It's an effect of the inverse square distance portion of the gravity force equation. They certainly exist in a washing machine as they do anywhere else subject to the effects of gravity, but no more so than anywhere else.

          Within the rotating frame of a was

        • Re:In my case (Score:5, Informative)

          by BitZtream ( 692029 ) on Friday April 10, 2009 @10:15AM (#27532005)

          You are only partially correct.

          The company I work for sells USB flash drives and going through a washer is rather common and they survive more often than not.

          The question is: Did you use soap?

          Water is practically harmless if you allow the device to dry completely before using it. The problem is that water in washing machines isn't just water; it's almost always water AND detergent, and probably some fabric softener as well.

          When the device dries, the detergent and fabric softener are left behind and are conductive, not like metal, but the resistance is low enough in the tiny spaces between the pins on surface mount chips to make all the difference in the world.

          The main reason devices fail however is simply abuse, or poor manufacturing depending on the device. Most of our returns are due to the USB connector pulling the solder pads off the circuit board because of the stresses during insertion/removal. Sometimes the pads don't come off at the USB connector but the board flexes enough to eventually break the connection at one of the flash chips or the controller. When that happens you go from working perfectly to 0 byte unformatted device in an instant as the controller can no longer talk to the actual flash.

          We have on occasion successfully retrieved customer data for them by removing the case from the device and flexing the board while it's plugged in to get it to work or, if that doesn't work, reflowing the solder where possible. Most of the time, that's all it takes.

          The heating and cooling is bad, but it's not that bad. The temps in a dryer aren't as bad as one might think. My personal device has been washed and dried at least a dozen times in the last couple of years. When I find it in the dryer I simply pull it out of the case, clean the PCB with some PCB cleaner, let it dry, reassemble, and life goes on. If it's a good quality device, doing it once will probably be okay, but as has been stated, do it too many times and the heat expansion will certainly come into play and destroy solder joints or start making the board lamination fail.

          Now ... don't take that as a recommendation to wash your thumb drives, my stick is trying to get into the record books or something, I think it just refuses to die.

          • Re: (Score:3, Funny)

            The question is: Did you use soap?

            Ah.... an olde tyme geek.

            Times have changed. You don't need a long beard, poor hygiene and smelly clothes to be taken seriously these days :)

        • Re: (Score:3, Funny)

          by scorp1us ( 235526 )

          Washing machines are pretty harsh places. You get tidal forces that will apply various physical stresses to the components.

          I don't use Tide. I use Gain. Perhaps I have it up too high?

      • Re:In my case (Score:5, Informative)

        by Moryath ( 553296 ) on Friday April 10, 2009 @09:27AM (#27531267)

        Corrosion.

        Being repowered while the internal circuit board is still damp with soap-contaminated water (shorting).

        Physical stress ("agitate" cycle, "spin" cycle, Tumble Dry...).

        Heat stress (which heat cycle did you use/did it go through the dryer too).

        Need I go on?

      • Re:In my case (Score:4, Insightful)

        by ptomblin ( 1378 ) <ptomblin@xcski.com> on Friday April 10, 2009 @09:28AM (#27531277) Homepage Journal

        Usually the case falls apart. I can still get the data off the drive, but I stop using it and just spend another $20 to get something with 8 times the capacity of the last time.

      • I'm with scorp1us here. Being an idiot, I have sent my flash drives through not only the washer but also the dryer multiple times, and have never had an issue (thank god).
    • Re: (Score:2, Interesting)

      by Anonymous Coward

      Mine is a beast. I washed it at least twice and it still worked, and then most recently I ran over it twice: once backing up, and again coming back up the driveway when I thought I had 'forgotten' it inside, not realizing I had dropped it. I found it when I got home and it was crushed. I removed the metal around the USB connector, since it was a pancake, plugged it in while holding it, and the dang thing still worked. However, since I'm lazy, I don't want to hold it in forever, so it's been retired.

    • Is that a washing machine in your pants, or are you just happy to see me?
    • by Lumpy ( 12016 )

      Have you found that number?

      I'm up to 6 on my 16 gig cruzer.

      Also, I found my missing 4 gig that I lost in November. The snow melted enough that it appeared in the snowbank at home. A good rinsing in deionized water and it still works.

  • He'd taken it out of his camera, tried to put it back in, and nothin'. Slapped it into my Linux box. It "saw" that there was a device there, but wasn't real happy about it:
    [ 5555.618324] sd 4:0:0:0: [sdb] Add. Sense: No additional sense information
    [ 5558.777567] sd 4:0:0:0: [sdb] Sense Key : No Sense [current]

    "It's dead, Jim."

    I'm tempted to try the old hard-drive swaparoo: get the exact same SD card, unsolder the flash chips, and put the bad one's flash on the new one's circuitry. See if it's the circuitry that's bad, or the flash, itself. If anyone has any bright ideas on how to determine definitively which it is without me going through that exercise, I'm all ears.

    • Re: (Score:3, Interesting)

      by grahamsz ( 150076 )

      Firstly, "getting the exact same SD card" might be a challenge. I've bought various cards from the same manufacturers and they tend to have subtle variations.

      Secondly I believe there isn't really much on an SD card except for the flash chip. CF cards have more of a traditional controller on there. A lot of the early criticism of SD was that a poorly made reader could screw up your card.

    • by jweller ( 926629 )

      open it up first, a lot of them are just a solid chip now, not flash soldered to a board.

  • Burnt out (Score:3, Informative)

    by abigsmurf ( 919188 ) on Friday April 10, 2009 @09:14AM (#27531045)

    From what I gather, the most common cause of failure is the flash getting fried, whether by dodgy card readers or by pulling the card out while a voltage is running through it. The chips are very sensitive to spikes in current or voltage and burn out because of them.

    • by Samschnooks ( 1415697 ) on Friday April 10, 2009 @09:32AM (#27531351)

      Dodgy card readers,...

      That's what you get for buying a Chrysler product or any Detroit product. Try getting a Honda or Toyota card reader. Or if you're a yuppy, a BMW card reader. Although, no one holds a candle to the Japanese.

    • Re:Burnt out (Score:5, Informative)

      by dasunst3r ( 947970 ) on Friday April 10, 2009 @09:38AM (#27531447) Homepage

      I am currently taking a class on solid state devices, and we just talked about how MOSFETs would fail. Basically, a high voltage to the gate would create these electrons that have so much kinetic energy that they create pairs of opposing charges (electron-hole pairs) in what was supposed to be the insulator. These pairs of charges would create an internal electric field inside the insulator. This process reduces the barrier for tunneling to occur, so more electrons are able to tunnel through the insulator and do the same thing, creating a runaway effect.

      For more information, look up "Time-Dependent Dielectric Breakdown" and refer to pages 293 and 294 of Streetman and Banerjee's "Solid State Electronic Devices" (6th ed).

      • is packaging. There is stuff in the potting epoxy that holds enough electric charge to make the FET's gate start to conduct a little, playing havoc with everything. We've been having to redo parts with an extra layer of metal over the top of the IC to protect it from an intermittent contamination in our packaging material.
        I believe I remember reading that Intel had problems with their water being mildly radioactive downstream of an old uranium mine, and running into the same problems (only much worse, sin

    • Re:Burnt out (Score:5, Interesting)

      by BitZtream ( 692029 ) on Friday April 10, 2009 @10:39AM (#27532359)

      Seen this failure mode a lot too. Static build up on your body, then when you go to insert the device the charge jumps between you, through the device to the grounded casing around the USB connector.

      Can do anything from rebooting your PC (if you're lucky) to destroying the stick or the USB controller on the PC (or the hub, if you're lucky).

      As you said, power is a major problem with USB. Cheap USB sticks need FULL power to work right. Oftentimes we'll have a customer with a stick that works fine in one PC (at home or work, for instance) and will either not be recognized or will give read/write errors in another. Most of the time this is solved by using an external powered USB hub, as the motherboard simply isn't supplying enough voltage or current to power the stick. I'm not really sure if in general the problem is the motherboard or the stick, as I haven't bothered to pull out the multimeter and do any serious testing, but I'm inclined to think it's the stick, as it seems to happen mostly with cheap/noname sticks that were probably rejected by the likes of Sandisk and co.

      As far as pulling them out while the card is powered, that is part of the specification for SD and USB; not sure about compact flash, but I would assume it's there as well. USB and SD have the connector configured in such a way as to ensure power is applied and removed in the proper order, which is why their connectors have some contacts that are longer than the others.

      What you said is still true, however: a cheap chip on either side may not handle that process well. I can say, however, that we have successfully run 3.3v SD cards at 7 to 9 volts for short periods of time due to misconfigured testing setups where we didn't check the voltage after switching modes. Of course, we've also lost more than a few SD cards for that very reason; even at 5 volts they won't last more than a few minutes. Mini and micro cards in an adapter to full SD generally fare better, as the minis and micros work at around 1.8v (I think; memory is fuzzy about that atm, might be 2.7) and have internal voltage dividers to cut down the 3.3v input from the system. They still fail eventually due to overvoltage; they just seem to do better, although I have only anecdotal evidence to support that.

      • Re: (Score:3, Informative)

        by jazzkat ( 901547 )
        You'd have to have a hell of a lot of built-up voltage to jump through the plastic casing, through the air gap to the non-grounded metal on the PC board, and then from there across the air gap to the USB grounding shield. USB grounding is rugged as hell. At one point, the outlet behind my computer desk did not have a plate. One day when I was re-arranging cables, the metal shield of a USB plug brushed one of the screws for the 120v hot side in the outlet. The 120v had a clear path thru the USB cable, in
  • If a cell fails, you can't read or write that cell.

    If a gate fails in a page, you lose access to the page.

    If a gate fails in the overall control logic, you lose access to the whole device.

    Is there something I'm missing? Did you think there were oil changes or brake shoes? It's one silicon chip with metal on it.

    • If a cell fails, you can't read or write that cell.

      If a gate fails in a page, you lose access to the page.

      If a gate fails in the overall control logic, you lose access to the whole device.

      Is there something I'm missing? Did you think there were oil changes or brake shoes? It's one silicon chip with metal on it.

      What about redundancy and self-healing? How do those work?

    • by Vellmont ( 569020 ) on Friday April 10, 2009 @09:48AM (#27531611) Homepage


      Is there something I'm missing?

      Maybe the part where you assume everyone knows the above?

      Or how about the part where the submitter is asking about typical failure modes, not all possible failure modes?

    • by scatterbrained ( 144748 ) on Friday April 10, 2009 @09:49AM (#27531629) Journal

      If a cell fails, you can't read or write that cell.

      If a gate fails in a page, you lose access to the page.

      If a gate fails in the overall control logic, you lose access to the whole device.

      Is there something I'm missing? Did you think there were oil changes or brake shoes? It's one silicon chip with metal on it.

      Conceptually at least, there are several parts to worry about:

      1 - the OS & storage driver
      2 - the USB driver
      3 - the flash controller
      4 - the flash memory

      At the flash memory cell level the usual failures are breakdown of the dielectric materials and trapping charges in the memory cell that prevent an erase from happening and yield 'stuck' cells. This is normal for /all/ flash chips and is why they all have an erase cycle rating. There are certainly more exceptional ways for the chips to fail (soldering, wire bond failure, static damage, etc).

      The flash controller is supposed to be doing wear leveling, error detection and correction on the flash, to get around those problems with the flash chips, and also talking USB. These chips usually have a microcontroller in them somewhere, and there's probably bugs in that code, no doubt more in the parts that get exercised the least, like error paths :-)
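
      To make the erase cycle rating and wear leveling points concrete, here is a toy sketch in Python. It is not how any real controller works: the block count, the erase limit, and the function name `writes_until_failure` are all made up for illustration, and real firmware also has to deal with spare blocks, ECC and partial writes. It only shows why spreading erases across blocks delays the first worn-out block.

```python
def writes_until_failure(n_blocks: int, erase_limit: int, wear_level: bool) -> int:
    """Count how many block rewrites succeed before one block exceeds its rating.

    Hypothetical model: every rewrite costs one erase on the chosen block.
    Without wear leveling the same "hot" block (index 0) is rewritten each
    time, as a naive log file might do. With wear leveling the write is
    redirected to whichever block has been erased the fewest times.
    """
    erase_counts = [0] * n_blocks
    writes = 0
    while True:
        if wear_level:
            target = erase_counts.index(min(erase_counts))
        else:
            target = 0
        if erase_counts[target] >= erase_limit:
            return writes  # the chosen block is worn out
        erase_counts[target] += 1
        writes += 1


if __name__ == "__main__":
    print(writes_until_failure(8, 100, wear_level=False))  # 100
    print(writes_until_failure(8, 100, wear_level=True))   # 800
```

      In this model the leveled device survives `n_blocks` times as many rewrites, which is the whole argument for doing it in the controller.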

      The OS and drivers just have the garden variety bugs and features that we all know and love...

    • Re: (Score:3, Informative)

      by AvitarX ( 172628 )

      Having a broken SD card in my pocket, I will describe how it behaves (which I think is what the article is asking). It is a 1GB SVP.

      In Windows (XP and Vista), it asks me to format the drive, and chkdsk fails because the partition type is raw. Using recovermyphotos on it, I get between 10 and 200 photos found before the card reader decides the card is not in there anymore, and I can't recover the ones found (perhaps if I paid I could recover as it scanned).

      On Linux cat /dev/sdb returns no media found (I assume this i

      • Re: (Score:3, Interesting)

        by Intron ( 870560 )
        You might be the victim of one of the crooks who reprogram controllers in smaller flash to report as larger. It works fine until you wrap around and overwrite your file system. Beware of great deals on eBay.
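
        The wrap-around behaviour described here can be illustrated with a toy model in Python. Everything in it is hypothetical (the `WrappingDevice` class, the block counts, the 8-byte stamp); it simulates the device in memory rather than touching real hardware. The idea is to stamp every advertised block with its own index and then count how many stamps survive.

```python
class WrappingDevice:
    """Toy model of a flash device that advertises more blocks than it has.

    Writes past the real capacity silently wrap around to the start,
    which is how early blocks (such as the file system) get clobbered.
    """

    def __init__(self, real_blocks: int, advertised_blocks: int):
        self.real_blocks = real_blocks
        self.advertised_blocks = advertised_blocks
        self._store = [b""] * real_blocks

    def write(self, index: int, data: bytes) -> None:
        self._store[index % self.real_blocks] = data

    def read(self, index: int) -> bytes:
        return self._store[index % self.real_blocks]


def count_intact_blocks(dev: WrappingDevice) -> int:
    """Stamp each advertised block with its index, then count surviving stamps."""
    for i in range(dev.advertised_blocks):
        dev.write(i, i.to_bytes(8, "big"))
    return sum(
        1
        for i in range(dev.advertised_blocks)
        if dev.read(i) == i.to_bytes(8, "big")
    )


if __name__ == "__main__":
    fake = WrappingDevice(real_blocks=4, advertised_blocks=8)
    print(count_intact_blocks(fake))  # 4: only the last four stamps survive
```

        In this model the surviving count equals the real capacity in blocks, and the blocks that are lost are the earliest ones, which matches the "works fine until you wrap around" symptom.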
    • Is there something I'm missing?

      Yes. Your theory does a good job, but you need to balance it out with the potential for "intelligent failure". That is, the failures are simply too complicated to explain without some kind of intelligent force at work.

    • by dargaud ( 518470 ) <slashdot2@gd a r gaud.net> on Friday April 10, 2009 @10:48AM (#27532483) Homepage
      There may be other manners of failure. I have a recent 2Gb USB thumb drive that started going ever more slowly after a few days of use. I last measured a "dd if=/dev/random of=/media/device/test" of no more than 0.5kB/s. If somebody wants to have some fun analyzing it, I can put it in an envelope free of charge.
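
      One caveat on a measurement like this: on some systems the random source can itself be slow, so it may be worth timing the write separately from data generation. A minimal Python sketch, assuming a hypothetical helper name `write_throughput` and a target path on the mounted drive, could look like this. It generates the random block once before the timer starts and calls `fsync` so cached writes are not counted as finished.

```python
import os
import time


def write_throughput(path: str, total_bytes: int, block_size: int = 1024 * 1024) -> float:
    """Write total_bytes to path and return the observed rate in bytes/second.

    The random block is generated before timing starts, so a slow random
    source cannot be mistaken for a slow device.
    """
    block = os.urandom(block_size)
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            written += f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data to the device before stopping the clock
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / elapsed
```

      Calling it with a path such as the `/media/device/test` file from the command above (and a size of a few megabytes) gives a number that reflects the drive rather than the data generator.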
    • by James McP ( 3700 ) on Friday April 10, 2009 @10:53AM (#27532549)

      If a cell fails, you can't read or write that cell.

      This is a silent failure, much like hard drives marking blocks as bad. Capacity is reduced without any obvious signs. Not sure if OS tools can recognize it unless the controller reports bad cells as bad blocks. This will eventually result in "disk full" messages when there appears to be space on the drive. Reformatting won't recover the cells but it will likely result in your OS being aware of the flash's reduced capacity.

      If a gate fails in a page, you lose access to the page.

      Very similar to above, but larger amounts of data. I want to say there's 64 cells to the page but don't take that as gospel.

      If a gate fails in the overall control logic, you lose access to the whole device.

      Hello failed/unreadable/size 0 disk error. The data storage mechanism is intact but there's no way to access them. As people stated above, a lot of the time it is not the failure of a transistor so much as a trace or solder point failing. If you know your device has been abused physically, you can try the low-tech approach of gently squeezing or bending the stick while it's in the USB port (use an extension cable so you don't damage your mobo!!) to try and get the contacts to reconnect long enough to retrieve data. If that fails you can pop the case apart and use a magnifying glass to look for breaks in the solder or traces; if you're handy with a soldering iron you can try to bridge the connection. Again, temporary fix.

      Is there something I'm missing? Did you think there were oil changes or brake shoes? It's one silicon chip with metal on it.

      Actually most of them are several silicon chips; one controller plus a variable amount of memory chips. The increase in traces and board assembly is offset by the ability to reuse components and the overall design while memory chip prices fall. It also cuts down on the impact of failed chips, since you aren't losing controller+memory for one bad gate on the controller.

      • by dgatwood ( 11270 ) on Friday April 10, 2009 @12:08PM (#27533561) Homepage Journal

        Yes and no. A page or cell failure will result in I/O errors if there are no more spares, and if it occurs during a read cycle, it -should- result in I/O errors for all subsequent reads from that cell or page until it gets rewritten to a new cell or page. If it doesn't work that way, then the device is fundamentally violating the contract between the device and the OS to report all nonrecoverable errors that result in data loss.

        Also, while a multi-chip design reduces the probability of a device failing outright, it dramatically increases the probability of a failure. First, using a separate controller significantly increases the probability of failure because instead of having interconnect traces on a slab of silicon that (electromigration notwithstanding) almost never change or fail if they work from the factory, you have solder joints exposed on a circuit board. Solder joints are the most common cause of circuit failure in my experience.

        Even ignoring the increased risk of having extra solder joints between the controller and flash parts, the odds of failure are still much worse for multi-chip devices. Remember your RAID MTBF theory. The MTBF of a collection of devices is equal to the MTBF of one device divided by the number of devices. If you have one part, the MTBF on that slab of silicon and associated solder joints might be a year. If you have five parts, the MTBF is now 73 days. That's an extreme example, but sadly, I've seen flash sticks with large numbers of failures in the first month, so that's not nearly as gross an exaggeration as you might think.... And whether one part fails or the whole thing fails, you still lose data.
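
        The arithmetic behind that figure, as a small Python sketch. It assumes identical, independent parts with constant failure rates, and a device that fails as soon as any one part fails; under those assumptions the failure rates add, so the combined MTBF is the single-part MTBF divided by the part count. The function name `series_mtbf` is made up for illustration.

```python
def series_mtbf(part_mtbf_days: float, n_parts: int) -> float:
    """MTBF of a device built from n identical parts, any of which can kill it.

    Assumes independent parts with constant failure rates, so the device
    failure rate is n / part_mtbf_days and the MTBF is its reciprocal.
    """
    if n_parts < 1:
        raise ValueError("need at least one part")
    return part_mtbf_days / n_parts


if __name__ == "__main__":
    print(series_mtbf(365, 1))  # 365.0
    print(series_mtbf(365, 5))  # 73.0
```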

        Also, a controller failure is still likely to cause all flash parts to be inaccessible whether it is integrated into a flash chip or is driving eight discrete flash chips. It's not like you're going to use a separate flash controller per flash part. And I -think- that a device showing zero capacity is probably caused by the flash controller being unable to communicate with the flash parts. If so, then that is much more likely to be caused by a failed connection between the two than by a failed flash controller (unless there are problems with interconnects inside the flash controller chip package failing due to overzealous compliance with ROHS rules).

        The original poster also failed to mention the most common failure mode, bar none: poor solder joints or other physical interconnects getting broken by physical force. This is very common among cheap flash drives. I wouldn't expect the same with SSDs, of course---you don't normally carry a SSD in your pocket---but at least in my experience, this one cause of failure is easily an order of magnitude more frequent than any other single cause, and is in all likelihood greater than all the others put together. And that's not even counting actual abuse (washing machines, run over by cars, and so on).

        My Lexar JumpDrive Secure flash drive suddenly stopped working, and I talked to my mother, whose entire university class was using that same model of drive. Turns out that between us, we had experienced close to a 50% failure rate on those things within the first month or so, having seen somewhere around 14 or 15 failures. The failure was interesting. Mine failed suddenly, but worked if you tipped the connector at an angle... at least for a couple of seconds once or twice. This told me pretty conclusively that the failure was caused by poor hardware design. As best I can tell, when you carry the drive in your pocket, the cap puts pressure on the USB connector. Over time, this gradually causes solder joint or trace failure (I never cut one open to figure out which) at or near the USB connector.

        Since then, I only buy flash devices with mechanisms where the USB connector retracts into a solid housing. Sure, you have an elevated risk of gunk from your pocket getting into the connector because it isn't covered, but at least you don't have the flexing problem. Gunk can be cleaned with a flat toothpick and alcohol. Failed solder joints require disassembly and SMT soldering skills.... :-)

        • Re: (Score:3, Interesting)

          by James McP ( 3700 )

          Also, while a multi-chip design reduces the probability of a device failing outright, it dramatically increases the probability of a failure.

          I didn't make it clear that I was referring to the manufacturing side of things. I meant multiple chips reduced the chance of failures during manufacturing making the whole product unsellable. The more transistors to the package, the greater chance that some of them will be bad off the line. If the package can't tolerate any transistor failures and the cost per fai

  • Failure to Write (Score:5, Informative)

    by Toad-san ( 64810 ) on Friday April 10, 2009 @09:15AM (#27531073)

    Had two finally wear out. Both started giving "could not write to device" sort of errors. The system (Windows 2K or XP) would still recognize the drive, would show the files, etc. Indeed, I could still access (read) the files, so the data was there and copyable. But I'd get a file write error every time I read anything, because Windows was trying to update the flash drive's file directory with "last accessed" or some such, and that write would fail.

    No biggie; copied the data to a replacement, threw the old ones away, after hitting them several times with a hammer to "clear" the memory :-)

    • by grahamsz ( 150076 ) on Friday April 10, 2009 @09:31AM (#27531333) Homepage Journal

      On a modern filesystem, your writes should essentially be atomic and in theory it shouldn't be possible to leave the drive in an inconsistent state when the write fails.

      Of course, most camera memory cards end up being formatted with FAT32, which can be a little less forgiving.

      • by Hatta ( 162192 )

        On a modern filesystem, your writes should essentially be atomic and in theory it shouldn't be possible to leave the drive in an inconsistent state when the write fails.

        But when "consistent" means all your files are zeroed [launchpad.net], that's not much consolation.

      • I wonder how well that hammer thing worked.

        It's beautifully effective for drives because they are high speed, high precision mechanical devices, but even if you broke up the circuit board the chips were soldered to, a guy with a soldering iron and some know-how might still be able to get it back together again. Looking at that cell-to-gate progression posted earlier, it sounds like unless you are able to actually destroy a given gate you don't destroy access to a given chip. If you were able to access the inte

    • Re: (Score:3, Interesting)

      by bkaul ( 1235970 )

      I had a 2 GB Micro-SD card in my phone fail on me; it also failed to write, but there was also data corruption of some of the contents that were already on the card.

      The first symptom I encountered was that my backup program would report that it had failed to successfully back up the phone to the card. I popped the card out of the phone and into a PC, and noticed the data corruption in several places when trying to back up the contents - not just CRC read errors, but filenames actually turned to garbage, et

    • Quiet failure... (Score:5, Informative)

      by NotQuiteReal ( 608241 ) on Friday April 10, 2009 @10:14AM (#27531989) Journal
      I too had a flash drive fail, but in the "worst" way... quietly.

      Fortunately, the drive was mostly used for "sneaker net" use, and did not contain any irreplaceable data. This use exposed the issue quickly too (had it been a backup device, the backup would have been useless and I wouldn't know until I needed it.)

      A typical failure was to zip up a software installation on a dev machine, then take it to a clean target machine, where the zip would fail to unpack, or the installer exe, once unpacked, would fail to run with various errors.

      I finally got to the point where I simply copied several megabytes of plain text data to the memory key, then copied it back and diffed the files to see the corruption (large areas of nulls, as I recall.)
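
      The copy-and-diff check can be scripted. This is only a sketch in Python under some assumptions: the helper names `write_test_file` and `sha256_of` are made up, the test data is deterministic pseudo-random bytes rather than plain text, and on a real drive you would want to unplug and re-insert it between writing and verifying, since otherwise the OS may hand back its cached copy instead of what is actually on the flash.

```python
import hashlib
import os

CHUNK = 1024 * 1024  # 1 MiB per block


def write_test_file(path: str, size_mib: int, seed: bytes = b"flash-test") -> str:
    """Write size_mib MiB of deterministic data to path; return its SHA-256."""
    digest = hashlib.sha256()
    with open(path, "wb") as f:
        for i in range(size_mib):
            # Derive each block from the seed and block index so it is reproducible.
            block = hashlib.shake_256(seed + i.to_bytes(8, "big")).digest(CHUNK)
            f.write(block)
            digest.update(block)
        f.flush()
        os.fsync(f.fileno())
    return digest.hexdigest()


def sha256_of(path: str) -> str:
    """Read the file back in chunks and return its SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()
```

      If the value returned by `write_test_file` differs from a later `sha256_of` on the same path, the device has changed the data without reporting an error.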

      Never heard a peep from the OS.

      It was a 1 1/2 year old Patriot XT 2GB, and, after a couple of emails and a PDF of my NewEgg receipt, a new drive showed up in the mail under the lifetime warranty.

      I also had an expensive Lexar CF card for a digital SLR that failed. In that case pictures that I know I took simply weren't on the card... but could be "recovered" with the Lexar utility (along with EVERYTHING else on the card, so it was a PITA.) Since that was nearly $200 when it was new, I figured getting my lifetime warranty honored would be easy, since the cards were down to about $20. No dice. Just got the run-around and finally gave up. Lexar lost a customer.
      • Re: (Score:3, Funny)

        by houstonbofh ( 602064 )

        I also had an expensive Lexar CF card for a digital SLR that failed. In that case pictures that I know I took simply weren't on the card... but could be "recovered" with the Lexar utility (along with EVERYTHING else on the card, so it was a PITA.) Since that was nearly $200 when it was new, I figured getting my lifetime warranty honored would be easy, since the cards were down to about $20. No dice. Just got the run-around and finally gave up. Lexar lost a customer.

        They lost more than one... They are now in the same group as Maxtor, politicians, and strippers...

  • Fail on write (Score:5, Insightful)

    by fishybell ( 516991 ) <fishybell@@@hotmail...com> on Friday April 10, 2009 @09:15AM (#27531077) Homepage Journal
    The biggest difference I've encountered is when traditional hard drives fail, they fail on reading data back.

    Flash media fails when you write the data. In theory this means that you can always recover data as you can never write data to bad sectors. In practice the entire media device (CF, SD, etc.) fails at once.

    • Re:Fail on write (Score:4, Informative)

      by SatanicPuppy ( 611928 ) * <Satanicpuppy@gmai[ ]om ['l.c' in gap]> on Friday April 10, 2009 @09:32AM (#27531367) Journal

      It just seems like the traditional drives only fail on reads: they mostly do reads, so when they fail, it's more likely on a read.

      I've had many a drive fail during writes though, usually at the worst possible time (deadlines, when the machines are getting read/write hammered, and then bam, drive goes down and RAID performance goes to shit, and people start whinging.)

      I've had flash drives die all at once. It's not the norm, but there are things that can happen that will take them from "fine" to "dead" with no steps in between. Usually it's thumbdrives that that happens with; I haven't had a full flash harddrive fail at all yet, so I don't have any insight there.

  • Flashmemory (Score:4, Informative)

    by Narpak ( 961733 ) on Friday April 10, 2009 @09:17AM (#27531107)
    Maybe I am totally on the wrong track here, but doesn't the fact that they can't use lead in some of the alloys contribute to the lifespan of some computer parts?

    As I understand it, aluminium alloys created without lead and then used in computers degenerate several orders of magnitude quicker than alloys with lead. The process is apparently that the aluminium starts sprouting tiny, tiny "hairs", and when one of these connects to another one coming from somewhere else in the machine, it's thank you and good night for that part.

    Anyway, the reason I mention this is that apparently, with intensive use, 5-7 years is how long parts in your computer take to make a connection, and after that it is LED OFF (see what I did there?). Unless, of course, you have a computer constructed before the mid-nineties (I think that was the cutoff); since those use lead in their alloys, this isn't something that will affect them (though a range of other issues will).
  • FAT (Score:5, Informative)

    by AKAImBatman ( 238306 ) * <akaimbatman@ g m a i l . com> on Friday April 10, 2009 @09:17AM (#27531113) Homepage Journal

    The one account I have found detailed using a small USB drive for /var/log storage; it failed very quickly, and then utterly (0 byte unformatted device), after five years of service in the role.

    Without knowing more about this specific situation, I'd say this failure sounds like it pre-dates wear leveling. Prior to wear leveling, the most used sectors were likely to fail the fastest. And what sector gets written to more than the file allocation table?

    If the file allocation table was lost, that would explain why the device became completely inaccessible. The card might not be a total loss if the card contains firmware or circuitry to remove bad blocks from usage. In that case it might be possible to reformat it. (Of course, if it lacks wear leveling I wouldn't count on it.)

    Wear leveling neatly solves this issue by shifting writes to different free blocks with every write. This assures that the maximum use of the card is obtained prior to failure. Should any given block fail the card will detect the checksum error, mark the block as bad, then attempt to rewrite to a different block. This is communicated back to the reader in a transparent way. As far as the reader knows, nothing happened.
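    A toy model of the idea, as a hedged sketch only: real controllers are proprietary and far more involved. The class name, the fixed erase budget per block, and the "pick the least-worn free block" policy are all assumptions made for illustration.

```python
class ToyWearLeveler:
    """Toy model: each write goes to the least-worn free physical block."""

    def __init__(self, physical_blocks: int, max_erases: int):
        self.max_erases = max_erases
        self.erase_count = [0] * physical_blocks
        self.bad = set()
        self.free = set(range(physical_blocks))
        self.mapping = {}   # logical block -> physical block
        self.data = {}      # physical block -> payload

    def write(self, logical: int, payload: bytes) -> int:
        """Store payload for a logical block; return the physical block used."""
        while self.free:
            # Pick the free block with the fewest erases so far.
            target = min(self.free, key=lambda b: (self.erase_count[b], b))
            self.free.discard(target)
            self.erase_count[target] += 1
            if self.erase_count[target] > self.max_erases:
                # Erase failed: retire the block and try a different one.
                self.bad.add(target)
                continue
            old = self.mapping.get(logical)
            self.mapping[logical] = target
            self.data[target] = payload
            if old is not None:
                del self.data[old]
                self.free.add(old)  # the stale copy becomes reusable
            return target
        raise OSError("no usable free blocks left")

    def read(self, logical: int) -> bytes:
        return self.data[self.mapping[logical]]
```

    With 4 physical blocks and a budget of 3 erases each, rewriting a single logical block succeeds 12 times instead of 3. In this model the write after that fails, but the last good copy can still be read.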

    As you can imagine, wear leveling makes it incredibly rare to see Flash failures these days. It can still happen, but the results are likely to be unpredictable. The card will need to chew through all free blocks before it starts returning errors. In that case you may be able to continue reading the media. Or it may fail like the USB drive you mentioned. It all depends on the importance of the block on which the erasure was attempted. Since you only know about a failure *after* the block erasure, you're at the mercy of the quality of the card's electronics and algorithms to protect against a dangerous erasure.

    • Re:FAT (Score:5, Informative)

      by daid303 ( 843777 ) on Friday April 10, 2009 @09:29AM (#27531281)
      Even with wear leveling, devices can still fail easily. A single power failure during a write can ruin a perfectly good SD card. It took me a single try.

      Most devices that do hardware wear leveling are not power-fail safe. They can get corrupted beyond repair: random data corruption may follow, or the device may become unreadable.
      (I've done extensive testing with SD and Compact Flash devices in power-fail cases, because not all manufacturers deliver what they promise.)
      • Re: (Score:3, Insightful)

        by AKAImBatman ( 238306 ) *

        A single power failure during a write can ruin a perfectly good SD card. It took me a single try.

        You're right, I think that's the most common situation people see these days. Most of the other posters are describing sudden, total failures, which are consistent with frying the drive rather than failures of bad blocks. Not all that different from losing a head on a hard drive.

    • According to this Linux filesystem developer, wear leveling as implemented in consumer level flash memory is often pretty lousy: http://valhenson.livejournal.com/25228.html [livejournal.com]
  • like a CPU (Score:3, Informative)

    by Lord Ender ( 156273 ) on Friday April 10, 2009 @09:20AM (#27531145) Homepage

    I've been booting linux servers off of flash for a few years. For some of them, the whole OS, even /var/log, is on the flash drive.

    I've had one drive fail, and it basically got hot and stopped being recognized as being connected by the computer. It was older generation technology, though. Newer flash technology designed for computers doesn't fail, as far as I have experienced. I'm talking about the flash SATA drives from name-brand manufacturers.

  • Flash mail server (Score:4, Informative)

    by ace123 ( 758107 ) on Friday April 10, 2009 @09:21AM (#27531167) Homepage

    I had a 4GB FAT32 flash drive that I used as storage for a mail server attached to an OpenWRT router. It required renaming and deleting files all the time (every time it got an e-mail)--so I think it wore down pretty quickly.

    One day, the storage for the flash drive stopped working (from one hour to the next, without being touched, the computer acted like I had just yanked the drive out)--it would be recognized but report a "no media in drive" error when you tried to access it, like an empty CD drive. In fact I think Windows would say "Insert CD" or "No disc in drive F"

    • Re: (Score:3, Informative)

      by ranulf ( 182665 )
      Similar experience for me. I was running a slug (basically a NAS device with network and 2 USB ports) as a general server using a USB memory stick.

      After about 6 months of fairly heavy use (with only 32MB of RAM I needed to swap to flash), one day the USB flash drive just stopped working, and it's no longer even detected when I plug it into any system now.

      I'd done all the obvious things such as mounting with noatime and setting swappiness to 0, but ultimately discovered that flash really doesn't like being co

  • Is it possible that the first part the OS looks at, with the index of everything on the drive, has failed and shows nothing when in fact the data is there? Not unlike a dual-booting Windows overwriting a previous Linux MBR and "forgetting" to add the already installed Linux to the list of boot options. Linux is still there although there is nothing in the first part pointing to it. I dunno how flash works at this level so it may be bullshit, but I thought I'd throw it out there; you never know.
  • by spyrochaete ( 707033 ) on Friday April 10, 2009 @09:24AM (#27531211) Homepage Journal

    A few weeks ago /. linked to a really wonderfully written article by Anand Lal Shimpi about SSD drives. In the article he includes some simple and clear explanations of how flash memory works, its lifespan, and how it handles writes and deletes to maximize the life of every block of storage.

    http://www.anandtech.com/printarticle.aspx?i=3531 [anandtech.com]

    The only thing missing from the article is a description of the behaviour of a failing drive.

    • by dzfoo ( 772245 )

      >> The only thing missing from the article is a description of the behaviour of a failing drive.

      So what you actually meant is that "Anandtech 'splains it almost all".

      An article that explains everything except what the original poster asked is not very relevant, is it?

            -dZ.

  • If the flash drive fails, yes you can continue to read from it, but you also have to consider what is meant by reading.

    You can always read the raw data from the device, that will never change. There is nothing that prevents the electrical signals from forming a proper read transaction on the IO pins of the flash IC chip.

    However, when you consider the software that is on top of the raw data (a file system for example), this is where you will have the trouble.

    With older CF cards, the concept of wear
  • gracefully... (Score:2, Informative)

    by bdewet ( 546467 )
    I had flash fail on me 'gracefully'. The amount of available storage just gets smaller and smaller with use. It seems like the cells (if one can call them that) just die after repeated use. Formatting does not help either.
  • CF (Score:5, Informative)

    by psergiu ( 67614 ) on Friday April 10, 2009 @09:29AM (#27531279)
    Some years ago I used a 64MB CF card to install a minimal Debian on an IBM PC110 with 8MB of RAM. As the install process wanted more memory, I created a 12MB swap partition.
    Big mistake.
    The install took a whole day. I happily ran some programs the next day and then crash: the kernel screamed of I/O errors in the swap partition.
    I formatted the card as MS-DOS, and it found a few bad sectors. Then I ran Norton Disk Doctor, and on every run it found more and more bad sectors. But each time I re-formatted the card using a camera, the bad sectors shifted around. Unusable.

    FYI: the IBM PC110 is a 486 palmtop with a CF slot that can be used as a hard drive. The CF interface is IDE.
    • Sounds like a defect in the wear leveling system. Not that I really understand how wear leveling is done in practice. I understand the idea of trying to spread the writes over the whole device, I just don't know how they actually keep track of that block mapping.

      • Re: (Score:3, Interesting)

        by AKAImBatman ( 238306 ) *

        10MB CF cards predated the common deployment of wear leveling. Those old cards could fail at the drop of a hat. Especially if anyone was foolish enough to use them in a high-volume write situation.

  • The wear-leveling concept would certainly work to favor a long, normal operational lifetime punctuated by an epic fail.

    I would expect corruption of blocks - some take the new values, others don't. There is also the concept of the bad-block list which might work well enough to begin shrinking the available blocks, possibly to zero, as the one failure you mentioned described.

  • The short answer... (Score:3, Informative)

    by earnest murderer ( 888716 ) on Friday April 10, 2009 @09:31AM (#27531331)

    Your flash memory is fine, the controller is hosed.

    This kind of (essentially unrecoverable) failure will continue to be an issue wherever the logic is integrated with the storage.

    If it's any consolation, except for those who are always forgetting to "eject" or turn off their device before removing the media this kind of failure should be quite rare*.

    Enjoy.

    *Mfrs producing shoddy products notwithstanding.

    • > If it's any consolation, except for those who are always
      > forgetting to "eject" or turn off their device before removing the media

      Out of idle curiosity, you don't know why it is that when I eject my media the whole card reader gets turned off, do ya?

      Stupid thing makes me mental, if I eject the card, I have to unplug the card reader and use the card. I've given up and just yank willy-nilly. So far the media seems to be holding up (probably about 1,000 inserts on this memory stick duo...)

  • I found an old 10MB CF card tidying some boxes the other day. Plugged it in and it said device (or disk) not formatted.

    As to how old it was, it came bundled with the Kodak DC120 I bought on promo when that model was superseded.

    I wonder what was on it - at 10MB, probably not much!

  • flash failure (Score:5, Interesting)

    by erbbysam ( 964606 ) on Friday April 10, 2009 @09:38AM (#27531437) Homepage
    About 5-6 years ago, I decided that it would be a good idea to build a small application on a flash drive, that is, code and compile it directly to the drive.
    After what must have been hitting compile a few hundred to a thousand times, the 128MB thumb drive started giving me drive write errors and then stopped responding altogether within about a minute after errors started appearing.
    I think the moral of this story is backup your data, even when it's on a flash based drive, and don't code directly on a cheap thumb drive :)
    • Re:flash failure (Score:4, Insightful)

      by clone53421 ( 1310749 ) on Friday April 10, 2009 @10:44AM (#27532429) Journal

      Actually, the rule of thumb is:

      backup your data, ESPECIALLY when it's on a flash based drive

    • Re: (Score:3, Insightful)

      I think the moral of this story is backup your data, even when it's on a flash based drive, and don't code directly on a cheap thumb drive :)

      Yup, this is important, but then again this is important because for me the single biggest cause of data loss related to thumb drives is: loss of drive.

      I would like to say that I am very careful with my drives, but the truth is the loop holding the drive to the key chain is usually very weak. There is also the person is in question which has something to do with it, but

  • I have a cheap 4GB SDHC card. I've used it only a few times to take photos. Sometimes if it's in my camera, the camera gives an error that there's no card in it. After removing it and putting it back, it works again. And if I put it in the card reader in my PC, same thing: Sometimes mounting it in Linux works, sometimes it doesn't and it's as if nothing is in it. Removing it from the reader and inserting it back may make it work again. Could this be due to bad copper contacts on the SD card?
  • I was given a 4GB SDHC card by a friend, frantic that all her photos had disappeared. She did not do anything to physically damage the card; it was sitting in her camera and just suddenly started showing 0 photos one day when she turned it on.

    I popped it into my linux machine and started to dd all the data I could get off of it. The first 512MB were fine. The next 512MB were completely unreadable. The last 3GB were fine.
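    A rough sketch of mapping out which regions of a card still read, assuming a file-like object opened on the raw device. The function name `scan_readable` and the 1 MiB chunk size are made up for illustration; for an actual rescue, `dd conv=noerror,sync` or GNU ddrescue are the usual tools.

```python
def scan_readable(dev, total_size: int, chunk: int = 1024 * 1024):
    """Read dev chunk by chunk; return a list of (start, end, ok) ranges."""
    ranges = []
    offset = 0
    while offset < total_size:
        size = min(chunk, total_size - offset)
        try:
            dev.seek(offset)
            data = dev.read(size)
            ok = len(data) == size
        except OSError:
            # The kernel reported an I/O error for this chunk.
            ok = False
        if ranges and ranges[-1][2] == ok:
            # Extend the previous range instead of starting a new one.
            ranges[-1] = (ranges[-1][0], offset + size, ok)
        else:
            ranges.append((offset, offset + size, ok))
        offset += size
    return ranges
```

    On Linux this would be called on something like `open("/dev/sdX", "rb", buffering=0)` (the device path is a placeholder), with `total_size` taken from `dev.seek(0, 2)`. A result such as readable, unreadable, readable would match the 512MB hole described above.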

    Not sure exactly what could cause this type of partial failure, but it certainly seem

  • by blind biker ( 1066130 ) on Friday April 10, 2009 @09:40AM (#27531485) Journal

    ...and quality and longevity take a back seat. So companies stopped offering SLC Flash RAM (100,000+ writes) and only offer MLC (5000 writes), and are now pushing even eight-level MLC, which will be even less reliable than standard 4-level MLC Flash RAM. But who cares, the consumer will be slightly fucked after a while, but that will be much later, after they enjoyed the happiness of getting slightly more GB for their buck.

    The only manufacturer that I know of that is an exception is Kingston, which still offers SLC Flash products - namely their Elite Pro line of SD and CF cards, and the DataTraveler USB drives. But that's it, everyone else has now completely transitioned to MLC.

  • I have managed to wash and dry my flash drive numerous times and it still works. I make sure I have a backup of any important data on there of course, but I have been pretty impressed with how durable these flash drives have gotten.

  • FAT Failure (Score:3, Insightful)

    by ArcherB ( 796902 ) on Friday April 10, 2009 @09:52AM (#27531691) Journal

    When I was in the digital imaging kiosk business, we had to repair about three flash drives a week. A customer would put it in one of our systems and pull it out while it was being read, or it was a cheap drive or whatever. Either way, the customer would blame our systems for killing their drives (rightly or wrongly). Of course, it would contain pictures of their dead grandfather or ex-girlfriend naked or whatever was completely priceless and irreplaceable.

    The vast majority of the time, we would be able to run an application that would be able to recover whatever was on the drive. While I'm not certain of the original problem, the system acted as if the drive had no FAT (File Allocation Table... do I really need to say it?) on it or the FAT had become corrupted. This particular application would be able to go in and recover whatever was on the drive and most of the time repair the drive to its previous working state.

    I say it ACTED like the FAT was corrupt, but I don't know or care if a flash drive has a FAT on it. Could have been a hardware thingie in there that hiccuped. The repair utility acted much like a scan-disk that would repair an MBR or FAT and/or act like an undelete utility would, restoring the files on the drive.

  • Most removable 'flash' media probably dies from ESD and wear of the connectors. I've never knowingly lost one, but from the destroyed ones brought to me, it looks like static or possibly 'hot insertion'. 'Flash' is ideal for static file systems like /usr or /, etc. I use an image of the memory on hd and simply do something like dd if=/dev/image of=/dev/flash bs=16k. Systems that use inodes (just about all) are best and MS file-systems that use FATs are the worst. Even if you must use FAT file-systems, copyi

  • by spock_iii ( 1152403 ) on Friday April 10, 2009 @09:55AM (#27531729)
    For a prior employer, I had set up a process to qualify flash media for use in embedded products. There are a couple of different failure modes you are likely to see.

    First off, when the actual flash media itself wears out, it takes longer and longer to erase individual sectors.

    A flash device such as a USB stick or a CF card is slightly more complicated because it has something known as an FTL (Flash Translation Layer). The FTL has the job of implementing the virtual media to flash sector translations, implementing wear leveling, and handling the awkward page erases. (Multiple sectors in a page, but you can only erase full pages.)

    The FTL obviously must store some mapping information in the media in addition to your data.

    If you start writing flash media, and time those writes, you see an initial rapid growth in the write timing that eventually levels off as the FTL tables swell to their constant operational size.

    The overall flash write speed will level off to some average value that follows slow growth over a very very long tail as the media wears.
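    A minimal sketch of timing writes this way, assuming access through an ordinary file on the mounted device rather than the lower-level access a qualification rig would likely use. The function names, block size, and window size are illustrative assumptions.

```python
import os
import time


def time_writes(path: str, block_size: int = 4096, count: int = 100):
    """Write `count` blocks to path, syncing each one; return per-write seconds."""
    timings = []
    block = os.urandom(block_size)
    with open(path, "wb") as f:
        for _ in range(count):
            start = time.perf_counter()
            f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push the data past the OS cache
            timings.append(time.perf_counter() - start)
    return timings


def moving_average(values, window: int = 10):
    """Smooth the timings so a slow upward trend is easier to see."""
    out = []
    for i in range(len(values) - window + 1):
        out.append(sum(values[i:i + window]) / window)
    return out
```

    The file system and the FTL both sit between this code and the flash, so the numbers only hint at what the media is doing; the trend over many runs matters more than any single timing.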

    Early flash chips supported about 10,000 erases per page, and modern chips shipped by Samsung and others support a couple million erases per page. When you consider this is spread over say 4GB of media, you can understand that tail is very very long and flash media are probably comparable to hard drives in their MTBF these days.

    Secondly, when flash actually does begin to fail, the media itself tends to exhibit a small number of different symptoms.

    The flash may start to show occasional data corruption when read. You might also have instances where data persists in the media only so long as power is applied. And then of course you have the fact that erases take longer and longer to achieve. Eventually erases or programming start timing out occasionally.

    With the FTL between you and the flash, you don't directly observe these effects. Presumably the FTL is smart enough to try and re-map your data elsewhere. In most cases there's ECC to attempt correction of moderately corrupted data. The real killers are when the data fails to persist after power cycling, when ECC fails to recover critical FTL data tables, or when there are no more spare sectors to re-map data to.
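    To illustrate what "correcting moderately corrupted data" means, here is a sketch of a Hamming(7,4) code, which repairs any single flipped bit in a 7-bit codeword. This is a teaching example only; flash controllers generally use stronger codes (BCH or LDPC, for instance) over much larger blocks, and the function names here are invented.

```python
def hamming74_encode(nibble: int) -> list:
    """Encode 4 data bits as a 7-bit codeword (list of bits, positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8  # index 0 unused so indices match bit positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4, 5, 6, 7
    return code[1:]


def hamming74_decode(bits: list):
    """Return (nibble, error_position); error_position is 0 if no error was seen."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set-bit positions is 0 for a valid codeword
    if syndrome:
        code[syndrome] ^= 1  # flip the single bit the syndrome points at
    d = [code[3], code[5], code[6], code[7]]
    nibble = sum(bit << i for i, bit in enumerate(d))
    return nibble, syndrome
```

    Flip two bits instead of one and the decoder confidently "corrects" to the wrong value, which is the small-scale version of ECC failing to recover a critical table.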

    Those first two critical errors are likely to produce the lightbulb effect where your flash card or USB stick one day simply fails to come up when probed after device insertion. In rarer cases, the lack of spares may show up as some sort of reported write failure in your kernel logs assuming the flash device reports proper IDE/ATAPI/??? error data.

    One final note -- please don't leave your USB stick inserted in the PC as you power it off! USB ports supply power and use a FET device to control that power. When you turn off the PC, the gates float and significant leakage current goes to the USB device. Some of the cheaper USB drives lack a key resistor that bleeds this current away and protects the flash memory chips. This leads to data corruption. I have seen the FTL break in such sticks simply by doing POR on the PC.

    Oh...almost forgot. When you put your flash stick through the washer and dryer, always use fabric softener or Bounce strips to reduce the static. :-)
  • N/T.

    But if you want a more detailed description, I'll acquiesce.

    I've had 3 flash drives fail.

    One failed because of cheap manufacture. The repeated use finally caused the solder to crack where the USB plug was mounted on the PCB. I was able to resurrect it with some careful soldering, but it eventually happened again, and I eventually wasn't able to get it working again. AFAIK, the actual device was fine, other than the loose plug. The body was made of cheap plastic, though, so it wasn't really a huge surpri

  • by Scorchio ( 177053 ) on Friday April 10, 2009 @10:12AM (#27531953)

    I have a Philips DVD drive with a USB port, and was using a 1GB flash drive to play back video files copied from my PC. The drive failed relatively quickly - I'd had it for about a year, but hadn't used it all that often. I started to notice the video files were corrupt on playback, but initially suspected the file itself, or possibly a problem with the DVD player's decoder. I diagnosed the problem by copying a file onto the drive, then repeatedly checksumming it. The first couple of times, the checksum value would often be correct, then on subsequent checks it would change on me. I'd end up seeing several different checksum values, never seeing it return to a previous value. Whether this was due to a problem in the interface hardware when reading, or memory cells failing to retain their state, I don't know.
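    A small sketch of that repeated-checksum test. SHA-256 and the function names are choices made here for illustration, not what was originally used.

```python
import hashlib
from collections import Counter


def file_digest(path: str, chunk: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of the file at path."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return h.hexdigest()


def repeated_digests(path: str, passes: int = 10) -> Counter:
    """Hash the same file several times and count how often each digest appears."""
    return Counter(file_digest(path) for _ in range(passes))
```

    A healthy drive yields a single digest with a count equal to `passes`; more than one key in the result means the reads are not stable. The OS may serve repeat reads from its cache, so re-inserting or remounting the drive between passes makes the test more meaningful.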

    Even though it was a year old and I had no receipt, the manufacturer (Kingmax, I think?) was happy to send a free replacement. The new drive has seen much more use, but is still working fine.

  • by dannycim ( 442761 ) on Friday April 10, 2009 @10:25AM (#27532151)

    I've been running my home desktop/server (Linux 2.6) on a Sandisk Cruzer 8GB usb stick (root, swap, tmp, everything except large media files) for a year and four months without any glitches. I've napkin-calculated that at current usage and wear levelling, I should be able to use it for over 50 years without a failure. Funnily enough, the portable USB drive that I use to back it up failed last December. I keep multiple backups, I didn't flinch.
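    For anyone wanting to redo that kind of napkin math, here is one hedged way to set it up. It assumes perfect wear leveling, and every input in the example is an assumption: the 5000-cycle figure is the MLC number quoted elsewhere in this thread, and the 2 GB/day write volume is invented, not the poster's actual usage.

```python
def estimated_lifetime_years(capacity_gb: float,
                             erase_cycles: int,
                             writes_gb_per_day: float,
                             write_amplification: float = 1.0) -> float:
    """Idealized lifetime: total writable volume divided by daily write volume."""
    total_writable_gb = capacity_gb * erase_cycles / write_amplification
    return total_writable_gb / writes_gb_per_day / 365.0


# 8 GB stick, 5000 erase cycles, 2 GB written per day, ideal wear leveling.
ideal = estimated_lifetime_years(8, 5000, 2)

# Same stick if each host write costs ten times as much flash wear.
pessimistic = estimated_lifetime_years(8, 5000, 2, write_amplification=10)
```

    With these inputs the ideal case works out to roughly 55 years and the pessimistic one to about five and a half, which shows how sensitive the estimate is to how well the stick actually spreads its writes.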

    Then again some flash devices fail miserably and silently. I've had a few 64MB and 128MB stick batches with stuck bits, and those were practically new. The operating systems they were used on didn't detect the errors, I did, by trying to open garbled files.

    My wish list: A SATA gizmo that has 4-5 USB connectors, each with its own bus, that presents itself to the SATA bus as a single drive, and does RAID-5 automatically. That'd be sweet.
