Ask Slashdot: How Do SSDs Die? 510
First time accepted submitter kfsone writes "I've experienced, first-hand, some of the ways in which spindle disks die, but either I've yet to see an SSD die or I'm not looking in the right places. Most of my admin-type friends have theories on how an SSD dies but admit none of them has actually seen commercial grade drives die or deteriorate. In particular, the failure process seems like it should be more clinical than spindle drives. If you have X many of the same SSD drive and none of them suffer manufacturing defects, if you repeat the same series of operations on them they should all die around the same time. If that's correct, then what happens to SSDs in RAID? Either all your drives will start to fail together or at some point, your drives will become out of sync in-terms of volume sizing. So, have you had to deliberately EOL corporate grade SSDs? Do they die with dignity or go out with a bang?"
Umm (Score:5, Insightful)
Re:Die! (Score:1, Insightful)
Re:Umm (Score:5, Insightful)
Which is why he's asking.
Fuck people who ask questions when they don't know something, right?
Re:Umm (Score:5, Insightful)
I've seen two instances where a drive failed. Each time there were no handy replacement drives. Within a week a second drive died the same way as the first! back to backup tapes! Better to have replacement drives in boxes waiting.
Re:Umm (Score:5, Insightful)
In general, if you get such an issue, it will happen early on in the life of the drives (one coworker had what he called the 30-day thrash rule - he would plan ahead and get a huge number of drives - the cheapest available meeting requirements, including avoiding manufacturers we had issues with previously, take a handleful, and thrash 'em for 30 days. If nothing bad happend, he'd either keep up 30 day thrashes on sets of hard drives, pulling out the duds, or just return the whole lot.
Re:Die! (Score:2, Insightful)
No offense intended but if your new why are you complaining about our long standing culture of cracking lame jokes, if you don't like it why did you join?
Re:Umm (Score:4, Insightful)
We see it all the time on a big array. One must hustle to repair/rebuild the RAID...
Re:Umm (Score:5, Insightful)
I've seen two instances where a drive failed. Each time there were no handy replacement drives. Within a week a second drive died the same way as the first! back to backup tapes! Better to have replacement drives in boxes waiting.
This. Your spares closet is your best friend in the enterprise. Ensure you keep it stocked.
And locked. And don't label them "spares". Label them "cold swap fallback device" or something that management won't see as something "extra" that can be "repurposed" (i.e. stolen)
Re:CRC Errors (Score:4, Insightful)
this is not very useful, as it mainly points out that the initial generations of commodity SSDs were immature. not to mention that return rates contain other phenomena than wear or even failure.
Re:CRC Errors (Score:5, Insightful)
I am running (6) OCZ Vertex2 256GB drives under heavy use 24/7. Almost 2 years on have only had one fail and it still works, just started kicking random errors.
Your failure rate of > 8% per year isn't very reassurring.
Re:CRC Errors (Score:5, Insightful)
I would counter-argue that any flash drive manufacturer is asking for massive RMAs when the device is clearly targeted for the laptop market (otherwise they would manufacture it in a 3.5" format) where the operating environment is guaranteed to be running on a battery for long periods of time. Any research in to battery operation would expose you to the vast differences in operating voltage as batteries discharge as well as the age of the battery. It is just bad engineering to not take this in to account.
Reformatting the drive was not an option because the drive wouldn't even detect in the BIOS unless the special factory jumper was set which is a non-operational mode for the drive. This problem was reproduced over 10 times with over 10 different drives of the same model Vertex. Slightly bad power caused the entire drive to be rendered unusable. Amazingly, none of the other hardware in the laptop had any problem with the power (i.e. screen, cpu, memory, other spindle-based hard drive, gpu, etc.). As I said, bad engineering.