How Power Failures Corrupt Flash SSD Data
An anonymous reader writes "Flash SSDs are non-volatile, right? So how could power failures screw with your data? Several ways, according to a ZDNet post that summarizes a paper (PDF) presented at last month's FAST 13 conference. Researchers from Ohio State and HP Labs tested 15 SSDs using an automated power fault injection testbed and found that 13 lost data. 'Bit corruption hit 3 devices; 3 had shorn writes; 8 had serializability errors; one device lost 1/3 of its data; and 1 SSD bricked. The low-end hard drive had some unserializable writes, while the high-end drive had no power fault failures. The 2 SSDs that had no failures? Both were MLC 2012 model years with a mid-range ($1.17/GB) price.'"
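For anyone wondering how a test could tell these failure types apart, here is a minimal sketch. It is not the paper's testbed. It assumes a "shorn write" means only a sector-aligned prefix of a record reached the media, with the old contents left behind it. The record layout, `RECORD_SIZE`, `SECTOR`, `make_record`, `is_intact`, and `classify` are all hypothetical names made up for illustration.

```python
import struct
import zlib

RECORD_SIZE = 4096              # hypothetical record size in bytes
SECTOR = 512                    # assumed granularity at which a write can tear
HEADER = struct.Struct("<QI")   # sequence number, CRC32 of the payload


def make_record(seq: int, fill: int) -> bytes:
    """Build a record: header (sequence number + payload CRC) then payload."""
    payload = bytes([fill]) * (RECORD_SIZE - HEADER.size)
    return HEADER.pack(seq, zlib.crc32(payload)) + payload


def is_intact(block: bytes) -> bool:
    """True if the payload still matches the CRC stored in the header."""
    _seq, crc = HEADER.unpack_from(block)
    return zlib.crc32(block[HEADER.size:]) == crc


def classify(block: bytes, old: bytes, new: bytes) -> str:
    """Compare what was read back against the known old and new records."""
    if block == new:
        return "new"        # the write completed
    if block == old:
        return "old"        # the write never reached the media
    for cut in range(SECTOR, RECORD_SIZE, SECTOR):
        if block == new[:cut] + old[cut:]:
            return "shorn"  # only the first `cut` bytes were written
    return "corrupt"        # matches neither record nor a sector-aligned mix
```

A checker like this only works because the test harness knows what it wrote. Without the expected old and new contents, the CRC alone can say a record is bad but not why.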
build in some power storage (Score:5, Insightful)
Seriously... slap in some basic power circuitry and some caps - enough that the drive can finish the cycle it is on and do whatever it needs to do to power off safely.
Re:build in some power storage (Score:2, Insightful)
I'll quote the great CliffyB: Vote with your dollars!
What? It's valid thinking, not at all 9th grade.
Re:build in some power storage (Score:5, Insightful)
Space is at an extreme premium in those drives. There's a reason they feel so heavy/dense. Given the quilting layout of the chips, adding a single cap would prevent several memory chips from fitting. So you may as well then fill that remaining space with more caps. But you will reduce capacity, and that's what sells SSDs.
There's already a substantial amount of circuitry in them, far from "basic". It's essentially a CPU. I'd be interested to see some numbers as to average power drain during idle, read, and write.
The ones that did the best during the power blips probably did have caps and a bit more in their power system to handle it though. It certainly does surprise me that the mid-range, not the high-end, were the best performers in this test.
Unsurprising (Score:3, Insightful)
These devices have an elaborate internal database for the management of block remapping. For this to survive power failures it needs to use transactional updates. Getting this right is hard - it takes years for file systems and databases to become robust. I'd guess that many devices don't even attempt to do it and the ones that do probably have obscure failure modes. A UPS is essential.
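To make "transactional updates" concrete, here is a minimal sketch of one classic approach: keep two copies of the mapping table and never overwrite the newest valid one. This is an illustration only, not how any particular SSD firmware works. `TwoSlotStore`, `SLOT_SIZE`, the header layout, and the `fail_after` parameter (which simulates a power cut mid-write) are all hypothetical, and a bytearray stands in for the flash.

```python
import json
import struct
import zlib

SLOT_SIZE = 256                  # hypothetical size of each metadata slot
HEADER = struct.Struct("<QII")   # version, payload length, CRC32 of payload


class TwoSlotStore:
    """Keeps two copies of a mapping; a commit only touches the older one."""

    def __init__(self):
        self.media = bytearray(2 * SLOT_SIZE)   # simulated non-volatile storage

    def _read_slot(self, index):
        raw = self.media[index * SLOT_SIZE:(index + 1) * SLOT_SIZE]
        version, length, crc = HEADER.unpack_from(raw)
        if version == 0 or length > SLOT_SIZE - HEADER.size:
            return None                          # never written, or garbage header
        payload = bytes(raw[HEADER.size:HEADER.size + length])
        if zlib.crc32(payload) != crc:
            return None                          # torn or corrupted write
        return version, json.loads(payload)

    def recover(self):
        """Return (version, mapping) from the newest slot that validates."""
        valid = [s for s in (self._read_slot(0), self._read_slot(1)) if s]
        return max(valid, key=lambda s: s[0]) if valid else (0, {})

    def commit(self, mapping, fail_after=None):
        """Write a new version; fail_after=N simulates power loss after N bytes."""
        version, _ = self.recover()
        payload = json.dumps(mapping, sort_keys=True).encode()
        if len(payload) > SLOT_SIZE - HEADER.size:
            raise ValueError("mapping too large for slot")
        record = HEADER.pack(version + 1, len(payload), zlib.crc32(payload)) + payload
        slot = (version + 1) % 2    # the slot NOT holding the newest valid copy
        if fail_after is not None:
            record = record[:fail_after]
        start = slot * SLOT_SIZE
        self.media[start:start + len(record)] = record
```

Usage, with a power cut in the middle of the third commit:

```python
store = TwoSlotStore()
store.commit({"lba0": 7})
store.commit({"lba0": 9})
store.commit({"lba0": 11}, fail_after=20)   # header plus 4 payload bytes land
print(store.recover())                      # falls back to version 2
```

The toy version is easy. The hard part on real hardware is that the media can misbehave in ways this model ignores, which is exactly the poster's point about obscure failure modes.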
Re:We encountered something like this (Score:5, Insightful)
Second, using FAT
Third, "commercial journaling FS". What does that even mean?
If you are industrial, where is your UPS?
not naming names = data "pulled out of my ass" (Score:3, Insightful)
Useless paper/test.
up/down/up/brown/fried (Score:2, Insightful)
What some folks don't realize is that it's the seesaw nature of many power events that's primarily behind both data corruption and SSD failure. It's a rare rack system that has its own power conditioning and UPS these days (HP NonStop comes to mind), and without it you're subject to whatever the event provides in the way of under/over voltage, spikes, drops, etc. Many times these happen in timeframes too fast for power switching equipment to react, and in some cases it's that stuff that gets fried first.
Re:We encountered something like this (Score:5, Insightful)
I'm not saying their implementation was right, just saying that you can't infer from his post that it was wrong.
My Personal Policy (Score:3, Insightful)
This is why I don't use prototype tech that is really not ready to be used in the real world. And if you do, expect loads of bugs and bricking.
But either way, thanks for funding the development of something I am excited to try out in 2-4+ years when it will be a mature usable technology.
Re:Finally somebody said it! (Score:2, Insightful)
Well, that's probably because you were using OCZ crap. I have never had a quality product from that company.
However, that said, I have noticed the same thing with the Crucial m4s I have. In one particular laptop, it keeps bricking drives because the battery doesn't hold much of a charge anymore. Luckily, I can "unbrick" them by plugging in the power (but not data) for 20 minutes, then plugging in the data connection, then rebooting the machine. Has worked more than once.
And Crucial has put out a bunch of firmwares trying to deal with this. Last time it happened was a few months ago. I have approx. 15 other drives deployed and it only happens to one or two of them; it seems to always be in laptops or after some sort of power surge. Crucial will always RMA the drive as well, the one or two times I did not get it going.
And before anyone says "that's why I don't use SSDs, too new and unstable!" I say that I would not give up my SSD for all the SCSI 15Ks in the world. SSDs are the single greatest speed increase in computer performance in the last 15 years. Make backups, as you should anyway, and don't be afraid of SSDs. When you fly close to the sun, you are going to get burned. Still, I would rather FLY so high and roll the dice on reliability (which is still stellar in most circumstances).
Rotational hard drives are such a pain now as an OS drive, and they still die eventually. I recommend SSDs to everyone now, with the caveat above that you always need good backups.
Re:We encountered something like this (Score:2, Insightful)
Obvious troll is obviously doing just that, i think his use of the term "silly faggots" when referring to linux users is the clue that tipped me off to this fact.