How Power Failures Corrupt Flash SSD Data
An anonymous reader writes "Flash SSDs are non-volatile, right? So how could power failures screw with your data? Several ways, according to a ZDNet post that summarizes a paper (PDF) presented at last month's FAST '13 conference. Researchers from Ohio State and HP Labs tested 15 SSDs using an automated power fault injection testbed and found that 13 lost data. 'Bit corruption hit 3 devices; 3 had shorn writes; 8 had serializability errors; one device lost 1/3 of its data; and 1 SSD bricked. The low-end hard drive had some unserializable writes, while the high-end drive had no power fault failures. The 2 SSDs that had no failures? Both were MLC 2012 model years with a mid-range ($1.17/GB) price.'"
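For readers wondering how a test could even notice these failures: one common approach is to write self-describing records and verify them after power returns. The following is a minimal sketch of that idea, not the paper's actual testbed; the record layout and the names `make_record` and `check_record` are assumptions made for illustration. Note that a checksum alone cannot tell bit corruption from a shorn (partially written) record; it only shows that the record is not what was written.

```python
import struct
import zlib

# Hypothetical record layout: 8-byte sequence number, 4-byte CRC32 of the
# payload, then the payload itself.
HEADER = struct.Struct("<QI")

def make_record(seq: int, payload: bytes) -> bytes:
    """Build a record whose header lets a later reader validate it."""
    return HEADER.pack(seq, zlib.crc32(payload)) + payload

def check_record(raw: bytes, payload_len: int):
    """Return (status, seq) for a record read back after a power fault.

    status is "ok", "corrupt" (checksum mismatch, e.g. flipped bits or a
    shorn write), or "truncated" (fewer bytes than a full record).
    """
    if len(raw) < HEADER.size + payload_len:
        return ("truncated", None)
    seq, crc = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:HEADER.size + payload_len]
    status = "ok" if zlib.crc32(payload) == crc else "corrupt"
    return (status, seq)
```

Checking serializability would additionally compare the sequence numbers of surviving records against the order in which writes were acknowledged, which this sketch does not attempt.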
Before you ask. (Score:5, Informative)
The paper doesn't disclose the brands.
Finally somebody said it! (Score:5, Informative)
I had some original Vertex drives from OCZ that kept absolutely corrupting when my laptop got accidentally unplugged and I powered on the machine. I had to RMA them over and over and over again. I finally figured out that my battery was getting old and, although everything was functional even on battery power and it would boot, the initial large draw of power on boot must have created a voltage drop (i.e. brownout) which the SSDs weren't designed to compensate for. Within an hour of boot (even back on plugged power) they would choke, freeze the OS, and be rendered unusable from then on out.
Several SSD manufacturers are probably not engineering well for fluctuating power. Rather than fixing the problem with better engineering, OCZ simply changed its warranty policy to void the warranty if the customer does not provide proper power. Correct me if I'm wrong, but I don't think rotating-disk hard drive manufacturers have ever had a clause like that in their warranties.
Re:We encountered something like this (Score:5, Informative)
Re:build in some power storage (Score:4, Informative)
I don't know what SSDs you've been using, but I've never picked up an SSD (OCZ Vertex 2/3, Intel X25-M/320/330/335/510/520) that didn't feel light and sound nearly hollow.
Consumer drives are usually lightweight because they don't need the extra cooling. Enterprise drives, depending on who makes them and what they're for, can have heatspreaders or heatsinks inside or attached to each chip, which adds to the weight.
UPS does nothing for the common fault case. (Score:3, Informative)
Most enterprise SSDs do have small supercapacitors or capacitor arrays onboard for exactly this reason. Some of the higher-end consumer drives do too. But most consumer drives don't.
The answer? Get a UPS.
A UPS is no panacea: I experience grid failure very rarely.
However, relatively speaking I experience many more kernel lockups that require an ACPI-initiated poweroff by holding down the power button until the machine abruptly powers off. What do you do when a reboot/poweroff command causes your Linux/BSD machine to hang? I/O handle leaks in the Samba SMB client (i.e. *not* the smbd daemon) and the Samba Winbind code are notorious for this. The only times I have ever had to "yank power" from a production Linux database machine were due to SMB share mount zombies or Winbind processes that the kernel couldn't kill even during an issued reboot command.
I have several OCZ Vertex 4 SSDs, and this concerns me, especially since the paper/presentation does not disclose the test results. I guess I will just have to hope that my device models aren't affected and/or that waiting a minute or two during a hung poweroff/reboot means the kernel has stopped attempting to write to the devices and everything has flushed.
PS. If you compare the vague results in the summary with the paper you will find that only two of the fifteen drives passed the tests, yet four of the devices were cited to have power protection capacitors.
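On the flushing worry in the comment above: an application does not have to wait and hope, because it can ask the kernel to push its data to the device before carrying on. Here is a minimal sketch for POSIX systems; the helper name `durable_write` is hypothetical. It cannot help if the drive itself acknowledges a write it has not actually made durable, which is the kind of failure the summary describes.

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Write data to path and ask the OS to flush it to the device (POSIX)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Flush file contents and metadata from the page cache.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Flush the containing directory too, so the new directory entry
    # is also requested to be durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A call such as `durable_write("/var/tmp/state.bin", b"payload")` returns only after both `fsync` calls have completed; the path here is just an example.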
Re:We encountered something like this (Score:3, Informative)
If it was a drive being used to read schematics for a CNC machine, for instance, there isn't a manufacturer out there that currently offers a machine-tied UPS for the CNC machine. If the CNC machine loses power, then so does the drive, and vice versa, since it's all on the same circuit (usually you'll find the power equipment hidden in a cabinet along a nearby wall, and it takes power directly from the mains).
Re: We encountered something like this (Score:2, Informative)
He is talking about the file system specification (its on-disk structure), not about the specific code implementation in Windows.