How Power Failures Corrupt Flash SSD Data

Posted by Soulskill on Friday March 01, 2013 @06:06PM from the not-so-solid-state dept.

An anonymous reader writes "Flash SSDs are non-volatile, right? So how could power failures screw with your data? Several ways, according to a ZDNet post that summarizes a paper (PDF) presented at last month's FAST 13 conference. Researchers from Ohio State and HP Labs tested 15 SSDs using an automated power fault injection testbed and found that 13 lost data. 'Bit corruption hit 3 devices; 3 had shorn writes; 8 had serializability errors; one device lost 1/3 of its data; and 1 SSD bricked. The low-end hard drive had some unserializable writes, while the high-end drive had no power fault failures. The 2 SSDs that had no failures? Both were MLC 2012 model years with a mid-range ($1.17/GB) price.'"


build in some power storage (Score:5, Insightful)

by X0563511 ( 793323 ) writes: on Friday March 01, 2013 @06:12PM (#43049765) Homepage Journal

Seriously... slap in some basic power circuitry and some caps - enough that the drive can finish the cycle it is on and do whatever it needs to do to power off safely.

- Re: (Score:2, Insightful)
  
  by Anonymous Coward writes:
  
  I'll quote the great CliffyB: Vote with your dollars!
  What? It's valid thinking, not at all 9:th grade.
  - We encountered something like this (Score:5, Interesting)
    
    by AliasMarlowe ( 1042386 ) writes: on Friday March 01, 2013 @06:41PM (#43050073) Journal
    
    We encountered extensive and progressive file corruption on SSDs in an industrial device. It used the FAT file system, and after every loss of power, it ran its equivalent of chkdsk /f at the next boot. If power was lost again while this command was running, then it was guaranteed that the file system would become corrupt (despite the fact that we were writing nothing to the SSD; it held only files which were opened for reading). The window of opportunity was described as "very short", and the possibility of corruption was "very small" according to the vendor. In our experience in the field, and in our internal testing, the window of opportunity exceeded 20 seconds, and the possibility of corruption was "utter certainty".
    The vendor fixed the problem in a very easy way. They changed the file system from FAT to a commercial journaling FS. In our subsequent tests, we never found any file corruption, even on iterated power loss at random intervals after power on.
    
    - Re:We encountered something like this (Score:5, Insightful)
      
      by TheRealMindChild ( 743925 ) writes: on Friday March 01, 2013 @06:52PM (#43050199) Homepage Journal
      
      First, running an SSD on an "industrial device"
      
      Second, using FAT
      
      Third, "commercial journaling FS". What does that even mean?
      
      If you are industrial, where is your UPS?
      
      - Re:We encountered something like this (Score:5, Insightful)
        
        by yurtinus ( 1590157 ) writes: on Friday March 01, 2013 @07:13PM (#43050391)
        
        Likely as part of an embedded system - monitoring or control software. Systems where you just flip the power switch on when you need them and off when you're done, so a UPS wouldn't apply.
        
        I'm not saying their implementation was right, just saying that you can't imply from his post that it was wrong :P
        
      - Re: (Score:3, Informative)
        
        by thejynxed ( 831517 ) writes:
        
        If it was a drive being used to read schematics for CNC for instance, there isn't a manufacturer out there that currently offers a machine-tied UPS for the CNC machine. If the CNC machine loses power, then so does the drive, and vice versa, since it's all on the same circuit (usually you'll find the power stuff hidden in a cabinet along a nearby wall, and that stuff takes power directly from the mains).
      - Re: (Score:3)
        
        by ultrasawblade ( 2105922 ) writes:
        
        You calling ext3/ext4 shitty? I can put the journal on a separate device for performance enhancement, can NTFS do that?
        In all seriousness though, NTFS is well engineered.
    - Re:We encountered something like this (Score:5, Informative)
      
      by certsoft ( 442059 ) writes: on Friday March 01, 2013 @06:58PM (#43050289) Homepage
      
      We use USB flash drives for a data logger. Most of the time the data is being buffered in the ARM based Linux board's RAM to save power. Once we get a complete file's worth (4MB at present) we power up, validate, write the file, and power down. Supercaps have been a lifesaver. There's even enough capacity to do the write cycle if the flash was powered down when a power fail is detected. That allows us to not lose whatever was already in the RAM buffer.
      
      - Comment removed (Score:4, Interesting)
        
        by account_deleted ( 4530225 ) writes: on Friday March 01, 2013 @07:48PM (#43050715)
        
        Comment removed based on user account deletion
        
        Re: (Score:3)
        
        by certsoft ( 442059 ) writes:
        
        Fortunately the client has facilities to test various drives over a wide temperature range (down to -40, not sure how hot they test) while running. And yes, a lot of them are crap.
        
        Re:We encountered something like this (Score:5, Interesting)
        
        by thejynxed ( 831517 ) writes: on Friday March 01, 2013 @09:14PM (#43051289)
        
        Not just a lot of them, most of them, to the point that my former contract rolled their own due to flaky controllers, etc put out by the SSD manufacturers. Yes, they found it cheaper and more efficient to make their own SSD drives, and to incinerate the ones that failed in a blast furnace than rely on the crap the manufacturers are currently foisting on the market.
        
      - Re:We encountered something like this (Score:5, Interesting)
        
        by hot soldering iron ( 800102 ) writes: on Friday March 01, 2013 @10:01PM (#43051541)
        
        You might check into adding supercaps into the power supply, across the DC output lines.
        For a less DIY method, you could try this: http://www.beam-tech.com/093001/prd_pgs/internal_ups.htm# [beam-tech.com]
        It's an internally mounted UPS. There are also some PC power supplies that have the UPS built-in, but expect to pay a premium for those.
        If your application allows it, you might want to just mount your SSD into a laptop. It already has internal battery power, and there isn't any exotic hardware you have to pay through the nose for.
        
        Re: (Score:3)
        
        by adolf ( 21054 ) writes:
        
        Do laptops ever monitor health of the battery if external power is never removed? I'm aware that laptops can tell when the battery is eventually trashed in normal use (Dells, in particular, seem to be pretty bitchy about it with continuously-blinking lights, and report their findings to the OS if it bothers to ask).
        But being plugged in forever is not "normal use" for a laptop.
        I like your idea (and no, I'm not the AC you're replying to), but I have this vision of a small laptop that has been running with ex
        
        Re: (Score:3)
        
        by adolf ( 21054 ) writes:
        
        Right, sure: All of this battery information can certainly be gleaned under any operating system, given appropriate software.
        But the question is (restated): If the machine never runs on battery, does the machine know the health status of that battery? Does it really have any idea what those figures really are? Can it possibly know, without ever having run on (or otherwise discharged) the battery what the operational status of that battery really is?
        The implication is that if it cannot, then it's really
  - Re: (Score:3)
    
    by Dunbal ( 464142 ) * writes:
    
    The problem with voting has always been that the idiots get to vote too. So while you might "vote with your dollars" to select the most reliable drive, they will vote for the one with the cute name, or the shiny case, or the "free gift", or the special price, etc.
- Re:build in some power storage (Score:5, Insightful)
  
  by v1 ( 525388 ) writes: on Friday March 01, 2013 @06:23PM (#43049853) Homepage Journal
  
  space is at an extreme premium in those drives. There's a reason they feel so heavy/dense. Given the quilting layout of the chips, adding a single cap would prevent several memory chips from fitting. So you may as well then fill that remaining space with more caps. But you will reduce capacity, and that's what sells SSDs.
  There's already a substantial amount of circuitry in them, far from "basic". It's essentially a CPU. I'd be interested to see some numbers as to average power drain during idle, read, and write.
  The ones that did the best during the power blips probably did have caps and a bit more in their power system to handle it though. It certainly does surprise me that the mid-range, not the high-end, were the best performers in this test.
  
  - Re: (Score:3)
    
    by Mad Merlin ( 837387 ) writes:
    
    space is at an extreme premium in those drives. There's a reason they feel so heavy/dense.
    I don't know what SSDs you've been using, but I've never picked up an SSD (OCZ Vertex 2/3, Intel X25-M/320/330/335/510/520) that didn't feel light and sound nearly hollow.
    - Re:build in some power storage (Score:4, Informative)
      
      by Mashiki ( 184564 ) writes: <mashiki@gm a i l . c om> on Friday March 01, 2013 @07:05PM (#43050341) Homepage
      
      I don't know what SSDs you've been using, but I've never picked up an SSD (OCZ Vertex 2/3, Intel X25-M/320/330/335/510/520) that didn't feel light and sound nearly hollow.
      Consumer drives are usually lightweight, they don't need the extra cooling. Enterprise drives depending on who they're made by and what they're for can have heatspreaders or heatsinks within, or attached to each chip adding to the weight.
      
    - Re: (Score:3)
      
      by thegarbz ( 1787294 ) writes:
      
      but I've never picked up an SSD (OCZ Vertex 2/3, Intel X25-M/320/330/335/510/520) that didn't feel light and sound nearly hollow.
      Rip it open and have a look. There's not much weight at all to a piece of fibreglass and some plastic resin encasing some silicon. Circuit boards and components are really quite light when they don't require cooling or even large bits of metal for simple thermal mass.
      You'll find that even though it's light and looks hollow it'll be packed quite full. Now combine that with the problems associated with creating some form of energy storage. Storage can come in some electrical form, i.e. battery which would be
  - You ever look inside one? (Score:2)
    
    by Sycraft-fu ( 314770 ) writes:
    
    There is all kinds of extra space in a 2.5" SSD. They have a lil' CPU, some flash chips, and that's it more or less. They are quite small. In smaller form factors, then ya space can become an issue but there's plenty in a 2.5" unit.
  - Re: (Score:2)
    
    by AmiMoJo ( 196126 ) * writes:
    
    Most SSDs are 2.5" so there would be plenty of room for a large capacitor or small battery. You really don't need a lot of energy to finish flushing a small RAM buffer.
    - Re: (Score:3)
      
      by edmudama ( 155475 ) writes:
      
      Most of the enterprise grade SSDs on the market that are outfitted with power-loss protection circuitry fit these capacitors within the 2.5" form factor.
  - Re: (Score:2)
    
    by TechyImmigrant ( 175943 ) writes:
    
    >space is at an extreme premium in those drives.
    So put them in a desktop drive form. The first thing I do with SSDs is put them in one of those adaptors to make them fit in a normal drive tray.
- Re: (Score:3)
  
  by Guspaz ( 556486 ) writes:
  
  Most enterprise SSDs do have small supercapacitors or capacitor arrays onboard for exactly this reason. Some of the higher-end consumer drives do too. But most consumer drives don't.
  The answer? Get a UPS.
  - Re: (Score:2)
    
    by sjames ( 1099 ) writes:
    
    The answer? Get a UPS.
    Because those never fail.
    - - Re: (Score:2)
        
        by sjames ( 1099 ) writes:
        
        I didn't say don't use a UPS, I said they DO fail sometimes so don't pretend it can't happen.
  - Re: (Score:2)
    
    by dgatwood ( 11270 ) writes:
    The answer? Get a UPS.
    You're assuming a desktop-sized drive in a desktop computer, yet nearly all computers sold today are portables, and laptop users are more likely to buy bus-powered external drives than mains-powered drives.
    So the five most likely causes of power failure in consumer hard drives (and presumably, in the future, SSDs), ordered from most likely to least likely, are probably:
    
    Somebody yanking a USB cable before the device is fully unmounted.
    The laptop's battery dying earlier than expecte
    - Re:build in some power storage (Score:4, Funny)
      
      by TechyImmigrant ( 175943 ) writes: on Friday March 01, 2013 @07:38PM (#43050637) Homepage Journal
      
      >yet nearly all computers sold today are portables
      What I really want is a potable computer, so I can drink it if I get thirsty.
      
      - Re: (Score:2)
        
        by Jeremi ( 14640 ) writes:
        
        >What I really want is a potable computer, so I can drink it if I get thirsty.
        What I really want is a pottable computer, so it can monitor my geraniums.
  - Re: (Score:2)
    
    by AmiMoJo ( 196126 ) * writes:
    
    Or maybe attach a capacitor or battery to the power connector (with diodes so you don't try to power the entire PC).
  - UPS does nothing for the common fault case. (Score:3, Informative)
    
    by stoploss ( 2842505 ) writes:
    
    Most enterprise SSDs do have small supercapacitors or capacitor arrays onboard for exactly this reason. Some of the higher-end consumer drives do too. But most consumer drives don't.
    The answer? Get a UPS.
    A UPS is no panacea: I experience grid failure very rarely.
    However, relatively speaking I experience many more kernel lockups that require an ACPI-initiated poweroff by holding down the power button until the machine abruptly powers off. What do you do when a reboot/poweroff command causes your Linux/BSD machine to hang? I/O handle leaks in the Samba SMB client (ie. *not* the smbd daemon) and the Samba Winbind code are notorious for this. The only times I have ever had to "yank power" from a production Linu
    - Re: (Score:2)
      
      by adolf ( 21054 ) writes:
      
      Instead of a supercap I'd rather there be a couple of replaceable Lithium coin cells inside of an SSD, to just finish writes after the power unexpectedly dips for some reason. They're cheap commodities, they seem to have predictable failure rates, and I don't remember the last time I changed one in any computer (though it used to be a fairly frequent repair).
      By using them only once in a blue moon and occasionally monitoring the voltage and setting a SMART error if they're getting worn out, I'd estim
    - Re: (Score:2)
      
      by Vairon ( 17314 ) writes:
      
      Assuming you have sysrq keys enabled, you can hit alt-sysrq-s, wait for the sync to complete, alt-sysrq-u, alt-sysrq-b. This performs a filesystem sync, then remounts all filesystems read-only, then reboots the system. Also if you have a stuck mount point you can always use a lazy umount (umount -l) to remove it from the filesystem hierarchy so you don't need to reboot in the first place.
    - - Re: (Score:2)
        
        by stoploss ( 2842505 ) writes:
        
        I don't understand how if they claim that it takes up to 20 sec for the final write to finalize that a computer that simply shuts down in 10 sec won't have the same problem.
        Drives support a blocking "sync" command that is only supposed to return when the drive has flushed all pending writes and has reached quiescence. If there is nothing pending to flush then the command will return immediately. If not, it may take the cited 20 seconds to return. Normal reboot/poweroff procedure in the OS waits for this condition, and this has been around forever (the HDD equivalent is to flush write cache and park the heads). That's why a 10 second shutdown can be safe even with the putative
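
For anyone wondering what that blocking flush looks like from the application side, here is a minimal Python sketch, assuming a POSIX-like OS. `os.fsync` blocks until the kernel reports the data has been pushed to the device. Whether the drive actually honors the flush is exactly what the paper is testing, so treat this as the best an application can ask for, not a guarantee. The helper name `durable_write` is made up for illustration.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and block until the OS reports it flushed."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first, so a power
    # cut mid-write never leaves a half-written file under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # drain Python's userspace buffer
            os.fsync(tmp.fileno())   # block until the kernel has flushed
        os.replace(tmp_path, path)   # atomic rename over the final name
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Flush the directory entry too, where the platform allows opening
    # a directory (this flag is not available on Windows).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

The temp-file-plus-rename step is my addition, not something the comment above describes; it is a common way to make sure the old file stays intact until the new one is fully flushed.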
- Re: (Score:2)
  
  by WillgasM ( 1646719 ) writes:
  
  You would think. The only SSD I'm running is on my computer at home and my house is sufficiently UPS'd. It's always cool when the power goes out at my apartments but all my electronics keep going. I just wish there was a battery on that Time Warner box outside my door.
  - Re: (Score:2)
    
    by PRMan ( 959735 ) writes:
    
    I just wish there was a battery on that Time Warner box outside my door.
    Strange. My DirecTV DVRs just keep on working...
    - Re: (Score:2)
      
      by WillgasM ( 1646719 ) writes:
      
      I'm mostly talking about the Internet. Netflix only buffers a minute or two.
- Re: (Score:2)
  
  by Beardo the Bearded ( 321478 ) writes:
  
  That was my first thought as well, throw in one supercap and you'll solve this problem.
- Already done (Score:2)
  
  by rgbrenner ( 317308 ) writes:
  
  enterprise-class SSDs have capacitors designed to last long enough for the SSD to finish any writes if the power fails.
  Capacitors cost money though.. so this is one of the things that gets stripped out of consumer-level drives to reduce the price.
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
  - Re: (Score:2)
    
    by the eric conspiracy ( 20178 ) writes:
    
    I've had lots more failures due to UPSs going tits up than through data loss on SSDs.
  - - Re: (Score:2)
      
      by drinkypoo ( 153816 ) writes:
      
      After the third used APC UPS that still didn't work properly after battery replacement, I gave up. None of them could handle vaguely anywhere near the load they were supposed to. I don't know whose UPSes to buy, but I wouldn't buy anything from APC any more. It's unfortunate, because they used to follow a simple formula (fat traces, quality components, sturdy enclosures, priced accordingly) and they were a good value proposition.
      - Re: (Score:2)
        
        by David_Hart ( 1184661 ) writes:
        
        After the third used APC UPS that still didn't work properly after battery replacement, I gave up. None of them could handle vaguely anywhere near the load they were supposed to. I don't know whose UPSes to buy, but I wouldn't buy anything from APC any more. It's unfortunate, because they used to follow a simple formula (fat traces, quality components, sturdy enclosures, priced accordingly) and they were a good value proposition.
        I have tried many other UPS devices and APC are the only ones I trust. It's not clear if you are buying used UPS devices or are the original owner and are just replacing the battery. If you are buying used units (e.g. off of ebay), you never know if they have been hit with a surge, etc. Beyond that, they contain a battery. Batteries degrade over time, usually between 3 to 5 years, same as your car battery. This means that while the UPS will have the rated capacity with a new battery, it will become less over
      - Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:2)
        
        by ub3r n3u7r4l1st ( 1388939 ) writes:
        
        In other words that's why you should NEVER plug in a laser printer behind a UPS. The initial current surge when the laser printer wakes up can easily overload the UPS and shut itself off, cutting off the power to the other devices on the UPS completely.
      - Re: (Score:2)
        
        by drinkypoo ( 153816 ) writes:
        
        That multi-grid relay was supposed to be a warm standby approach, but it was too clever for its own good, which brings us back to your opening sentence. Manual switches would have been preferable. A brief power outage while a maintenance guy scrambles for the relay (hopefully there's someone stationed near the control...) is acceptable when the utility goes down. Having the system screw up and imagine itself an emergency is just lame.
- Re: (Score:2)
  
  by K. S. Kyosuke ( 729550 ) writes:
  
  Seriously... slap in some basic power circuitry and some caps
  A small, stupid, retro NiMH battery might work even better.
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  High-end SSDs have supercaps for that. Low-end SSD customers are too cheap to pay a few USD/EUR more for the added protection.
- - Re: (Score:2)
    
    by sjames ( 1099 ) writes:
    
    I bet no one ever thought of that!!
    Based on the paper, I guess they didn't
    - Re: (Score:3)
      
      by hawguy ( 1600213 ) writes:
      
      I bet no one ever thought of that!!
      Based on the paper, I guess they didn't
      Some SSDs already have capacitors that do just this, so yes, they did think of it. Did you really think that SSD manufacturers aren't aware of this issue?
      But when a few dollars can sway a purchase decision, and it's hard to convince consumers through a few sentences on the side of an SSD box that power protection circuitry is important to have, it's hard to justify putting it in. And since most SSD's are probably sold as OEM equipment where a few pennies can make the difference between getting the sale or n
      - Re: (Score:2)
        
        by TheRealMindChild ( 743925 ) writes:
        
        But when a few dollars can sway a purchase decision, and it's hard to convince consumers through a few sentences on the side of an SSD box that power protection circuitry is important to have, it's hard to justify putting it in
        
        This isn't buying a car. $3 or even $20 isn't going to be detrimental to the purchase opportunity when the consumer can TELL it is of quality above the competitors. Blaming the consumer in this case sounds like you are on the other side
        
        Re: (Score:3)
        
        by TechyImmigrant ( 175943 ) writes:
        
        It would be great if they told you about the feature so you could make an informed purchasing decision.
        
        Re: (Score:3)
        
        by yurtinus ( 1590157 ) writes:
        
        Exactly, this is buying consumer computer equipment. Put a label on the side with a bullet point touting your unexpected power fault protection and I can pretty much guarantee it will have no impact on your product sales. You know what will? The extra $2 price that puts you below the other guy on the "lowest price first" product sorting.
        
        Re: (Score:2)
        
        by hawguy ( 1600213 ) writes:
        
        My employee discount beats any $2 price difference.
        What kind of employee discount do you have that can take a $120 drive and a $122 drive and make the prices equivalent?
        
        Re: (Score:3)
        
        by froggymana ( 1896008 ) writes:
        
        Probably the five finger discount.
        
        Re: (Score:3)
        
        by hawguy ( 1600213 ) writes:
        
        But when a few dollars can sway a purchase decision, and it's hard to convince consumers through a few sentences on the side of an SSD box that power protection circuitry is important to have, it's hard to justify putting it in
        This isn't buying a car. $3 or even $20 isn't going to be detrimental to the purchase opportunity when the consumer can TELL it is of quality above the competitors. Blaming the consumer in this case sounds like you are on the other side
        How can the consumer TELL if its quality is above the competitors? The presence of capacitors doesn't mean that it's a better drive than a drive without capacitors. It just means that you have more protection from one rare set of circumstances -- potentially with less reliability overall, since big electrolytic capacitors are known to fail, especially cheap ones.
        I suspect that most SSD's are bought as OEM drives buried inside laptops and desktops where the end user may not ever know what brand and/or mode
        
        Re: (Score:2)
        
        by Bill_the_Engineer ( 772575 ) writes:
        
        Component manufacturers target OEMs for the bulk of their sales. They will build to the price point that may win them a sale. The $3 to $20 amount may not make a difference to a consumer purchasing one from NewEgg, but to someone who purchases in blocks of 1000 it may.
      - Re: (Score:2)
        
        by sjames ( 1099 ) writes:
        
        It's not like they need a great deal of hold up time. Done well, they need only hold power long enough to successfully write a commit bit or decide not to.
      - Re: (Score:2)
        
        by adolf ( 21054 ) writes:
        
        it's hard to convince consumers through a few sentences on the side of an SSD box that power protection circuitry is important to have
        I disagree.
        To pick an example: Gigabyte advertises on the box that they use high-quality Japanese capacitors for their motherboards. And since every. single. motherboard failure I've seen in a decade has been due to bad caps, these words mean a lot to me.
        "Built-in power backup to help keep your data safe!" sounds like a good enough slogan to lure me in.
        But what do I know?
  - Re: (Score:2)
    
    by NatasRevol ( 731260 ) writes:
    
    They thought of it. They just didn't want to pay for it.
- - Re: (Score:3)
    
    by TechyImmigrant ( 175943 ) writes:
    
    >Flash memory is accessed in blocks and only blocks. Even if you need to write to a single bit, the entire block that that bit resides in needs to be re-written. This means before you can write, the entire block has to be read and stored in temporary RAM. If power is interrupted during a write operation then there is a very good chance the entire block will be lost because the contents of the flash controller's RAM will be lost.
    You are wrong.
    Flash is written word by word. The size of the word depends on the
  - Re: (Score:2)
    
    by TechyImmigrant ( 175943 ) writes:
    
    You can write individual words in a flash chip.
    It takes longer to write than read because you have to force a bunch of electrons through an insulator.
    If you want to write over existing data, you have to erase the block it is in, because you can only erase whole blocks, but there is nothing to stop you incrementally writing to unused parts of a block.
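
The write-versus-erase asymmetry described above can be sketched as a toy Python model. It relies on one general property of flash: programming can only flip bits from 1 to 0, and only an erase of the whole block brings them back to 1. The class name, the 16-byte block size, and the byte-level granularity are all assumptions for illustration; real parts program in words or pages and erase much larger blocks.

```python
class FlashBlock:
    """Toy model of one erase block; sizes are illustrative only."""

    ERASED = 0xFF  # an erased byte reads back as all ones

    def __init__(self, size: int = 16) -> None:
        self.cells = bytearray([self.ERASED] * size)
        self.erase_count = 0

    def program(self, offset: int, data: bytes) -> None:
        """Program bytes in place; programming can only clear bits (1 -> 0)."""
        if offset < 0 or offset + len(data) > len(self.cells):
            raise ValueError("write falls outside the block")
        # Check the whole write first so a rejected write changes nothing.
        for i, value in enumerate(data):
            current = self.cells[offset + i]
            if current & value != value:
                # Some bit would have to go 0 -> 1, which needs an erase.
                raise ValueError("target cells must be erased first")
        for i, value in enumerate(data):
            self.cells[offset + i] &= value

    def erase(self) -> None:
        """Erase works only on the whole block, never on part of it."""
        self.cells = bytearray([self.ERASED] * len(self.cells))
        self.erase_count += 1
```

With this model, `program(0, ...)` followed by `program(2, ...)` on a fresh block both succeed because the second write lands in unused cells, while trying to set already-cleared bits back to 1 is rejected until `erase()` wipes the entire block.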
Before you ask. (Score:5, Informative)

by eddy ( 18759 ) writes: on Friday March 01, 2013 @06:12PM (#43049767) Homepage Journal

The paper doesn't disclose the brands.

- Re: (Score:2)
  
  by war4peace ( 1628283 ) writes:
  
  Of course it doesn't. Naming/Shaming is not allowed.
  I was sarcastic, of course. They don't do it, though, because it'd probably put them in a crossfire of lawsuits coming from powerful companies. Nobody wants that. They will lose simply by being bullied financially. It's all about who brings more lawyers to the table, not who's right or wrong.
  - Re: (Score:2)
    
    by Mad Merlin ( 837387 ) writes:
    
    Which is unfortunate. That was the main reason I opened the PDF.
  - Re: (Score:2)
    
    by PRMan ( 959735 ) writes:
    
    Somebody should tell that to Consumer Reports...
- Re: (Score:2)
  
  by TechyImmigrant ( 175943 ) writes:
  
  MLC == Intel. But they were the good ones.
- Re: (Score:2)
  
  by greg1104 ( 461138 ) writes:
  
  I created a Reliable Writes [postgresql.org] page for PostgreSQL that talks about this and gives some known good and bad examples. Intel's 320 and 710 drives are the only two SATA SSDs still on the market that have survived the tests for clean shutdown I've advocated everyone run. They are units with a supercapacitor to enable power failure cleanup. If a drive doesn't have a battery for that sort of purpose, you will lose data at shutdown one day. And, no, a UPS is no cure, because all it takes to ruin a system on one is
- Re: (Score:2)
  
  by DragonTHC ( 208439 ) writes:
  
  Which makes it completely useless for 90% of us who just wasted our 3 minutes.
Anyone ever hear of a battery-backed cache? (Score:2)

by Midnight_Falcon ( 2432802 ) writes:

Last time I checked, standard platter-based disks had the same issue -- a problem that is solved in server/enterprise environments by placing a write-cache battery in the RAID controller.

In a desktop environment I suppose one could embed a write cache battery into the SSDs to abate the issue, but in a laptop environment it'd be unlikely you'd even encounter it since you'd have to be writing data while running out of battery, in which case, you might well deserve it :)
- Re: (Score:2)
  
  by LunaticTippy ( 872397 ) writes:
  
  A capacitor could hold enough power to finish a write cycle on SSD no problem. It wouldn't even have to be very large.
  - Re: (Score:2)
    
    by Midnight_Falcon ( 2432802 ) writes:
    
    True, but, then when the cap dries out and eventually bursts open it'd probably be a major cause of drive failure and lack of longevity.
    - Re: (Score:2)
      
      by drinkypoo ( 153816 ) writes:
      
      Use a solid cap, and/or socket the cap at the edge of the drive someplace.
  - - Re: (Score:2)
      
      by nabsltd ( 1313397 ) writes:
      
      Actually, you might be surprised at how large it would be. C = I·t/V. 10 ms write time for a sector, 2 s for a file.
      You don't need enough power to finish the OS-level task...you only need enough to write out the data in the drive's RAM cache. Since that is 256MB or less on most current SSD drives (512MB is found on some drives greater than 500GB), it's not as much as you estimate.
      Then, too, when you use the correct timings (10ms for a sector is about 200 times too long), you see that even the slowest SSD takes only about 5ms to write out 1MB (with an average around 3ms), that's around 0.75 seconds to flush the whole cac
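
To put rough numbers on the estimate above, here is a small Python sketch. The 256 MB cache and roughly 3 ms per MB come from the comment; the current draw and the allowed voltage droop are pure assumptions on my part (no figures for those appear in the thread), and the constant-current formula C = I·t/ΔV is only a first-order approximation.

```python
def flush_time_s(cache_mb: float, ms_per_mb: float) -> float:
    """Time to write out the whole RAM cache, in seconds."""
    return cache_mb * ms_per_mb / 1000.0


def holdup_capacitance_f(current_a: float, hold_s: float,
                         allowed_drop_v: float) -> float:
    """Capacitance needed to supply current_a for hold_s seconds while
    the rail droops by at most allowed_drop_v (constant-current model)."""
    if allowed_drop_v <= 0:
        raise ValueError("allowed voltage drop must be positive")
    return current_a * hold_s / allowed_drop_v


# Figures from the comment: 256 MB of cache at about 3 ms per MB.
hold = flush_time_s(cache_mb=256, ms_per_mb=3)

# ASSUMED figures, for illustration only: 0.5 A draw, 1 V of allowed droop.
cap = holdup_capacitance_f(current_a=0.5, hold_s=hold, allowed_drop_v=1.0)

print(f"hold-up time: {hold:.3f} s, capacitance: {cap:.3f} F")
```

Under those assumed electrical figures the flush takes about 0.77 s and needs a bit under 0.4 F, which is supercap territory. A smaller cache or a drive that only has to finish in-flight pages would need far less.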
- Re: (Score:2)
  
  by wisnoskij ( 1206448 ) writes:
  
  OR the battery fails, is taken out, or falls out.
Power corrupts... (Score:5, Funny)

by preflex ( 1840068 ) writes: on Friday March 01, 2013 @06:23PM (#43049855)

... Power failure corrupts absolutely.

UPS (Score:2)

by rossdee ( 243626 ) writes:

Why should a power failure corrupt anything? The UPS will shut the computer off if there is a prolonged outage.
Unsurprising (Score:3, Insightful)

by Anonymous Coward writes: on Friday March 01, 2013 @06:27PM (#43049893)

These devices have an elaborate internal database for the management of block remapping. For this to survive power failures it needs to use transactional updates. Getting this right is hard - it takes years for file systems and databases to become robust. I'd guess that many devices don't even attempt to do it and the ones that do probably have obscure failure modes. A UPS is essential.
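
As a rough illustration of what "transactional updates" for a remapping table could mean, here is a toy Python sketch: each mapping change is appended to a log with a checksum, and recovery replays the log only up to the first record that fails its check. This is an assumption-laden teaching model, not how any particular SSD firmware works; the record layout and the names `encode_record` and `recover_mapping` are invented for the example.

```python
import struct
import zlib
from typing import Dict

# Hypothetical record layout: logical block, physical block, CRC32.
RECORD = struct.Struct("<III")
PAYLOAD_SIZE = 8  # the two 32-bit block numbers covered by the CRC


def encode_record(logical: int, physical: int) -> bytes:
    """Serialize one remap entry followed by a CRC32 of its payload."""
    payload = struct.pack("<II", logical, physical)
    return payload + struct.pack("<I", zlib.crc32(payload))


def recover_mapping(log: bytes) -> Dict[int, int]:
    """Replay the log after a crash, stopping at the first bad record."""
    mapping: Dict[int, int] = {}
    # A trailing fragment shorter than one record is simply never visited.
    for offset in range(0, len(log) - RECORD.size + 1, RECORD.size):
        logical, physical, crc = RECORD.unpack_from(log, offset)
        if zlib.crc32(log[offset:offset + PAYLOAD_SIZE]) != crc:
            break  # torn or corrupt record: keep the earlier state
        mapping[logical] = physical
    return mapping
```

A record cut short by power loss, or one with a flipped bit, fails the check, so the table falls back to the last state that was written completely. Getting the real thing right also means handling write ordering and garbage collection, which this sketch ignores entirely.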

Finally somebody said it! (Score:5, Informative)

by Dishwasha ( 125561 ) writes: on Friday March 01, 2013 @06:28PM (#43049907)

I had some original Vertex drives from OCZ that kept absolutely corrupting when my laptop got accidentally unplugged and I powered on the machine. I had to RMA them over and over and over again. I finally figured out that my battery was getting old and, although everything was functional even on battery power and it would boot, the initial large draw of power on boot must have created a voltage drop (i.e. brownout) which the SSDs weren't designed to compensate for. Within an hour of boot (even back on plugged power) they would choke, freeze the OS, and be rendered unusable from then on out.
Several SSD manufacturers are probably not engineering well for fluctuating power. Rather than fixing the problem with better engineering, OCZ simply changed their warranty policy to void the warranty if the customer is not providing proper power. Correct me if I'm wrong, but I don't think rotating-disk hard drive manufacturers have ever had that in their warranty clauses.

- Re: (Score:2, Insightful)
  
  by citylivin ( 1250770 ) writes:
  
  Well, that's probably because you were using OCZ crap. I have never had a quality product from that company.
  However, that said, I have noticed the same thing with the Crucial M4s I have. In one particular laptop, it keeps bricking drives because the battery doesn't hold much of a charge any more. Luckily, I can "unbrick" them by plugging in the power (but not data) for 20 minutes, then plugging in the data connection, then rebooting the machine. Has worked more than once.
  and crucial has put out a bunch of firmw
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
- SSDs made by HD manufacturers (Score:2)
  
  by Burz ( 138833 ) writes:
  
  I think you make a good point about warranty clauses, and it would be hard to imagine HD manufacturers singling out their SSDs with an inferior warranty in this respect.
  Considering the paper cited by TFA won't spill the beans on which models were tested, it may be a safer bet to purchase SSDs from traditional HD makers (at least I hope that is the case with my Samsung).
- Re: (Score:2)
  
  by ckthorp ( 1255134 ) writes:
  
  I had some fun with trying to mount some Crucial M4 drives in USB external enclosures. They kept getting unmounted and the SMART block remap count kept running up, and up, and up. One of the drives outright failed and the other was at 55% spare sectors remaining when I figured out the issue. When there was a write, the current consumption from programming the FLASH chip would cause a voltage sag and the write would fail but it wasn't usually enough of a drop to make the drive reset. Once I bought the "Y
not naming names = data "pulled out of my ass" (Score:3, Insightful)

by citizenr ( 871508 ) writes: on Friday March 01, 2013 @06:52PM (#43050201) Homepage

Useless paper/test.

- Re: (Score:2)
  
  by Theovon ( 109752 ) writes:
  
  If they do that, they won't get any more free SSDs to test, and that'll impact their ability to write papers criticizing SSDs. What would you prefer? A paper biased towards SSDs too small/cheap to be useful to you, or one that doesn't name names? Anonymity is VERY important in this kind of research.
  - Re: (Score:3)
    
    by citizenr ( 871508 ) writes:
    
    If they do that, they won't get any more free SSDs to test, and that'll impact their ability to write papers criticizing SSDs. What would you prefer?
    I would prefer research to be done by someone who is not a manufacturer's bitch.
    You don't need a ton of money to test commodity hardware; the trick is to SELL the stuff after the test, not take it home and pretend it wasn't a bribe.
  - - Re: (Score:2)
      
      by citizenr ( 871508 ) writes:
      
      Yes, they used 15; only a few of those were of the same brand and model.
- - Re: (Score:2)
    
    by edmudama ( 155475 ) writes:
    
    SSDs are already in the big show, and have been demonstrated reliable in those applications. The key is to choose your vendors carefully, ask how they were qualified, etc.
up/down/up/brown/fried (Score:2, Insightful)

by h8sg8s ( 559966 ) writes:

What some folks don't realize is that it's the seesaw nature of many power events that's primarily behind both data corruption and SSD failure. It's a rare rack system that has its own power conditioning and UPS these days (HP NonStop comes to mind), and without it you're subject to whatever the event provides in the way of under/over voltage, spikes, drops, etc. Many times these happen in timeframes too fast for power switching equipment to react, and in some cases it's that stuff that gets fried first.
Interesting failure mode for Crucial SSDs (Score:2)

by ckthorp ( 1255134 ) writes:

There is a protection mechanism that I know exists in Crucial SSDs which makes the drive appear dead after some unclean shutdowns of the drive while it performs a firmware-level integrity check of the drive. It may exist in other brands as well. Sometimes it takes 2 runs of 30-60 minutes to get the drive to re-enumerate via SATA. I'd be curious to know if the "dead" drive was affected by this bug.
- Re: (Score:2)
  
  by drinkypoo ( 153816 ) writes:
  
  There is a protection mechanism that I know exists in Crucial SSDs which makes the drive appear dead after some unclean shutdowns of the drive while it performs a firmware-level integrity check of the drive.
  I don't know if they're violating a spec or not and it's probably a life's work to find out, but that seems very rude to me. They really ought to identify as busy or something, so that they don't just scare the piss out of you. If you almost-brick an Xperia phone by scragging the bootloader so bad you can't even reflash it, whatever handles the comms is still working and lurking in the background and it will enumerate via USB with the service interface. That way you know whether you should even bother. Woul
  - Re: (Score:2)
    
    by ckthorp ( 1255134 ) writes:
    
    I agree. The first time one of our engineers' laptop HDs did this, it was rather uncomfortable to say the least. I think a good compromise solution would be to have it enumerate with a "useful" drive textual model identifier like "M4 ERROR CHECKING, LEAVE ON 30 MIN" or some such. I'm sure it violates some standard, too, but it would at least give the user some indication of what is happening.
- Re: (Score:3)
  
  by Voyager529 ( 1363959 ) writes:
  
  You got this too? I just ordered a Crucial M4 on sale a few weeks ago. The day after I installed and cloned it, I had the same situation where it wouldn't start. I called Crucial, expecting to need an RMA. Luckily I got an informed gentleman on the phone who told me to leave it at the failed POST screen for 20 minutes, reboot, give it another 20 minutes, and reboot again. It worked. Supposedly it's not so much a 'bug' as an 'obscure feature'. ...I'm keeping my spinning rust drive around just in case.
  - Re: (Score:2)
    
    by ckthorp ( 1255134 ) writes:
    
    Overall, it is a good thing. The data isn't organized linearly for wear leveling purposes, so a power outage can leave the metadata in an inconsistent state. Also, make sure you have the latest firmware on the drive. They had a fun one earlier that caused a drive lockup hourly after the power on counter hit about 35k hrs (or some such). I've got about 2 dz M4 drives in service, so I've seen a lot of the bugs.
Buy a SSD with a battery or capacitor (Score:3)

by thue ( 121682 ) writes: on Friday March 01, 2013 @07:16PM (#43050417) Homepage

This is old news; see e.g. Wikipedia's coverage [wikipedia.org]. Only buy SSDs with a battery or capacitor, or whatever is in the SSD's DRAM cache will be lost on power failure.
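
For a rough feel of what that capacitor has to cover, here is a back-of-envelope sketch using figures quoted elsewhere in this discussion (a 256MB cache, about 3ms per MB written, and F=It/V). The 1 A draw and 1 V of allowable droop are made-up placeholders, not measurements from any drive, and the function names are only for illustration.

```python
def flush_seconds(cache_mb, ms_per_mb):
    """Time to write the whole DRAM cache out to flash, in seconds."""
    return cache_mb * ms_per_mb / 1000.0


def holdup_farads(current_a, seconds, droop_v):
    """Capacitance from C = I * t / V, where V is the allowable voltage droop."""
    return current_a * seconds / droop_v


# Figures from the discussion: 256MB cache, roughly 3ms per MB.
t = flush_seconds(256, 3.0)
# Assumed (hypothetical) load: 1 A while flushing, 1 V of allowable droop.
c = holdup_farads(1.0, t, 1.0)
print(f"flush time ~{t:.2f} s, hold-up capacitance ~{c:.2f} F")
```

Swap in real current and droop numbers for a specific drive before drawing any conclusions; the result scales linearly with both the current and the flush time.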

My Personal Policy (Score:3, Insightful)

by wisnoskij ( 1206448 ) writes: on Friday March 01, 2013 @07:54PM (#43050765) Homepage

This is why I don't use prototype tech that is really not ready to be used in the real world. And if you do, expect loads of bugs and bricking.
But either way, thanks for funding the development of something I am excited to try out in 2-4+ years when it will be a mature usable technology.

- Re: (Score:2)
  
  by edmudama ( 155475 ) writes:
  
  SSDs are way past prototype technology at this point. The products from high quality vendors are both fast and robust.
Why does the word "ECC" (Score:2)

by rs79 ( 71822 ) writes:

Not occur anywhere on this page?
Since I did RTFA (Score:3)

by rabtech ( 223758 ) writes: on Saturday March 02, 2013 @02:00AM (#43052523) Homepage

Power loss protection (super capacitors) was stated on four of the drives (the four least expensive to boot). Only three performed flawlessly in the unserialized writes test. Those aren't great odds. In fact only two drives passed all tests with no errors, and it wasn't necessarily the SLC "enterprise" drives, though those two also passed the serialized writes test.
In case you aren't aware, unserialized writes invalidate *every* assumption, including write ahead, journaling, even your fancy BTRFS/ZFS. His example is a database where the transaction log write was sync'd before the data page write, then after a power failure the data page is persisted but the log write is gone.
You can recover from many of the other errors or at least detect them but unserialized writes can silently corrupt data or even ruin the entire filesystem.
Obviously the metadata/dead failures are the exception... Those render the whole SSD useless.
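
The database example can be spelled out as a toy model. This is only a sketch: the write names and the two device models are invented for illustration, not taken from the paper. The host issues the log record, syncs, then issues the data page. A well-behaved device can only lose a suffix of that sequence; a device with unserialized writes can lose any subset.

```python
from itertools import combinations

# Order in which the host issued (and synced) the writes.
ISSUED = ["log:txn1", "page:txn1"]


def prefix_states(writes):
    """Possible crash states if the device persists writes in issue order."""
    return [tuple(writes[:n]) for n in range(len(writes) + 1)]


def any_subset_states(writes):
    """Possible crash states if the device may persist any subset of writes."""
    return [c for n in range(len(writes) + 1) for c in combinations(writes, n)]


def recoverable(state):
    """Write-ahead rule: every persisted data page needs its log record."""
    pages = {w.split(":")[1] for w in state if w.startswith("page:")}
    logs = {w.split(":")[1] for w in state if w.startswith("log:")}
    return pages <= logs


def broken_states(states):
    """Crash states that violate the write-ahead rule."""
    return [s for s in states if not recoverable(s)]


print(broken_states(prefix_states(ISSUED)))      # []
print(broken_states(any_subset_states(ISSUED)))  # [('page:txn1',)]
```

The one broken state is the data page surviving without its log record, which is the case journaling cannot detect or undo.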
