Data Storage Hardware

Ask Slashdot: How Do SSDs Die?

Posted by timothy
from the whimpery-bang dept.
First time accepted submitter kfsone writes "I've experienced, first-hand, some of the ways in which spindle disks die, but either I've yet to see an SSD die or I'm not looking in the right places. Most of my admin-type friends have theories on how an SSD dies but admit none of them has actually seen commercial grade drives die or deteriorate. In particular, the failure process seems like it should be more clinical than spindle drives. If you have X many of the same SSD drive and none of them suffer manufacturing defects, if you repeat the same series of operations on them they should all die around the same time. If that's correct, then what happens to SSDs in RAID? Either all your drives will start to fail together or at some point, your drives will become out of sync in terms of volume sizing. So, have you had to deliberately EOL corporate grade SSDs? Do they die with dignity or go out with a bang?"
This discussion has been archived. No new comments can be posted.

  • Umm (Score:5, Insightful)

    by The MAZZTer (911996) <megazzt.gmail@com> on Tuesday October 16, 2012 @12:23PM (#41670151) Homepage
    It was my understanding that for traditional drives in a RAID you don't want to get all the same type of drive all made around the same time since they will fail around the same time too. Same would apply to SSDs.
  • Re:Die! (Score:1, Insightful)

    by Quakeulf (2650167) on Tuesday October 16, 2012 @12:31PM (#41670281)
    I am new to commenting on /. and I think lame attempts at humor belong to 9GAG and Reddit.
  • Re:Umm (Score:5, Insightful)

    by Anonymous Coward on Tuesday October 16, 2012 @12:31PM (#41670293)

    yeah, sounds like submitter may be mildly deficient

    Which is why he's asking.

    Fuck people who ask questions when they don't know something, right?

  • Re:Umm (Score:5, Insightful)

    by statusbar (314703) <jeffk@statusbar.com> on Tuesday October 16, 2012 @12:40PM (#41670431) Homepage Journal

    I've seen two instances where a drive failed. Each time there were no handy replacement drives. Within a week a second drive died the same way as the first! back to backup tapes! Better to have replacement drives in boxes waiting.

  • Re:Umm (Score:5, Insightful)

    by ByOhTek (1181381) on Tuesday October 16, 2012 @12:42PM (#41670459) Journal

    In general, if you get such an issue, it will happen early on in the life of the drives (one coworker had what he called the 30-day thrash rule - he would plan ahead and get a huge number of drives - the cheapest available meeting requirements, including avoiding manufacturers we had issues with previously, take a handful, and thrash 'em for 30 days. If nothing bad happened, he'd either keep up 30 day thrashes on sets of hard drives, pulling out the duds, or just return the whole lot).

  • Re:Die! (Score:2, Insightful)

    by lister king of smeg (2481612) on Tuesday October 16, 2012 @12:59PM (#41670733)

    No offense intended, but if you're new, why are you complaining about our long-standing culture of cracking lame jokes? If you don't like it, why did you join?

  • Re:Umm (Score:4, Insightful)

    by NeverVotedBush (1041088) on Tuesday October 16, 2012 @02:15PM (#41671885)
    When a drive fails and a RAID goes into reconstruction (if you are set up that way), that's when you are significantly more likely to have another drive fail due to all the extra activity across the RAID.

    We see it all the time on a big array. One must hustle to repair/rebuild the RAID... ;-)
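
    A rough way to see why the rebuild window matters: a minimal sketch, assuming drive failures are independent and every surviving drive has the same chance of failing before the rebuild finishes. The function name, the 12-drive array, and both probabilities are made-up illustrative values, not measurements from any real array.

```python
def p_second_failure(remaining_drives, p_fail_during_rebuild):
    """Chance that at least one surviving drive fails before the rebuild ends.

    Assumes independent failures with an identical per-drive probability,
    which is a simplification; correlated failures would make this worse.
    """
    return 1 - (1 - p_fail_during_rebuild) ** remaining_drives


# Hypothetical 12-drive array with one drive already dead (11 survivors).
# Both per-drive probabilities below are illustrative assumptions.
baseline = p_second_failure(11, 0.001)  # light load during the window
stressed = p_second_failure(11, 0.01)   # heavy rebuild activity

print(f"baseline: {baseline:.2%}, under rebuild load: {stressed:.2%}")
```

    Under these assumptions the array-level risk scales with both the number of surviving drives and the per-drive probability, so shortening the rebuild window (for example by having spares on hand) reduces exposure.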
  • Re:Umm (Score:5, Insightful)

    by Anonymous Coward on Tuesday October 16, 2012 @02:26PM (#41672029)

    I've seen two instances where a drive failed. Each time there were no handy replacement drives. Within a week a second drive died the same way as the first! back to backup tapes! Better to have replacement drives in boxes waiting.

    This. Your spares closet is your best friend in the enterprise. Ensure you keep it stocked.

    And locked. And don't label them "spares". Label them "cold swap fallback device" or something that management won't see as something "extra" that can be "repurposed" (i.e. stolen)

  • Re:CRC Errors (Score:4, Insightful)

    by markhahn (122033) on Tuesday October 16, 2012 @02:27PM (#41672047)

    this is not very useful, as it mainly points out that the initial generations of commodity SSDs were immature. not to mention that return rates contain other phenomena than wear or even failure.

  • Re:CRC Errors (Score:5, Insightful)

    by arth1 (260657) on Tuesday October 16, 2012 @03:16PM (#41672681) Homepage Journal

    I am running (6) OCZ Vertex2 256GB drives under heavy use 24/7. Almost 2 years on have only had one fail and it still works, just started kicking random errors.

    Your failure rate of > 8% per year isn't very reassuring.
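
    For reference, the figure works out as follows: a minimal sketch, assuming the quoted numbers (1 failure out of 6 drives) and rounding "almost 2 years" up to 2.0 years. The function names are illustrative only.

```python
def simple_annual_rate(failed, total, years):
    """Fraction of drives failed, divided evenly across the years observed."""
    return failed / total / years


def compounded_annual_rate(failed, total, years):
    """Constant yearly failure probability that yields the observed survival."""
    survival = 1 - failed / total
    return 1 - survival ** (1 / years)


# 1 of 6 drives failed; "almost 2 years" is rounded up to 2.0 (an assumption).
simple = simple_annual_rate(1, 6, 2.0)
compounded = compounded_annual_rate(1, 6, 2.0)

print(f"simple: {simple:.1%}, compounded: {compounded:.1%}")
# prints "simple: 8.3%, compounded: 8.7%"
```

    Since the drives had been running for slightly less than 2 years, the true yearly rate is a little higher than either number, which matches the "> 8%" in the comment. With only six drives, the estimate is of course very noisy.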

  • Re:CRC Errors (Score:5, Insightful)

    by Dishwasha (125561) on Tuesday October 16, 2012 @03:39PM (#41672967)

    I would counter-argue that any flash drive manufacturer is asking for massive RMAs when the device is clearly targeted for the laptop market (otherwise they would manufacture it in a 3.5" format) where the operating environment is guaranteed to be running on a battery for long periods of time. Any research into battery operation would expose you to the vast differences in operating voltage as batteries discharge as well as the age of the battery. It is just bad engineering to not take this into account.

    Reformatting the drive was not an option because the drive wouldn't even detect in the BIOS unless the special factory jumper was set which is a non-operational mode for the drive. This problem was reproduced over 10 times with over 10 different drives of the same model Vertex. Slightly bad power caused the entire drive to be rendered unusable. Amazingly, none of the other hardware in the laptop had any problem with the power (i.e. screen, cpu, memory, other spindle-based hard drive, gpu, etc.). As I said, bad engineering.
