

Data Storage Hardware

Disk Drive Failures 15 Times What Vendors Say (284 comments)

jcatcw writes "A Carnegie Mellon University study indicates that customers are replacing disk drives more frequently than vendor estimates of mean time to failure (MTTF) would suggest. The study examined large production systems, including high-performance computing sites and Internet services sites, running SCSI, FC and SATA drives. The data sheets for the drives indicated an MTTF between 1 and 1.5 million hours, which should mean annual failure rates of at most 0.88%; observed annual replacement rates were between 2% and 4%. The study also shows no evidence that Fibre Channel drives are any more reliable than SATA drives."
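The summary's 0.88% figure follows directly from the datasheet numbers; a quick sketch, assuming a drive powered on 24/7 (8,760 hours per year):

```python
# Convert a datasheet MTTF into an expected annual failure rate (AFR),
# assuming the drive runs 24/7, i.e. 8,760 powered-on hours per year.
HOURS_PER_YEAR = 24 * 365

def annual_failure_rate(mttf_hours: float) -> float:
    """Fraction of drives expected to fail per year at the stated MTTF."""
    return HOURS_PER_YEAR / mttf_hours

for mttf in (1_000_000, 1_500_000):
    print(f"MTTF {mttf:>9,} h -> AFR {annual_failure_rate(mttf):.2%}")
```

At 1 million hours this gives roughly 0.88% per year, against the 2-4% replacement rates the study observed.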
  • Re:Repeat? (Score:2, Interesting)

    by LiquidCoooled ( 634315 ) on Friday March 02, 2007 @05:23PM (#18211818) Homepage Journal
    Yes, and it's mentioned in the report.
    The best part about the entire thing is the very last quote:

    "If they told me it was 100,000 hours, I'd still protect it the same way. If they told me if was 5 million hours I'd still protect it the same way. I have to assume every drive could fail."

    Just common sense.
  • by Lendrick ( 314723 ) on Friday March 02, 2007 @05:26PM (#18211880) Homepage Journal
    In the article, they mention that the study didn't track actual failures, just how often customers *thought* there was a failure and replaced their drive. There are all sorts of reasons someone might think a drive has failed. They're not all correct. I can't begin to guess what percentage of those perceived failures were for real.

    This study is not news. All it says is that people *think* their hard drives fail more often than the mean time to failure.
  • by neiko ( 846668 ) on Friday March 02, 2007 @05:30PM (#18211936)
    TFA seems surprised by SATA drives lasting as long as Fibre Channel...why on earth would your data interface have any effect on the drive internals? Or are we assuming Interface = Data Throughput?
  • by Danga ( 307709 ) on Friday March 02, 2007 @05:30PM (#18211940)
    I have had 3 personal-use hard drives go bad in the last 5 years; they were either Maxtor or Western Digital. I am not hard on the drives other than leaving them on 24/7. The drives that failed were all just for data backup, and I put them in big, well-ventilated boxes. With this use I would think the drives would last for years (at least 5), but nope! The drives did not arrive broken either; they all functioned great for 1-2 years before dying. The quality of consumer hard drives nowadays is way, WAY low, and the manufacturers should do something about it.

    I don't consider myself a fluke because I know quite a few other people who have had similar problems. What's the deal?

    Also, does anyone else find this quote interesting?:

    "and may have failed for any reason, such as a harsh environment at the customer site and intensive, random read/write operations that cause premature wear to the mechanical components in the drive."

    It's a f$#*ing hard drive! Jesus H Tapdancing Christ, how can they call that premature wear? Do they calculate the MTTF by just letting the drive sit idle and never reading from or writing to it? That actually wouldn't surprise me.
  • Even better ... (Score:4, Interesting)

    by khasim ( 1285 ) <brandioch.conner@gmail.com> on Friday March 02, 2007 @05:34PM (#18211994)
    Give me 6 month failure rates.

    Start with 100 drives. Continuous usage.

    How many fail in the first 6 months? 12 months? 18 months? ... 60 months? That would be the info that I'd need. Where's the big failure spike? I'm going to be replacing them right before that.
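The failure-spike curve the parent is asking for is just a histogram of drive lifetimes in 6-month buckets; a minimal sketch, with hypothetical lifetimes standing in for real replacement records:

```python
from collections import Counter

# Hypothetical lifetimes (months until failure) for a batch of drives;
# substitute real replacement records to build the curve described above.
lifetimes_months = [7, 14, 15, 22, 23, 24, 25, 26, 31, 38, 40, 55]

# Count failures per 6-month bucket: 0-6, 6-12, 12-18, ...
buckets = Counter((m // 6) * 6 for m in lifetimes_months)
for start in sorted(buckets):
    print(f"{start:>2}-{start + 6} months: {buckets[start]} failures")
```

Whichever bucket dominates is the spike to schedule replacements ahead of.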
  • Check SMART Info (Score:4, Interesting)

    by Bill Dimm ( 463823 ) on Friday March 02, 2007 @05:41PM (#18212094) Homepage
    Slightly off-topic, but if you haven't checked the Self-Monitoring, Analysis and Reporting Technology (SMART) info provided by your drive to see if it is having errors, you probably should. You can download smartmontools [sourceforge.net], which works on Linux/Unix and Windows. Your Linux distro may have it included, but may not have the daemon running to automatically monitor the drive (smartd).

    To view the SMART info for drive /dev/sda do:
    smartctl -a /dev/sda
    To do a full disk read check (can take hours) do:
    smartctl -t long /dev/sda

    Sadly, I just found read errors on a 375-hour-old drive (manufacturer's software claimed that repair succeeded). Fortunately, they were on the Windows partition :-)
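For ongoing monitoring rather than one-off manual checks, the same package's smartd daemon can watch drives in the background. A minimal /etc/smartd.conf sketch (the device name and mail target here are examples; adjust for your system):

```
# /etc/smartd.conf -- minimal sketch; device and mail target are examples
# Run all SMART checks (-a) on /dev/sda and mail root when trouble is found (-m)
/dev/sda -a -m root

# Alternatively, let smartd discover drives itself:
# DEVICESCAN -a -m root
```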
  • by crabpeople ( 720852 ) on Friday March 02, 2007 @06:12PM (#18212520) Journal
    That's fair, but if you pull a bad drive, ghost it (assuming it's not THAT bad), plop the new drive in, and the system works flawlessly, what are you to assume?

    I don't really care to know exactly what is wrong with the drive. If I replace it and the problem goes away, I consider that a bad drive, even if you could still read and write to it. I just did one this morning that showed no symptoms other than Windows taking what I considered a long time to boot. All the user complained about was sluggish performance, and there were no errors or drive noises to speak of. Problem fixed, user happy, drive bad.

    As I already posted, a good rule of thumb is that most drives go bad about 3 years from the date of manufacture.

  • by Akaihiryuu ( 786040 ) on Friday March 02, 2007 @06:24PM (#18212654)
    I had a 4MB 72-pin parity SIMM go bad one time...this was about 12 years ago in a 486 I used to have. It just didn't work one day (it had worked for the first two months). Turn the computer on, get past BIOS start, bam...parity error before the bootloader could even start. Reboot, try again, parity error. Turn off parity checking, and it actually started to boot and then crashed. The RAM was obviously very defective...when I took that one stick out, the computer booted normally even with parity on; if I tried to boot with just that stick it would never even POST.

    That's the only time I have ever seen memory fail...but then it came from a really shady local dealer who regularly scammed people. This same guy had a rack of "shareware" DOS games with neatly printed labels (all labels he printed) for like $5/disk, all of the disks completely blank. I happened to get one of those when I got the RAM, and my friend did too (from another part of the rack; we didn't give it much thought at the time, it was just an "oh, this looks like it might be neat" thing). Neither disk was even formatted. The CD-ROM drives he sold me and my friend died within a month also (about a month after the RAM).

    Amazingly, the store was still in business when I went back with the stick of RAM...he looked at it with a magnifying glass, claimed it was "scratched" and therefore abused. I burned rubber out of his parking lot, tossing a lot of gravel against the windows, then found a reputable place to get RAM (though this was back in the days when 4MB cost $200). Two days later I drove by, and the place was boarded up and closed. Both CD-ROM drives died within 2 days of each other a month later. Nothing that came out of that place worked.
  • by Tim Browse ( 9263 ) on Friday March 02, 2007 @06:38PM (#18212852)

    ...is that it detects SMART disk errors in normal use (i.e. you don't have to be watching the BIOS screens when your PC boots).

    When I was trying the Vista RC, it told me that my drive was close to failing. I, of course, didn't believe it at first, but I ran the Seagate test floppy and it agreed. So I sent it back to Seagate for a free replacement.

    About the only feature that impressed me in Vista, sadly. (And I'm not sure it should have impressed me, tbh. I'm assuming XP never did this as I've never seen/heard of such a feature.)

  • by egomaniac ( 105476 ) on Friday March 02, 2007 @10:56PM (#18214684) Homepage
    There is no context in which it is appropriate to apply metric reasoning to computers.

    It's exactly this kind of bullshit that irritates me. Suppose you look at a file. It's 95,015,327 bytes long. You're claiming that referring to the file as being 95MB is "inappropriate"?

    I'm a software engineer, fully versed in binary math, and the fact that computers refer to that file as being 90MB still really pisses me off. It's pointless and annoying.
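The disagreement in this subthread is just decimal versus binary prefixes; for the file size quoted above:

```python
size = 95_015_327  # bytes, the file size quoted above

decimal_mb = size / 1_000_000      # SI megabytes, as drive labels count them
binary_mib = size / (1024 * 1024)  # binary mebibytes, which many OSes label "MB"

print(f"{decimal_mb:.1f} MB (decimal)")   # about 95.0
print(f"{binary_mib:.1f} MiB (binary)")   # about 90.6, displayed as "90MB"
```

The same 5%-ish gap is why a "500GB" drive shows up as roughly 465GB in an OS that divides by 1024^3.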
