Ask Slashdot: Do You Test Your New Hard Drives? 348

An anonymous reader writes "Any Slashdot thread about drive failure is loaded with good advice about end-of-life — but what about the beginning? Do you normally test your new purchases as thoroughly as you test old, suspect drives? Has your experience followed the proverbial 'bathtub' curve — lots of early failures, but with drives that survive the first month going on to last for years? And have you had any trouble returning failed new drives because you re-partitioned them, 'ran Linux,' or used stress-test apps?"
  • Heh (Score:5, Insightful)

    by Deekin_Scalesinger ( 755062 ) on Sunday December 23, 2012 @01:23PM (#42375759)
    Like, never. Out of the box and away she goes...good luck to thee!
    • Re:Heh (Score:5, Insightful)

      by JMJimmy ( 2036122 ) on Sunday December 23, 2012 @02:00PM (#42375987)

      Add to the above:

      HDD tools are useless. I recently tried a bunch of them - they all reported my HDD in perfect condition... while it was doing the click of death. HDD failed within a week.

      • Re:Heh (Score:4, Informative)

        by PlusFiveTroll ( 754249 ) on Sunday December 23, 2012 @02:40PM (#42376199) Homepage

        Sounds more like it was your hard drive's S.M.A.R.T. that was useless. The tools can only report what the drive tells them; if SMART isn't reporting reallocated sectors, resets, or whatever other terrible malfunction is happening, the tools are left in the dark.

        • SMART itself is mostly useless and we should ignore it completely.
          • Re:Heh (Score:5, Interesting)

            by SuperTechnoNerd ( 964528 ) on Sunday December 23, 2012 @03:56PM (#42376647)
            You have to interpret the data correctly. Looking at the seek error rate and raw read error rate tells you whether the heads are positioning accurately. Run the drive hard (read/write patterns) and watch the temperature. And of course, if you start seeing non-zero pending and reallocated sector counts, you know the end is near. Also watch the spin-up time as a drive ages; it will increase. (I rarely shut the RAID server down, so this is less important for me.) I have smartd email and text me any time things start to get out of a happy place. I do nightly quick tests and weekly extended tests. SMART is useful - if you're smart about it...
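
            For example (a rough sketch, not my exact setup; /dev/sdX stands in for the drive, and attribute names vary a bit by vendor):

                # the attributes that actually predict death
                smartctl -A /dev/sdX | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|Spin_Up_Time'
                smartctl -t short /dev/sdX      # quick test, a couple of minutes
                smartctl -t long /dev/sdX       # extended surface scan, takes hours
                smartctl -l selftest /dev/sdX   # read the results once the tests finish
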
          • Not useless, just not a good indicator of a drive NOT being near death. It's a great indicator for confirming that the drive IS dying - if you see, for instance, 500 bad sectors, you may want to prepare to replace that drive.

          • I wouldn't ignore it. While SMART saying everything is okay doesn't mean much, SMART telling you that there is a problem is a definite reason for concern.

        • Re:Heh (Score:4, Informative)

          by JMJimmy ( 2036122 ) on Sunday December 23, 2012 @03:30PM (#42376507)

          No, not SMART. I did a full range of tests with all the suites on top of SMART (surface tests, etc.)

          The only HDD tool I trust is the ancient one from GRC.

          • Re:Heh (Score:5, Informative)

            by thegarbz ( 1787294 ) on Sunday December 23, 2012 @05:53PM (#42377309)

            No, not SMART. I did a full range of tests with all the suites on top of SMART (surface tests, etc.)

            The only HDD tool I trust is the ancient one from GRC.

            That is absolutely laughable. Spinrite is about as good at interfacing with a modern drive as an old 16-bit DOS program is at squeezing every ounce of performance out of a 64-bit processor. It had its purpose in its day. These days, running it is more likely to do harm than good.

            Not to mention that if your drive is at the end of its life, running a program that is widely known to give drives a most horrendous thrashing is probably not a good idea.

          • Re:Heh (Score:4, Interesting)

            by Pentium100 ( 1240090 ) on Sunday December 23, 2012 @11:35PM (#42379163)

            MHDD works best for me for testing drives. Spinrite (and ddrescue) is good for data recovery, but not that good for testing. I had one drive with a lot of sectors that were good, except that the drive took 10-30 seconds to read them, making the PC extremely slow (Windows would drop to PIO mode and be slow even when reading the good sectors). Chkdsk didn't detect anything, Spinrite didn't detect anything; only MHDD showed lots of slow sectors. (I later made a list and manually marked them as bad - getting a 2.5" IDE replacement is not that easy or fast, so the drive will have to do until then.)

      • Comment removed (Score:5, Interesting)

        by account_deleted ( 4530225 ) on Sunday December 23, 2012 @03:13PM (#42376407)
        Comment removed based on user account deletion
        • Re:Heh (Score:5, Interesting)

          by greg1104 ( 461138 ) <gsmith@gregsmith.com> on Sunday December 23, 2012 @03:54PM (#42376627) Homepage

          Spinrite hasn't been useful for years. There's a good analysis of why at Does SpinRite do what it claims to do? [serverfault.com]. Everything the program does can be done more efficiently with a simpler program run from a Linux boot CD. And the fact that it takes so long is a problem - you want to get data off a dying drive as quickly as possible. Here's what I wrote on that question years ago, and the rise of SSDs makes it even more true now:

          SpinRite was a great program in the era it was written, a long time ago. Back then, it would do black magic to recover drives that were seemingly toast, by being more persistent than the drive firmware itself was.

          But here in 2009, it's worthless. Modern drives do complicated sector mapping and testing on their own, and SpinRite is way too old to know how to trigger those correctly on all the drives out there. What you should do instead is learn how to use smartmontools, probably via a Linux boot CD (since the main time you need them is when the drive is already toast).

          My usual routine when a drive starts to go bad is to back its data up using dd, run smartmontools to see what errors it's reporting, trigger a self-test and check the errors again, and then launch the manufacturer's recovery software to see if it can correct the problem. The idea that SpinRite knows more about the drive than the interface provided by SMART and the manufacturer's tools is at least ten years obsolete. Also, getting the information into the SMART logs helps if you need to RMA the drive as defective, something SpinRite doesn't help you with.
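
          In command form, that routine is roughly this (a sketch only; /dev/sdX and the image path are placeholders):

              # 1. get the data off first, skipping unreadable sectors instead of aborting
              dd if=/dev/sdX of=/mnt/backup/disk.img bs=1M conv=noerror,sync
              # 2. see what the drive has already logged
              smartctl -l error /dev/sdX
              # 3. trigger a self-test, then check the logs again
              smartctl -t long /dev/sdX
              smartctl -l selftest /dev/sdX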

          Note that the occasional reports you see of SpinRite "fixing" problems are coincidence. If you access a bad sector on a modern drive, the drive will often remap it for you from the spares kept around for that purpose. All SpinRite did was access the bad sector; it didn't actually repair anything. This is why you still get these anecdotal "it worked for me" reports about it - the same thing would have been accomplished much better with a SMART scan.

          • Re:Heh (Score:5, Informative)

            by Culture20 ( 968837 ) on Sunday December 23, 2012 @04:26PM (#42376819)

            My usual routine when a drive starts to go bad is to back its data up using dd

            ddrescue [gnu.org] is the tool for backing up a failing drive, unless you really want to manually check every failed sector read and then restart a new dd (skipping to the next sector).
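
            Typical usage (paths are placeholders; the map file is what lets it resume and retry intelligently):

                # first pass: grab everything that reads cleanly, skip the slow spots
                ddrescue -n /dev/sdX /mnt/backup/disk.img rescue.map
                # second pass: go back and retry just the bad areas a few times
                ddrescue -r3 /dev/sdX /mnt/backup/disk.img rescue.map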

      • Re:Heh (Score:4, Informative)

        by BLKMGK ( 34057 ) <{morejunk4me} {at} {hotmail.com}> on Sunday December 23, 2012 @04:49PM (#42376939) Homepage Journal

        Not exactly useless... There's a preclear script that many unRAID users use to beat up their drives while monitoring SMART. It doesn't just look at SMART for a thumbs up or down; it monitors the various parameters that SMART reports. Users run it multiple times in a row and find bad drives fairly regularly. I'll admit that I haven't been running it, but judging from the number of folks who have found it useful, and from the fact that warranties seem to be getting ever shorter, I may begin doing so. I use a decent number of the 3TB drives that are always going on sale, and I'm starting to think I'm tempting fate by not testing them. I've gotten spoiled in that my unRAID box covers my ass in the event of a failure, but I see too damn many reports of new drives going toes up to not be concerned. I have 3 drives sitting on the shelf waiting to be loaded, and I may beat them up beforehand just to be sure they won't screw me when I least expect it...
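
        For anyone without unRAID, the gist of what preclear does can be approximated by hand (a rough sketch, not the actual script; /dev/sdX is the new drive and this destroys everything on it):

            smartctl -A /dev/sdX > smart-before.txt   # baseline attributes
            dd if=/dev/zero of=/dev/sdX bs=8M         # write every sector
            dd if=/dev/sdX of=/dev/null bs=8M         # read every sector back
            smartctl -A /dev/sdX > smart-after.txt
            diff smart-before.txt smart-after.txt     # any new reallocated/pending sectors?

        Run it a few times in a row, as the preclear crowd does, and RMA anything whose counts move.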

    • The only new computer component I always test out of the box is RAM - I've had many bad experiences over the last 10 years with instability due to bad RAM.

      As far as hard drives go, I never test them. I run several RAID arrays in the house, and I actually have had a replacement drive fail within a week (one of Seagate's recertified drives). I noticed odd behaviour, rebooted the server, and found the RAID array degraded. Oops!

      I guess in a way I do test them - if the new drive fails shortly after rebuilding the array...

    • On a more general note: I never move important data. What I do is copy data from the old HDD to the new HDD and then use KLS Backup to set up incremental backups. I keep using the old HDD until it fails. When that happens, the old HDD is taken out of the system, the "new" HDD becomes the "old" HDD, and a brand new HDD becomes... yes, you guessed it: the new HDD :)

      Unimportant data never gets backed up (e.g. installed games or large ISOs I keep for some reason, music, uncompressed video captures, etc). It goes straight to t

  • by X0563511 ( 793323 ) on Sunday December 23, 2012 @01:30PM (#42375807) Homepage Journal

    If DBAN can write out every sector without smartctl showing any pending sectors after the fact (and the average speed of the wipe was normal), then there's a good chance the drive will be fine.
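
    The after-the-fact check is a one-liner (assuming /dev/sdX; a brand-new drive should show zero for both):

        smartctl -A /dev/sdX | grep -E 'Current_Pending_Sector|Reallocated_Sector_Ct'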

    • by bill_mcgonigle ( 4333 ) * on Sunday December 23, 2012 @01:45PM (#42375913) Homepage Journal

      Yes, this. I do it online:

      dd if=/dev/zero of=/dev/sdX bs=8M

      and then check smartctl. If I'm making a really big zpool, I fill them up and let ZFS fail out the turkeys:

      dd if=/dev/zero of=/tank/zeros.dd bs=8M
      zpool scrub tank

      If I'm building a 30-drive storage server for a client, I'll often see 1-2 drives fail out. Better to catch them now than when they're deployed (especially with the crap warranties on spinning rust these days). I need to order in staggered lots anyway, so having 10% overhead helps keep things moving along.
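
      Spelled out with comments (same commands as above, with /dev/sdX and the pool name as placeholders):

          dd if=/dev/zero of=/dev/sdX bs=8M    # zero the raw device; note any drive writing unusually slowly
          smartctl -A /dev/sdX                 # reallocated or pending sectors already? back it goes
          dd if=/dev/zero of=/tank/zeros.dd bs=8M   # fill the assembled pool
          zpool scrub tank                     # make ZFS checksum every block it just wrote
          zpool status -v tank                 # drives with read/write/checksum errors are the turkeys
          rm /tank/zeros.dd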

      • I'm a computer technician, and I'm always looking for new and better ways to test our equipment. Linux is one of my favourite testing tools, but right now I don't have a good way to test hard drives in Linux - I use stress to see if I get any errors or crashes on a suspicious drive. We use a Seagate boot disc for most of our testing though.

        I'm interested in using your technique, but I don't understand it well enough to give it a go on someone else's data. Can you explain it a bit please?

  • by AK Marc ( 707885 ) on Sunday December 23, 2012 @01:32PM (#42375829)
    My first help desk job included every computer in the company. We had a server drive fail, so I had Compaq send a replacement. The new arrival didn't work. So then I spent more time looking at RAID configuration and such, but we got a second replacement. That one didn't work either. But I tested it on arrival. The third replacement worked fine, just when I was worried it was something stupid I was missing. Two DOA RMAs for the same part. And yes, that's happened to me again since that first time.

    I test every "used" part as if it's suspect. The question was about new drives, but they are still new to me.
  • I haven't even considered testing my personal hard drives. If they break I try to retrieve whatever is on them, but I just buy new drives instead of spending any amount of time fixing them. I've never returned a disk - I just buy a couple of new ones whenever I need more space.

    At work we're using properly configured SANs with 24x7 support, so I couldn't be arsed to test disks there either. We don't have multiple racks of disks, so I don't see any good reason to test everything.

    If you're testing new diskdrives you must be really bored or very broke.

    • If you're testing new diskdrives you must be really bored or very broke.

      If you believe this you must not have many computers...

  • smartmontools (Score:5, Informative)

    by WD ( 96061 ) on Sunday December 23, 2012 @01:35PM (#42375851)

    Set up the smartd.conf file to run the example short test daily and long test weekly, and to email you when something is fishy. It's a trivial amount of effort for a significant amount of peace of mind. (In many cases you'll get some warning before your drive kicks the bucket and it's too late.)
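
    The smartd.conf line for that looks something like this (adapted from the stock example in the file; the address is a placeholder):

        # short self-test daily at 2am, long self-test Saturdays at 3am, mail on trouble
        /dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m admin@example.com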

  • Yes, if it's a Windows box, I run chkdsk /F /R a few times and defragment the drive after deployment. (Not because it needs it, but for the exercise.) Similarly with fsck on Linux. If it fails, I want it to fail while the in-store return policy is still in effect, so I don't have to deal with the manufacturer.

    But having a returned drive rejected because I repartitioned it or "ran Linux"? Never heard of that.

      Yes, if it's a Windows box, I run chkdsk /F /R a few times and defragment the drive after deployment. (Not because it needs it, but for the exercise.) Similarly with fsck on Linux. If it fails, I want it to fail while the in-store return policy is still in effect, so I don't have to deal with the manufacturer.

      Rather ineffective tests.

      Use smartctl and schedule long tests. Also try something like:
      dd if=/dev/sda of=/dev/sda bs=64k

  • Betteridge's law of headlines applies here. Hard drives go through extensive calibration before shipping, so the need for burn-in doesn't really exist. As for problems with RMAs for hard drives used under Linux, repartitioned, etc.: no.
    • Betteridge's law of headlines applies here.

      No, it doesn't. This is an actual, legitimate question.

      As I correctly predicted earlier this year [slashdot.org], lots of Slashdotters have seized upon Betteridge as the latest fad kneejerk response, and are misapplying it without understanding what it means. In his own words, [wikipedia.org] Betteridge's Law applies to cases where journalists "know the story is probably bollocks, and don’t actually have the sources and facts to back it up, but still want to run it."

      For example, without the evidence to back it up, a headline s

      • by rvw ( 755107 )

        Betteridge's law of headlines applies here.

        No, it doesn't. This is an actual, legitimate question.

        Thanks for the clarification. If you read the answers here, you'll notice that while most people don't test their new drives, some people do, so that proves you're right.

    • > Hard drives go through extensive calibration before shipping, so the need for burn in doesn't really exist.

      Not any more, apparently. On our manufacturing line we see a lot of bad-block replacements during the first write pass.
      When I worked in the HDD field a couple of years ago, every drive went through a 24-hour burn-in before it shipped. That doesn't seem to happen any more.

  • Manufacturers do a burn-in before shipping; that gets most of the early failures. Of course, some will still win the lottery and get a crappy early-failure drive, but that has never happened to me.

    • Comment removed based on user account deletion
  • by cvtan ( 752695 ) on Sunday December 23, 2012 @01:39PM (#42375875)
    Old bathtubs lasted longer than old hard drives. Now it's the other way around.
  • by Anonymous Coward on Sunday December 23, 2012 @01:42PM (#42375901)

    I run some ZFS systems at work. With the current version of the filesystem, you can expand the zpools but you can't shrink them, so adding a bad drive causes immediate problems.

    I've found that some drives are completely functional but write at extremely slow rates: maybe 10% of normal. With typical consumer drives, maybe 1/20 is like this. To ensure I don't put a slow drive into a production zpool array of disks, I always make a small test zpool consisting of just the new batch of drives and stress-test them.

    This catches not only obviously bad drives, but also the slow or otherwise odd ones.
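
    A sketch of that burn-in (device names are placeholders; the pool is a throwaway):

        zpool create testpool raidz /dev/sdX /dev/sdY /dev/sdZ   # scratch pool from just the new batch
        dd if=/dev/urandom of=/testpool/fill.dd bs=8M            # hammer it with writes
        zpool iostat -v testpool 5    # meanwhile, in another shell: per-drive throughput; the slow ones stand out
        zpool scrub testpool          # read everything back and verify checksums
        zpool status -v testpool      # errors show up per drive
        zpool destroy testpool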

    • by mrmeval ( 662166 )

      How have you been treated when returning them? I'd like to know which brands and which vendor. I'm always looking for success stories, especially on commodity hardware. Thanks.

  • by White Flame ( 1074973 ) on Sunday December 23, 2012 @01:55PM (#42375957)

    Trying to coax an error will never reveal one. Only when you start using it "for real" will the problem manifest.

  • Do you perform extensive functional tests on third-party software libraries before including them in your system? In most situations, no - if it's established and proven, you trust that it does what it advertises, and you only dig further when it doesn't.

    Same goes for hard drives.

    • Wat? [destroyallsoftware.com] Do you download your software over UDP without any error checking or means of correction? Do your DLLs and EXEs not verify their size and signature? I tend to verify my packets, files, and packages [youtube.com].

    • Software is logic; it's mathematics. The problem with your logic is this:
      "Do you perform mathematical proofs of theorems known to be proven and tested by many already? No, of course not. The same rules that govern logical constructs can be applied to physical reality."
      That is to say, you're ignoring the vast difference in the reliability of the construction materials: matter is an imperfect, imprecise medium, very different from mathematics.

      Protip: even the very elements themselves vary in atomic mass.

  • I always do a format and a secure erase (one pass of zeros). In addition to finding bad sectors I want to be sure to get rid of any trace of whatever crap they put on it at the factory (viruses, kiddie porn, crapware, etc).

  • badblocks -w -t random /dev/sdX && shred /dev/sdX

    Badblocks checks for bad sectors while writing random data to the drive (the -w flag makes it a destructive write test), and after all is good, I run shred once or twice to fill the drive with random data again. You can probably get by with just badblocks, though.

  • I buy hard drives in pairs, using one for live data and one kept offline until it's time to back up the live drive (I use Unison sync to quickly determine what's changed between the two drives). My boot drive gets backed up every night with Macrium Reflect. The secret to a happy life: assume that every drive will fail tomorrow and keep everything backed up.
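    Unison makes the compare-and-copy step a one-liner (hypothetical paths; -batch skips the per-file prompts):

        unison /mnt/live /mnt/backup -batch   # propagate only what changed since the last sync
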
    • The secret to a happy life: assume that every drive will fail tomorrow and keep everything backed up.

      That's why I hoard precious metals instead of money: the fear that every drive will fail tomorrow. Can't say it's made me any happier overall - being on a terrorist watch list. It does have its rare moments, e.g., I can't fly, but I get to avoid the TSA.

  • SMART + badblocks (Score:5, Interesting)

    by SuperBanana ( 662181 ) on Sunday December 23, 2012 @02:23PM (#42376087)

    I run smartctl and capture the registers, then run badblocks, and compare smartctl's output to the pre-badblocks check.

    If there are any remapped blocks, the drive goes back: the factory should already have remapped the initial defects, so those would be new failed blocks from the first few hours of operation.
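
    In script form, that comparison is something like this (a sketch; badblocks -w destroys all data on /dev/sdX):

        smartctl -A /dev/sdX > registers-before.txt
        badblocks -wsv /dev/sdX                         # full write+verify pass over every sector
        smartctl -A /dev/sdX > registers-after.txt
        diff registers-before.txt registers-after.txt   # any movement in Reallocated_Sector_Ct means it goes back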

    • by sribe ( 304414 )

      Great idea, thanks. I always test new drives, but this one had not occurred to me.

    • That's the right way to do it, but manufacturers increasingly don't accept returns for just one or a few bad blocks; they say that's acceptable.
      The reason is probably that testing the entire surface has become too time-consuming, given the high capacities but mostly unchanged transfer rates we see today.

  • This answers most of your questions, and does so with data from a very large fleet of drives:
    http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/disk_failures.pdf [googleusercontent.com]

    If you are concerned about reliability I suggest using an Intel SSD. Their failure rate is very low.

  • ...bought and installed in desktops & laptops over the last decade, and what I've learned is to buy Seagate drives. I have seen way fewer defects and first-year failures on Seagate than WD, and I was happy to see Maxtor go away.
    • Same thing here. We use hundreds of drives per week, mostly Seagate plus some Hitachi and recently qualified Toshiba. No WD unless you count HGST.

  • When installing a new disk in a Mac, I run Disk Utility with the Secure Erase option enabled. This will make 7 or even 35 passes over every block, which should find any early problems...

  • I thoroughly test any new hdd I get for my desktop PC:

    The first thing I do is format it and install Windows. If that works, then we know the drive isn't DOA.
    From there I torture-test it by copying several hundred gigabytes of software and movies, as well as installing some more programs.
    After that, I let it run for a few months, using it normally. If it crashes during that time, then I know it was bad.

  • Well, the last drive I returned to a manufacturer was one I had been running FreeBSD on, and they didn't seem to care. Granted, the experience with the manufacturer (Seagate) was less than pleasant, but that had nothing to do with my choice of OS, which I don't think they ever asked about.

    I now buy only Western Digital.
  • This is part of a process for testing new server gear.

    Since I use Fedora, currently at 17, burn-in testing is important.

    Quick tip: most current distros do not detect SSDs during the install and do not include the "discard" keyword in the fstab entries for the device.

    If you do use a modern distro, make sure that if you install onto or use an SSD with it, you mount the device with the "discard" mount option so TRIM is enabled.

    For example:

    UUID=xxxxxxxxxxxxxxxxx /mnt/ssd2 ext4 discard,defaults 0 0
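
    If you'd rather not mount with "discard" (continuous TRIM can cost performance on some drives), running fstrim on a schedule does the same job; the mount point here is just the one from the example above:

        fstrim -v /mnt/ssd2   # trim all free space once, report how much was trimmed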

  • by mbone ( 558574 ) on Sunday December 23, 2012 @03:18PM (#42376427)

    Testing is simple - plug it in and run it till it fails. Might as well use it in the meantime.

  • Wrong Approach (Score:5, Insightful)

    by nuckfuts ( 690967 ) on Sunday December 23, 2012 @04:29PM (#42376841)

    I've been dealing with hardware failures for 20+ years. What I've learned is that disasters WILL happen, regardless of what preventive measures are in place. So I shifted my focus toward recoverability. To me, the important question is: "When something catastrophic happens, how quickly and easily can I put things back in working order?"

    Since I use RAID where appropriate and, more importantly, am positively fanatical about frequent, full, and tested backups, the only concern I have when a hard drive dies is whether I'm still entitled to a warranty replacement.

  • On Black Friday I bought a 1 TB drive at Office Depot, and of course they waved the box over their anti-theft degausser. I asked for a different drive and told them they shouldn't do that with hard drives. The girl gave me the look we've all seen, but the guy behind her actually agreed with me, and they gave me a drive out of the cage and let me leave the store with the alarm blaring. I've just about filled it up already and it's been working fine.

  • I test every single drive before deployment. I've found Gibson Research Corp's (grc.com) Spinrite to be vital - it's pretty much the only drive test/repair/recovery tool I use, other than RAID recovery tools. I'm astonished at the number of people who say they don't test at all.

    Go visit a UPS or FedEx distribution center and watch the "slapper" kick packages off the 45 MPH belt onto a slide at the loading dock. Small boxes like hard drive packages go airborne. It doesn't matter how much the factory tests.

  • by ncw ( 59013 ) on Sunday December 23, 2012 @08:21PM (#42378221) Homepage

    Stress testing hard disks is a particular bugbear of mine, after some really bad luck with early hard disks. Over the 15 years that I've been doing it, I've had to send back loads of hard disks and flash cards because they failed my tests, either breaking completely or returning single-bit errors in the data. Mostly the manufacturers will take disks back if you can get their stupid Windows program to return an error code. Sometimes it takes a bit of arguing, but ultimately the manufacturers want to keep you happy. Flash disks with single-bit errors are the hardest to send back, in my experience.

    Here is the latest generation of my stress testing code (re-written in Go recently): https://github.com/ncw/stressdisk [github.com]

    (Interestingly, the stressdisk program sometimes finds bad RAM in your computer too!)

    I generally thrash every new hard disk or memory card for 24 hours to see if I can break it before trusting any data to it!

    I run a long SMART test as well.

    Somewhat paranoid, yes, but I really, really hate losing data!
