Ask Slashdot: Do You Test Your New Hard Drives? 348

Posted by timothy on Sunday December 23, 2012 @01:22PM from the just-bite-the-corner-a-little dept.

An anonymous reader writes "Any Slashdot thread about drive failure is loaded with good advice about EOL — but what about the beginning? Do you normally test your new purchases as thoroughly as you test old, suspect drives? Has your testing followed the proverbial 'bathtub' curve of a lot of early failures, but with those that survive the first month surviving for years? And have you had any return problems with new failed drives, because you re-partitioned it, or 'ran Linux,' or used stress-test apps?"

This discussion has been archived. No new comments can be posted.


Heh (Score:5, Insightful)

by Deekin_Scalesinger ( 755062 ) writes: on Sunday December 23, 2012 @01:23PM (#42375759)

Like, never. Out of the box and away she goes...good luck to thee!

- Re:Heh (Score:5, Insightful)
  
  by JMJimmy ( 2036122 ) writes: on Sunday December 23, 2012 @02:00PM (#42375987)
  
  Add to the above:
  HDD tools are useless. I recently tried a bunch of them - they all reported my HDD in perfect condition... while it was doing the click of death. HDD failed within a week.
  
  - Re:Heh (Score:4, Informative)
    
    by PlusFiveTroll ( 754249 ) writes: on Sunday December 23, 2012 @02:40PM (#42376199) Homepage
    
    Sounds more like your hard drive's SMART was useless. The tools can only report what the drive tells them; if SMART isn't reporting reallocated sectors, resets, or whatever other terrible malfunction, then they are left in the dark.
    
    - Re: (Score:2)
      
      by spire3661 ( 1038968 ) writes:
      
      SMART itself is mostly useless and we should ignore it completely.
      - Re:Heh (Score:5, Interesting)
        
        by SuperTechnoNerd ( 964528 ) writes: on Sunday December 23, 2012 @03:56PM (#42376647)
        
        You have to interpret the data correctly. Looking at seek error rate and raw read errors tells you if the heads are positioning accurately. Run the drive hard (read/write patterns) and watch the temperature. And of course if you start seeing non-zero pending and reallocated sector counts, you know the end is near. Also watch the spin-up time, which increases as a drive gets older. (I rarely shut the RAID server down so this is less important.) I have smartd email and text me any time things start to get out of a happy place. I do nightly quick tests and weekly extended tests. SMART is useful - if you're smart about it...
        
      - Re: (Score:3)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
      - Re: (Score:3)
        
        by toddestan ( 632714 ) writes:
        
        I wouldn't ignore it. While SMART saying everything is okay doesn't mean much, SMART telling you that there is a problem is a definite reason for concern.
    - Re:Heh (Score:4, Informative)
      
      by JMJimmy ( 2036122 ) writes: on Sunday December 23, 2012 @03:30PM (#42376507)
      
      No, not SMART. I did a full range of tests with all suites on top of SMART (surface tests, etc)
      The only HDD tool I trust is the ancient one from GRC.
      
      - Re:Heh (Score:5, Informative)
        
        by thegarbz ( 1787294 ) writes: on Sunday December 23, 2012 @05:53PM (#42377309)
        
        > No, not SMART. I did a full range of tests with all suites on top of SMART (surface tests, etc)
        > The only HDD tool I trust is the ancient one from GRC.
        That is absolutely laughable. Spinrite is about as good at interfacing with a modern drive as an old 16-bit DOS program is at squeezing every ounce of performance out of a 64-bit processor. It had its purpose in its day. These days running it will more likely do more harm than good.
        Not to mention that if your drive is at the end of life running a program that is widely known to give it a most horrendous thrashing is probably not a good idea.
        
      - Re:Heh (Score:4, Interesting)
        
        by Pentium100 ( 1240090 ) writes: on Sunday December 23, 2012 @11:35PM (#42379163)
        
        MHDD works best for me for testing the drive. Spinrite (and ddrescue) is good for data recovery, but not that good for testing. I had one drive that had a lot of sectors that were good, except that the drive took 10-30 seconds to read them, making the PC extremely slow (Windows would drop to PIO mode and be slow even when reading the good sectors). Chkdsk didn't detect anything, Spinrite didn't detect anything, only MHDD showed lots of slow sectors (I later made a list and manually marked them as bad; getting a 2.5" IDE drive is not that easy or fast, so it will have to do until then).
        
  - Comment removed (Score:5, Interesting)
    
    by account_deleted ( 4530225 ) writes: on Sunday December 23, 2012 @03:13PM (#42376407)
    
    Comment removed based on user account deletion
    
    - Re:Heh (Score:5, Interesting)
      
      by greg1104 ( 461138 ) writes: <gsmith@gregsmith.com> on Sunday December 23, 2012 @03:54PM (#42376627) Homepage
      
      Spinrite hasn't been useful for years. There's a good analysis why at Does SpinRite do what it claims to do? [serverfault.com]. Everything the program does can be done more efficiently with a simpler program run from a Linux boot CD. And the fact that it takes so long is a problem--you want to get data off a dying drive as quickly as possible. Here's what I wrote on that question years ago, and the rise of SSDs makes this even more true now:
      > SpinRite was a great program in the era it was written, a long time ago. Back then, it would do black magic to recover drives that were seemingly toast, by being more persistent than the drive firmware itself was.
      > But here in 2009, it's worthless. Modern drives do complicated sector mapping and testing on their own, and SpinRite is way too old to know how to trigger those correctly on all the drives out there. What you should do instead is learn how to use smartmontools, probably via a Linux boot CD (since the main time you need them is when the drive is already toast).
      > My usual routine when a drive starts to go bad is to back its data up using dd, run smartmontools to see what errors it's reporting, trigger a self-test and check the errors again, and then launch into the manufacturer's recovery software to see if the problem can be corrected by it. The idea that SpinRite knows more about the drive than the interface provided by SMART and the manufacturer tools is at least ten years obsolete. Also, getting the information into the SMART logs helps if you need to RMA the drive as defective, something SpinRite doesn't help you with.
      > Note that the occasional reports you see that SpinRite "fixes" problems are coincidence. If you access a sector on a modern drive that is bad, the drive will often remap it for you from the spares kept around for that purpose. All SpinRite did was access the bad sector, it didn't actually repair anything. This is why you still get these anecdotal "it worked for me" reports related to it--the same thing would have been much better accomplished with a SMART scan.
      
      - Re:Heh (Score:5, Informative)
        
        by Culture20 ( 968837 ) writes: on Sunday December 23, 2012 @04:26PM (#42376819)
        
        > My usual routine when a drive starts to go bad is to back its data up using dd
        ddrescue [gnu.org] is the tool for backing up a failing drive unless you really want to manually check every failed sector read then restart a new dd (skipping to the next sector).
        
      - Re: (Score:3)
        
        by washu_k ( 1628007 ) writes:
        
        Running spinrite against an SSD is one of the clearest ways of showing that it is complete BS. It will report all sorts of things about the drive that are clearly impossible. It won't error or give no data, it clearly makes things up about the drive.
        
        Another good BS test for spinrite is to run it against a non-ATA drive that is still BIOS accessible. A booted USB flash drive is the best, but something like a modern SCSI/SAS controller works as well. It's clearly impossible for spinrite to access such
  - Re:Heh (Score:4, Informative)
    
    by BLKMGK ( 34057 ) writes: <morejunk4me@NOspam.hotmail.com> on Sunday December 23, 2012 @04:49PM (#42376939) Homepage Journal
    
    Not exactly useless... There's a preclear script that many unRAID users use to beat up their drives while monitoring SMART. It doesn't just look at SMART for a thumbs up or down but monitors the various parameters that SMART throws out. Users run this multiple times in a row and find bad drives fairly regularly. I will admit that I've not been running it but judging from the numbers of folks who have been finding it useful and from the fact that warranties seem to be getting ever shorter I may begin doing so. I use a decent number of the 3TB drives that are always going on sale and I'm starting to think I'm tempting fate by not testing them. I've gotten spoiled in that my unRAID box covers my ass in the event of a failure but I see too damn many reports of new drives going toes up to not be concerned. I have 3 drives sitting on the shelf waiting to be loaded and I may beat them up beforehand just to be sure they won't screw me when I least expect it...
    
- Re: (Score:2)
  
  by danomac ( 1032160 ) writes:
  
  The only new computer component I always test out-of-the-box is RAM - I've had many bad experiences over the last 10 years with instability due to bad RAM.
  As far as hard drives go, I never test them. I run several RAID arrays in the house, and I actually have had a replacement drive fail in a week (one of Seagate's recertified drives.) I noticed odd behaviour and rebooted the server and the RAID array was degraded. Oops!
  I guess in a way I do test them - if the new drive fails shortly after rebuilding the ar
- Re: (Score:2)
  
  by war4peace ( 1628283 ) writes:
  
  On a more general note: I never move important data. What I do is: I copy data from old HDD to new HDD and then use KLS Backup to set up incremental back-up. I still use old HDD until it fails. When that happens, the old HDD is taken out of the system, the "new" HDD becomes the "old" HDD and a brand new HDD becomes... yes, you guessed it: new HDD :)
  Unimportant data never gets backed up (e.g. installed games or large ISOs I keep for some reason, music, uncompressed video captures, etc). It goes straight to t
- - Comment removed (Score:4, Informative)
    
    by account_deleted ( 4530225 ) writes: on Sunday December 23, 2012 @03:25PM (#42376465)
    
    Comment removed based on user account deletion
    
    - Re:Heh (Score:4, Informative)
      
      by Burpmaster ( 598437 ) writes: on Sunday December 23, 2012 @03:57PM (#42376653)
      
      What you want is just 'badblocks -w <device>'.
      
    - Re:Heh (Score:5, Interesting)
      
      by greg1104 ( 461138 ) writes: <gsmith@gregsmith.com> on Sunday December 23, 2012 @04:10PM (#42376727) Homepage
      
      SMART is a part of the modern drive's firmware. You can't bypass it. Anyone who tells you otherwise--such as the makers of Spinrite--is lying to you in order to sell a product.
      The quality of SMART implementation varies significantly based on the manufacturer. Anecdotally, I have 3 failed Western Digital drives here that flat out lie about the drive's errors. Running the tool needed to generate an RMA does a full SMART scan of the drive, remaps some bad sectors, and then says everything is good. But it's not--each drive is still broken, in a way the firmware seems downright evasive about. Try to use it again, it doesn't take long until another failure. It does seem like the sole purpose of SMART and its associated utilities on WD drives is to keep people from returning a bad drive, by providing a gatekeeper in that process that never says there's a problem.
      Most of my serious installations avoid WD drives like the plague for this reason. I think that Seagate's drives are probably less reliable overall than WD nowadays. Regardless I prefer them, simply because the firmware is more honest about the errors that do happen. Drives fail and I plan for that. What I can't deal with is drives that fail but don't admit it.
      The reason "RAID edition" firmware is available is to provide a drive that isn't supposed to be as evasive about errors. It may be that some WD RAID edition models don't have the problem I'm describing. I soured on them as a brand before those became mainstream.
      
      - Re: (Score:3)
        
        by PhunkySchtuff ( 208108 ) writes:
        
        Get enterprise series drives, not consumer drives. One difference is the firmware is a lot more up-front about errors, rather than trying to hide them and carry on as if everything is OK.
        In a RAID, you're going to want to fail a drive as soon as it starts to play up, whereas the average consumer wants a drive that doesn't turn around and die at the first small error, where it can remap sectors and pretend that nothing happened.
        Part of the reason enterprise drives cost more, when they're often the same, or v
    - Re: (Score:2)
      
      by fufufang ( 2603203 ) writes:
      
      > That's nice, an OS used by less than 2% of the entire planet has some tool that reports what SMART is telling it, no different than a billion freeware programs for Windows. Just FYI but I can think of about a dozen freeware programs that will do the same damned thing in Windows, INCLUDING the email, so it's not exactly like you got anything to brag about Ms AC.
      > Now I'm gonna spell out what the REAL problem is, which any guy who has spent time in the trenches will tell you and that is SMART SUCKS ASS and for several years has been more about covering bad batches for the HDD OEMs than it has been for actually telling you something is going bad. I have had drives in the shop that sounded like an angle grinder bouncing on pavement where SMART said "Nope, nothing wrong here la la la" while the thing just ground and sputtered, it's the most fucking pointless diagnostic tool there is.
      > What we NEED is a replacement for Spinrite, something that bypasses the lying SMART and just runs a pass of zeroes and ones on the drive and reports a simple pass/fail on the read/writes. Spinrite was fucking brilliant for this, it would give you a layout of the entire drive with red for sectors that failed to report the correct data back and blue for clean so it took just a second to glance at the readout to spot a drive that was buggy out of the box, but nobody has updated the tool in years so it's useless now since it can't do SATA 6 or drives above 500GB.
      You meant badblocks? It does exactly what you suggested. However, it can't detect the dynamic bad-sector remapping done by the firmware.
      > So how about it FOSS devs, here are the requirements: Bypass SMART, do a single R/W cycle, report results. That's ALL it has to do and so far nobody has stepped up to the plate. Damned near every shop I knew including mine had bought a copy of Spinrite so there is good money to be made there if you are willing to put in the work, it's a niche but it's a niche with money, builders, repair shops and gamers would all love to hand you money for this tool, so get on it and report back when it's done, okay?
  - - Re: (Score:3)
      
      by Runaway1956 ( 1322357 ) writes:
      
      I saw nothing about any burn in tests in the GP post. The guy has a couple of scripts running to ensure that A) he is made aware of impending hard disk problems, and B) his data is backed up in the event of a hard disk problem.
      Reading comprehension 101, available at a community college near you.
      Unless, of course, you're just trolling a Linux user. In which case, feel free to continue making a fool of yourself.
    - Re: (Score:3)
      
      by Nutria ( 679911 ) writes:
      
      > Computer hardware is cheap
      Relative to 10 years ago, but $150 here, $100 there and $75 somewhere else add up for an impoverished college student, or a middle class family with other expenses out the wazoo to pay.
dban followed by smartctl (Score:4, Interesting)

by X0563511 ( 793323 ) writes: on Sunday December 23, 2012 @01:30PM (#42375807) Homepage Journal

If dban can write out every sector and not have smartctl show any pending sectors after the fact (and the average speed of the dban wipe was normal) then you've got good chances the drive will be fine.

- Re:dban followed by smartctl (Score:5, Interesting)
  
  by bill_mcgonigle ( 4333 ) * writes: on Sunday December 23, 2012 @01:45PM (#42375913) Homepage Journal
  
  Yes, this. I do it online:
  dd if=/dev/zero of=/dev/sdX bs=8M
  
  and then check smartctl. If I'm making a really big zpool, I fill them up and let ZFS fail out the turkeys:
  dd if=/dev/zero of=/tank/zeros.dd bs=8M
  zpool scrub tank
  
  If I'm building a 30-drive storage server for a client I'll often see 1-2 fail out. Better to catch them now than when they're deployed (especially with the crap warranties on spinning rust these days). I need to order in staggered lots anyway, so having 10% overhead helps keep things moving along.
  
  - Re: (Score:2)
    
    by Redlazer ( 786403 ) writes:
    
    I'm a computer technician, and I'm always looking for new and better ways to test our equipment. Linux is one of my favourite testing tools, but right now I don't have a good way to test hard drives in Linux - I use stress to see if I get any errors or crashes on a suspicious drive. We use a Seagate boot disc for most of our testing though.
    I'm interested in using your technique, but I don't understand it well enough to give it a go on someone else's data. Can you explain it a bit please?
Used to never test (Score:3)

by AK Marc ( 707885 ) writes: on Sunday December 23, 2012 @01:32PM (#42375829)

My first help desk job included every computer in the company. We had a server drive fail, so I had Compaq send a replacement. The new arrival didn't work. So then I spent more time looking at RAID configuration and such, but we got a second replacement. That one didn't work either. But I tested it on arrival. The third replacement worked fine, just when I was worried it was something stupid I was missing. Two DOA RMAs for the same part. And yes, that's happened to me again since that first time.

I test every "used" part as if it's suspect. The question was about new, but they are still new to me.

- Re: (Score:2)
  
  by Hentes ( 2461350 ) writes:
  
  Not to mention that some shadier shops tend to resell used or returned parts as new.
- Re:Used to never test (Score:4, Interesting)
  
  by PlusFiveTroll ( 754249 ) writes: on Sunday December 23, 2012 @03:05PM (#42376361) Homepage
  
  Two DOA of the same part isn't out of the question, a good amount of the time the same part number is from the same batch, which may suffer from the same manufacturing defects. I see things like that pretty often in batches of disks that fall out of RAIDs.
  
Anyone actually does this? (Score:2)

by tantrum ( 261762 ) writes:

I haven't even considered testing my personal hard drives. If they break I try to retrieve whatever is on them, but I just buy new drives instead of spending any amount of time fixing them. I've never returned a disk - I just buy a couple of new ones whenever I need more space.
At work we're using properly configured SANs with 24x7 support, so I couldn't be arsed to test disks there either. We don't have multiple racks of disks, so I don't see any good reason to test everything.
If you're testing new diskdrives you
- Re: (Score:2)
  
  by VortexCortex ( 1117377 ) writes:
  
  > If you're testing new diskdrives you must be really bored or very broke.
  If you believe this you must not have many computers...
smartmontools (Score:5, Informative)

by WD ( 96061 ) writes: on Sunday December 23, 2012 @01:35PM (#42375851)

Set up the smartd.conf file to do the example short-test daily and long-test weekly, and email you when something is fishy. It's a trivial amount of effort, resulting in a significant amount of peace of mind. (In many cases, you'll have some amount of warning before your drive kicks the bucket and it's too late)

- Re:smartmontools (Score:5, Funny)
  
  by Deekin_Scalesinger ( 755062 ) writes: on Sunday December 23, 2012 @01:58PM (#42375969)
  
  This should be modded up for your username alone lol
  
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
- Re: (Score:2)
  
  by spire3661 ( 1038968 ) writes:
  
  SMART is not a useful indicator of anything, don't rely on it.
Exercise the drive (Score:2)

by roc97007 ( 608802 ) writes:

Yes, if it's a windows box, I run chkdsk /F /R a few times, and defragment the drive after deploy. (Not because it needs it, but for the exercise.) Similar with fsck on linux. If it fails, I want it to fail when the in-store return policy is still in effect, so I don't have to deal with the manufacturer.
But having a returned drive rejected because I repartitioned it or "ran linux"? Never heard of that.
- Re: (Score:2)
  
  by whoever57 ( 658626 ) writes:
  
  > Yes, if it's a windows box, I run chkdsk /F /R a few times, and defragment the drive after deploy. (Not because it needs it, but for the exercise.) Similar with fsck on linux. If it fails, I want it to fail when the in-store return policy is still in effect, so I don't have to deal with the manufacturer.
  Rather ineffective tests.
  Use smartctl and schedule long tests. Also try something like:
  dd if=/dev/sda of=/dev/sda bs=64k
betteridge's law of headlines (Score:2)

by whoever57 ( 658626 ) writes:

betteridge's law of headlines applies here. Hard drives go through extensive calibration before shipping, so the need for burn in doesn't really exist. As for problems with RMAs for hard drives used under Linux, repartitioned, etc. No.
- Did ketchup lead to the extinction of dinosaurs? (Score:2)
  
  by Dogtanian ( 588974 ) writes:
  
  > betteridge's law of headlines applies here.
  No, it doesn't. This is an actual, legitimate question.
  
  As I correctly predicted earlier this year [slashdot.org], lots of Slashdotters have seized upon Betteridge as the latest fad kneejerk response, and are misapplying it without understanding what it means. In his own words, [wikipedia.org] Betteridge's Law applies to cases where journalists "know the story is probably bollocks, and don’t actually have the sources and facts to back it up, but still want to run it."
  
  For example, without the evidence to back it up, a headline s
  - Re: (Score:2)
    
    by rvw ( 755107 ) writes:
    
    > > betteridge's law of headlines applies here.
    > No, it doesn't. This is an actual, legitimate question.
    Thanks for the clarification. If you read the answers here, you'll notice that while most people don't test their new drives, some people do, so that proves you're right.
- Re: (Score:2)
  
  by rrohbeck ( 944847 ) writes:
  
  > Hard drives go through extensive calibration before shipping, so the need for burn in doesn't really exist.
  Not any more apparently. In our manufacturing line we see a lot of bad block replacements during the first write pass.
  When I worked in the HDD field a couple of years ago every drive went through a 24h burn-in before it shipped. That doesn't seem to happen any more.
Re: (Score:2)

by account_deleted ( 4530225 ) writes:

Comment removed based on user account deletion
- Re: (Score:3)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
  - - Re: (Score:3)
      
      by Wolfrider ( 856 ) writes:
      
      --If I were you, I would look into the following:
      o Test all drives before putting them into production - either with SMART long test, or linux 'badblocks'
      o Cooling - is it adequate enough?
      o Powerful enough Power supply ++ UPS (essential these days)
      o Mount all drives with "noatime" option in Linux, or in XP and later:
      ' fsutil behavior set disablelastaccess 1 ' and reboot
      o Spin down all HDs when not in use.
      --I do all of the above, and my drives last for years and years. Just sayin'
Lifetime of bathtubs (Score:3)

by cvtan ( 752695 ) writes: on Sunday December 23, 2012 @01:39PM (#42375875)

Old bathtubs lasted longer than old hard drives. Now it's the other way around.

Yes! Especially before adding them to an array. (Score:5, Interesting)

by Anonymous Coward writes: on Sunday December 23, 2012 @01:42PM (#42375901)

I run some ZFS systems at work. With the current version of the filesystem, you can expand the zpools but you can't shrink them, so adding a bad drive causes immediate problems.
I've found that some drives are completely functional but write at extremely slow rates: maybe 10% of normal. With typical consumer drives, maybe 1/20 is like this. To ensure I don't put a slow drive into a production zpool array of disks, I always make a small test zpool consisting of just the new batch of drives and stress-test them.
This catches not only obviously bad drives, but also the slow or otherwise odd ones.
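One way to spot the "functional but slow" drives described above is to time a fixed-size write on each drive and flag any that fall well below the rest of the batch. The following is only a rough sketch, not the poster's actual procedure: the function names, the 64 MB test size, and the 50% cutoff are all assumptions chosen for illustration, and a real burn-in would write far more data.

```python
import os
import time
from statistics import median


def measure_write_rate(path, total_mb=64, chunk_mb=8):
    """Write total_mb of zeros to path and return the rate in MB/s.

    path is assumed to be a scratch file on the drive under test.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mb // chunk_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reached the device
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_mb / elapsed


def flag_slow_drives(rates_mb_s, min_fraction=0.5):
    """Return the drives whose rate is below min_fraction of the batch median."""
    if not rates_mb_s:
        return []
    typical = median(rates_mb_s.values())
    return sorted(name for name, rate in rates_mb_s.items()
                  if rate < min_fraction * typical)
```

Comparing against the batch median rather than a fixed number avoids having to know what "normal" is for a given model; it does assume that most drives in the batch are healthy.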

- Re: (Score:2)
  
  by mrmeval ( 662166 ) writes:
  
  How have you been treated when returning them? I'd like to know what brands and what vendor. I'm always looking for success stories especially on commodity hardware. Thanks.
Murphy's Law of Testing (Score:3)

by White Flame ( 1074973 ) writes: on Sunday December 23, 2012 @01:55PM (#42375957)

Trying to coax an error will never reveal one. Only when you start using it "for real" will the problem manifest.

Do you test third party software components? (Score:2)

by thePowerOfGrayskull ( 905905 ) writes:

Do you perform extensive functional tests against third party software libraries before including them in your system? In most situations, no -- if it's established and proven. You trust that it does what it advertises, and only when it doesn't do you dig further.
Same goes for hard drives.
- Re: (Score:2)
  
  by PlusFiveTroll ( 754249 ) writes:
  
  Wat? [destroyallsoftware.com] Do you download your software over UDP without any error checking or means of correction? Do your dll's and exe's not verify their size and signature? I tend to verify my packets, files, and packages [youtube.com].
- Re: (Score:2)
  
  by VortexCortex ( 1117377 ) writes:
  
  Software is logic; It's mathematics. The problem with your logic is thus:
  "Do you perform mathematical proofs of theorems known to be proven and tested by many already? No, of course not. The same rules that govern logic constructs can be applied to physical reality"
  That is to say, you're ignoring the vast difference in the reliability of their construction materials: Matter is an imperfect imprecise medium very different from mathematics.
  Protip: Even the very elements themselves vary in atomic mass
format and secure erase (Score:2)

by danlip ( 737336 ) writes:

I always do a format and a secure erase (one pass of zeros). In addition to finding bad sectors I want to be sure to get rid of any trace of whatever crap they put on it at the factory (viruses, kiddie porn, crapware, etc).
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
Badblocks/Shred (Score:2)

by SealBeater ( 143912 ) writes:

badblocks -w -t random /dev/sdX && shred /dev/sdX
Badblocks (in its destructive -w write mode) checks for bad sectors while writing random data to the drive, and after all is good, I run shred once or twice to fill the drive with random data. You can probably get by with just badblocks though.
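For anyone curious what a destructive write-then-verify pass amounts to, here is a minimal sketch of the idea. It is an illustration only and not how badblocks itself is implemented: `pattern_test` is a made-up name, it uses fixed byte patterns rather than random data, and it is shown against an in-memory buffer. Pointed at a real device it would destroy the contents, and reads could be served from the OS cache unless caching is bypassed.

```python
import io

# Fixed byte patterns, chosen so that every bit is tested as both 0 and 1.
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)


def pattern_test(dev, size, block_size=4096, patterns=PATTERNS):
    """Overwrite the first `size` bytes of `dev` with each pattern, read it
    back, and return the sorted block numbers that failed verification.

    `dev` is any seekable binary file object opened for reading and writing.
    """
    bad = set()
    for value in patterns:
        # Write pass: fill the whole range with the current pattern.
        dev.seek(0)
        remaining = size
        while remaining > 0:
            n = min(block_size, remaining)
            dev.write(bytes([value]) * n)
            remaining -= n
        dev.flush()

        # Verify pass: read every block back and compare.
        dev.seek(0)
        block = 0
        remaining = size
        while remaining > 0:
            n = min(block_size, remaining)
            if dev.read(n) != bytes([value]) * n:
                bad.add(block)
            block += 1
            remaining -= n
    return sorted(bad)


if __name__ == "__main__":
    # A healthy in-memory "drive" reports no bad blocks.
    print(pattern_test(io.BytesIO(), 16 * 4096))
```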
- - Re: (Score:2)
    
    by SealBeater ( 143912 ) writes:
    
    > shred won't do much for you since many of the write errors are silent and will go unnoticed until the drive fails to read the block back. I'd rather spend the time on a second badblocks pass.
    The reason I use shred is because it fills the drive with random data faster than badblocks does. I do this because I do whole disk encryption.
No - I just assume they will fail (Score:2)

by turkeyfeathers ( 843622 ) writes:

I buy hard drives in pairs, using one for live data and one kept offline until it's time to back up the live drive (I use Unison sync to quickly determine what's changed between the two drives). My boot drive gets backed up every night with Macrium Reflect. The secret to a happy life: assume that every drive will fail tomorrow and keep everything backed up.
- Re: (Score:2)
  
  by VortexCortex ( 1117377 ) writes:
  
  > The secret to a happy life: assume that every drive will fail tomorrow and keep everything backed up.
  That's why I hoard precious metals instead of money: The fear that every drive will fail tomorrow. Can't say it's made me any happier overall -- being on a terrorist watch list. It does have its rare moments, e.g., I can't fly, but I get to avoid the TSA.
SMART + badblocks (Score:5, Interesting)

by SuperBanana ( 662181 ) writes: on Sunday December 23, 2012 @02:23PM (#42376087)

I run smartctl and capture the registers, then run badblocks, and compare smartctl's output to the pre-bad-blocks check.
If there are any remapped blocks, the drive goes back, as the factory should have remapped the initial defects already, and that means new failed blocks in the first few hours of operation.
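The before/after comparison can be scripted. The sketch below assumes the usual attribute table printed by `smartctl -A` (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value); the function names and the list of watched attributes are my own choices for illustration, and attribute names can differ between vendors.

```python
import subprocess

# Attributes whose raw value should not grow during a burn-in (assumed set).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def snapshot(device):
    """Capture the attribute table for a device; normally needs root."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return result.stdout


def parse_smart_attributes(text):
    """Map attribute name -> integer raw value from `smartctl -A` style text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return attrs


def grown_defects(before, after, watched=WATCHED):
    """Return {name: (old, new)} for watched attributes that increased."""
    old = parse_smart_attributes(before)
    new = parse_smart_attributes(after)
    grown = {}
    for name in watched:
        if name in new and new[name] > old.get(name, 0):
            grown[name] = (old.get(name, 0), new[name])
    return grown
```

Typical use would be to call `snapshot("/dev/sdX")` before the badblocks run, call it again afterwards, and return the drive if `grown_defects(before, after)` is non-empty.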

- Re: (Score:2)
  
  by sribe ( 304414 ) writes:
  
  Great idea, thanks. I always test new drives, but this one had not occurred to me.
- Re: (Score:3)
  
  by rrohbeck ( 944847 ) writes:
  
  That's the right way to do it but manufacturers increasingly don't accept returns for a single or few bad blocks. They say that's acceptable.
  The reason is probably that it's too time consuming to test the entire surface with the high capacities but mostly unchanged transfer rates that we see.
Google Whitepaper Answers Your Questions (Score:2)

by idealego ( 32141 ) writes:

This answers most of your questions and does so using data based on a large dataset.
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/disk_failures.pdf [googleusercontent.com]
If you are concerned about reliability I suggest using an Intel SSD. Their failure rate is very low.
- Re: (Score:2)
  
  by rrohbeck ( 944847 ) writes:
  
  Except for FW bugs that may lock up the drive hard or cause it to say it has 8MB capacity.
hundreds of drives... (Score:2)

by spywhere ( 824072 ) writes:

...bought and installed in desktops & laptops over the last decade, and what I've learned is to buy Seagate drives. I have seen way fewer defects and first-year failures on Seagate than WD, and I was happy to see Maxtor go away.
- Re: (Score:2)
  
  by rrohbeck ( 944847 ) writes:
  
  Same thing here. We use hundreds of drives per week, mostly Seagate plus some Hitachi and recently qualified Toshiba. No WD unless you count HGST.
Disk Utility (Score:2)

by hackertourist ( 2202674 ) writes:

When installing a new disk in a Mac, I run Disk Utility with the Secure Erase option enabled. This will write 7 or 35 passes of 0000 to every block, which should find any early problems...
My testing methodology (Score:2)

by dpidcoe ( 2606549 ) writes:

I thoroughly test any new hdd I get for my desktop PC:
The first thing I do is format it and install Windows. If that works, then we know the drive isn't DOA.
From there I torture test it by copying several hundred gigabytes of software and movies, as well as installing some more programs.
After that, I let it run for a few months, using it normally. If it crashes during that time, then I know it was bad.
Because you ran linux? (Score:2)

by damn_registrars ( 1103043 ) writes:

Well, the last drive I returned to a manufacturer was one that I was running FreeBSD on and they didn't seem to care. Granted, the experience with the manufacturer (Seagate) was less-than-pleasant but that had nothing to do with my choice of OS which I don't think they ever asked.

I now buy only Western Digital.
Burn In Testing for New Gear (Score:2)

by hackus ( 159037 ) writes:

This is part of a process for testing new server gear.
Since I use Fedora, currently at 17, burn in testing is important.
Quick tip: Most of the distros currently do not detect SSDs during the install and do not include the "discard" option in the fstab entries for the device.
If you do use a modern distro, make sure that if you install or use an SSD with it, you mount the device with the "discard" mount option so that TRIM is enabled.
For example:
UUID=xxxxxxxxxxxxxxxxx /mnt/ssd2 ext4 discard,defaults
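
To check an existing fstab for entries like the one above, something along these lines might help. It is a minimal sketch with a hypothetical function name. It only looks at the text of the file, so it cannot tell whether a given device is actually an SSD, and it assumes the usual whitespace-separated fstab layout.

```python
def ext4_mounts_without_discard(fstab_text):
    """Return mount points of ext4 entries whose options lack 'discard'."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete device/mountpoint/type/options entry
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type == "ext4" and "discard" not in options.split(","):
            missing.append(mount_point)
    return missing


if __name__ == "__main__":
    # Illustrative fstab content; the UUIDs and paths are placeholders.
    sample = """\
# <device>   <mount point>  <type>  <options>
UUID=aaaa    /              ext4    defaults            1 1
UUID=bbbb    /mnt/ssd2      ext4    discard,defaults
"""
    print(ext4_mounts_without_discard(sample))  # ['/']
```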
Plug it in (Score:3)

by mbone ( 558574 ) writes: on Sunday December 23, 2012 @03:18PM (#42376427)

Testing is simple - plug it in, and run it till it fails. Might as well use it in the mean-time.

Wrong Approach (Score:5, Insightful)

by nuckfuts ( 690967 ) writes: on Sunday December 23, 2012 @04:29PM (#42376841)

I've been dealing with hardware failures for 20+ years. What I've learned is that disasters WILL happen, regardless of what preventive measures are in place. So I shifted my focus toward recoverability. To me, the important question is "When something catastrophic happens, how quickly and easily can I put things back in working order?"
Since I use RAID where appropriate, and more importantly, I am positively fanatic about frequent, full, and tested backups, the only concern I have when a hard drive dies is whether I'm still entitled to a warranty replacement.

Don't degauss it to start with (Score:2)

by andy the engineer ( 274207 ) writes:

On Black Friday I bought a 1 TB drive at Office Depot, and of course they waved the box over their anti-theft degausser. I asked for a different drive and told them that they shouldn't do that with drives. The girl gave me the look we all have seen, but the boy behind her actually agreed with me and they gave me a drive out of the cage and let me leave the store with the alarm blaring. I've just about filled it up already and it's been working fine.
Heck yeah I test. (Score:2)

by MasterOfGoingFaster ( 922862 ) writes:

I test every single drive before deployment. I've found Gibson Research Corp (grc.com) Spinrite to be vital. It's pretty much the only drive test / repair / recover tool I use - other than RAID recovery tools. I'm astonished at the number of people who say they don't test at all.
Go visit a UPS or FedEx distribution center and watch the "slapper" kick packages off the 45MPH belt onto a slide at the load dock. Small boxes like hard drive packages are airborne. It doesn't matter how much the factory tests.
Try to break the disk before you lose your data (Score:3)

by ncw ( 59013 ) writes: on Sunday December 23, 2012 @08:21PM (#42378221) Homepage

Stress testing hard disks is a particular bugbear of mine, after having some really bad luck with early hard disks. Over the 15 years that I've been doing it I've had to send back loads of hard disks and flash cards because they failed my tests, either breaking completely or returning single bit errors in your data. Mostly the manufacturers will take disks back if you can get their stupid Windows program to return an error code. Sometimes it takes a bit of arguing but ultimately the manufacturers want to keep you happy. Flash disks with single bit errors are the hardest to send back in my experience.
Here is the latest generation of my stress testing code (re-written in Go recently): https://github.com/ncw/stressdisk [github.com]
(Interestingly the stressdisk program sometimes finds bad ram in your computer too!)
I generally thrash every new hard disk or memory card for 24 hours to see if I can break it before trusting any data to it!
I also run a long SMART test.
Somewhat paranoid, yes, but I really, really hate losing data!
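
For readers who want the general idea without reading the linked code, here is a minimal write-then-verify sketch. It is not the stressdisk algorithm, only an illustration of the technique under simple assumptions: it fills a file with reproducible pseudo-random blocks, reads them back, and reports which blocks differ. All names and sizes are hypothetical.

```python
import os
import random


def make_block(seed, index, size):
    """Reproducible pseudo-random bytes for block `index` of a given run."""
    rng = random.Random(f"{seed}:{index}")
    return rng.getrandbits(size * 8).to_bytes(size, "little")


def write_pattern(path, blocks, seed, block_size=1024 * 1024):
    """Write `blocks` pattern blocks to `path` and flush them to the device."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(make_block(seed, i, block_size))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, blocks, seed, block_size=1024 * 1024):
    """Return indices of blocks whose contents differ from what was written."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(block_size) != make_block(seed, i, block_size):
                bad.append(i)
    return bad


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "pattern.bin")
        write_pattern(target, blocks=8, seed=42, block_size=64 * 1024)
        print(verify_pattern(target, blocks=8, seed=42, block_size=64 * 1024))
```

A small file like this will mostly be read back from the operating system's cache rather than from the disk. A meaningful run would need far more data than there is RAM (or cache-bypassing I/O) and many repeated passes, in line with the 24 hours mentioned above.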

- Re:SSDs (Score:5, Insightful)
  
  by roc97007 ( 608802 ) writes: on Sunday December 23, 2012 @01:29PM (#42375799) Journal
  
  > Who cares about HDDs anymore these days?
  Anyone with a need for a massive amount of storage space.
  
  - Re: (Score:2)
    
    by stuporglue ( 1167677 ) writes:
    
    I just bought a new ThinkPad which had several SSD options. I chose the slower 1 terabyte disk instead. I'd rather have everything I need with me, even if it is a little slower.
    As for backups, I have a daily cron job which rsyncs between my laptop and my home server.
    When I have massive changes I make sure I'm hooked up to the wired home network, otherwise it just goes on over wifi.
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
  - - Re:SSDs (Score:4, Insightful)
      
      by White Flame ( 1074973 ) writes: on Sunday December 23, 2012 @01:53PM (#42375947)
      
      Not really. People usually don't modify gigantic footprints of data per day, so standard incremental backup strategies are still very applicable. Most of the large data tends to be read-only over time, typically media, archives, large installation files, etc.
      
    - Re:SSDs (Score:4, Insightful)
      
      by aaarrrgggh ( 9205 ) writes: on Sunday December 23, 2012 @02:29PM (#42376127)
      
      Rebuild time. It takes our hardware raids about 24 hours to rebuild, and software raids about 72 hours. If the disk failure isn't detected immediately, even with RAID-6 you are pushing your luck.
      RAID is not backup.
      
      - Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
      - Re: (Score:2)
        
        by aaron552 ( 1621603 ) writes:
        
        Isn't this what hot spares are for?
      - Re:SSDs (Score:5, Funny)
        
        by drsmithy ( 35869 ) writes: <drsmithy@gmail . c om> on Sunday December 23, 2012 @05:23PM (#42377119)
        
        Holy crap. Twenty 3T spindles in a single array ? What do you do to de-stress ? Run between cars on a highway ?
        
    - Re: (Score:3)
      
      by roc97007 ( 608802 ) writes:
      
      At two companies I managed IP libraries (massive amounts of photographs and drawings used in catalogs and advertisements). The data changes only slowly, and (depending on usage) seasonally, so incremental backups are very much practical. But that's not really the issue.
      This is important. Raid protects you from certain kinds of failures, usually limited to the mechanical or electrical failure of a single hard drive. (More protection can be had by nesting raid levels, but for most installations this is th
- Re:SSDs (Score:4, Informative)
  
  by cpghost ( 719344 ) writes: on Sunday December 23, 2012 @01:58PM (#42375977) Homepage
  
  > Who cares about HDDs anymore these days?
  We do here at work. We need some modest 120+ TB of storage right now, and 30% of that content is highly dynamic (PostgreSQL databases). Anything but data center quality HDD would be silly, not to mention unreliable as hell and heavily expensive. SSDs are just for laptops or so, not for real data storage requirements.
  
  - Re: (Score:2)
    
    by jmichaelg ( 148257 ) writes:
    
    > Anything but data center quality HDD would be silly, not to mention unreliable as hell and heavily expensive.
    Guess Google is silly then using the cheapest possible hard drives and accommodating the inevitable failures.
  - Re: (Score:2)
    
    by PlusFiveTroll ( 754249 ) writes:
    
    > SSDs are just for laptops or so, not for real data storage requirements
    Yep, just for laptops
    http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-910-series.html [intel.com]
    http://www.equallogic.com/products/default.aspx?id=10857 [equallogic.com]
    SSD isn't great for bulk data storage, but where you need high IOPS a few SSDs in arrays replace a truckload of drives.
  - Re: (Score:2)
    
    by war4peace ( 1628283 ) writes:
    
    Nope. SSDs are reliable enough to be used in server-grade implementations. The only issue with them is that they're highly specialized. If your regular HDDs become the bottleneck, you will need SSDs. Also, if you have some small implementations where you need fast access to read/write/modify data (some MMOs come to mind) and need to protect it against a power failure or RAM going awry, you should use SSDs.
    - Re:SSDs (Score:5, Interesting)
      
      by cpghost ( 719344 ) writes: on Sunday December 23, 2012 @03:14PM (#42376411) Homepage
      
      Actually, our only current use for SSDs is ZILs (ZFS intent logs), and we're evaluating whether to put PostgreSQL transaction logs on an SSD, but that's another story. Our main storage farm is still HDD-based.
      
    - Comment removed (Score:5, Interesting)
      
      by account_deleted ( 4530225 ) writes: on Sunday December 23, 2012 @04:18PM (#42376777)
      
      Comment removed based on user account deletion
      
      - Re: (Score:3)
        
        by dinfinity ( 2300094 ) writes:
        
        Please. Quoting Jeff Atwood as an authoritative source on SSDs?
        Some anecdotal evidence and a subsequent admission of buying from the brand known for the highest failure rate in SSDs isn't going to convince anyone.
        I'd like to see some proper statistics before I believe anything you say.
        The most reliable statistics I've seen show SSDs performing as well as or better than HDDs when it comes to failing. I haven't seen any statistics on what percentage of failing drives did so spontaneously, completely, without war
        
        Re: (Score:3)
        
        by war4peace ( 1628283 ) writes:
        
        Exactly this.
        I know a (very large) Data Center belonging to a (very large) company which started replacing their HDDs with SSDs. The price difference isn't even that large; price-per-GB for a server-grade 15K RPM SAS was negligibly close to SSD price. And the advantages are really there: (much) lower heat produced, less noise, less space taken, less energy consumed. Even with a similar failure rate, the advantages are there.
- - Re:SSDs (Score:4, Insightful)
    
    by PlusFiveTroll ( 754249 ) writes: on Sunday December 23, 2012 @02:50PM (#42376255) Homepage
    
    Depending on your definition of reliable and long term, people still use tapes.
    
- - Re: (Score:3)
    
    by ArchieBunker ( 132337 ) writes:
    
    Sounds like a really old troll.
- Re: (Score:2, Funny)
  
  by Anonymous Coward writes:
  
  Let me guess... if it sank to the bottom it was a good drive, but if it floated it was a bad drive and needed to be burnt at the stake.
  - Re: (Score:2)
    
    by geminidomino ( 614729 ) writes:
    
    Of course! Fucking witches are getting into everything these days!
- Re: (Score:2)
  
  by nabsltd ( 1313397 ) writes:
  
  > How did this make it to the front page, especially with SSD prices being what they are?
  I have a 20TB RAID array that cost me about $0.10/GB, including controllers. If you can afford to build a 20TB array using SSD, you have far more money than I do. You will also need more controllers than I do (port multipliers divide the bandwidth, which you don't want to do for SSDs), since you'd need at least 20 SSDs (if you were willing to pay about $2.50/GB), but more likely more than 45 (at about $0.85/GB).
  You also need special controllers that understand SSDs and can pass TRIM commands, and that wil
  - Re: (Score:2)
    
    by PlusFiveTroll ( 754249 ) writes:
    
    >So, yeah, for a boot drive, SSDs kick ass, but for storing your movie collection, not only are they 10 times more expensive than magnetic disks, but they are way overkill as far as performance is concerned.
    And where performance is concerned the raid of SSDs replaces many many more disks.
    Use a sledgehammer to drive railroad spikes
    Use a finishing hammer to drive finishing nails.
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
  - Re: (Score:2)
    
    by SuperQ ( 431 ) * writes:
    
    Yea, I would like to see a better method for communicating these errors up from the kernel through userspace. Most of the time when a "normal" user gets EIO errors, they see some kind of crash or debug message. If the filesystem could simply put the filename with the error into a list for some userspace service, the GUI file manager(s) or some health monitoring service could notify the end user with something a little more descriptive.
    This could also let the user activate the relocat
- Re: (Score:2)
  
  by rrohbeck ( 944847 ) writes:
  
  I thought that applies only to software.
