Recovering a Wrecked RAID 175
Dr. Eggman writes "Tom's Hardware recently posted an article specifying how the professionals at Kroll Ontrack recover data from a RAID array that has suffered a hard drive failure, allowing for recovery of even RAID 5 arrays suffering two failures. The article is quick to warn this is costly, however, and points out the different types of hard drive failures that occur, only some of which are repairable. Ultimately the article concludes that consistent backups and other good practices are the best solution. Still, it provides an interesting look into the world of data after death."
Long-winded advertisement^H^H^H^H^H^H^H^H article. (Score:5, Insightful)
But a great battle cry! (Score:2)
Death before data!!!
FOR THE LAST FREAKIN' TIME... (Score:4, Insightful)
Re:FOR THE LAST FREAKIN' TIME... (Score:5, Informative)
When a disk fails in a RAID, it needs to be replaced IMMEDIATELY. A RAID system with a failed disk is a disaster waiting to happen. I've been in smaller shops that don't even have spare disks around. When a disk failed, they would order a disk at that point and have it shipped.
You should always have plenty of spare disks around, and you should replace disks as soon as they fail. A double disk failure is rare, but the longer you put off replacing a failed disk, the more likely it becomes.
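To put rough numbers on "the longer you put off replacing a failed disk, the more likely it becomes", here is a minimal sketch. It assumes independent failures at a constant rate, and the 3% annualized failure rate (`afr`) is an illustrative figure, not a measurement. Other comments in this thread point out that real failures are correlated, so treat the result as optimistic.

```python
import math

def second_failure_probability(remaining_disks, hours_exposed, afr=0.03):
    """Chance that at least one surviving disk fails while the array is degraded.

    Assumes independent failures at a constant rate. afr is an assumed
    annualized failure rate per disk, chosen only for illustration.
    """
    hours_per_year = 24 * 365
    # Convert the annual failure probability into a constant hourly rate.
    rate_per_hour = -math.log(1.0 - afr) / hours_per_year
    return 1.0 - math.exp(-rate_per_hour * remaining_disks * hours_exposed)

if __name__ == "__main__":
    scenarios = [
        ("spare on the shelf, swapped within 4 hours", 4),
        ("disk ordered and shipped, 5 days degraded", 120),
    ]
    for label, hours in scenarios:
        p = second_failure_probability(remaining_disks=4, hours_exposed=hours)
        print(f"{label}: {p:.4%}")
```

Under these assumptions the risk grows almost linearly with the time spent degraded, which is the argument for keeping spares on hand rather than ordering one after the failure.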
Terminology update (Score:2, Funny)
Or in the case of RAID-1, it becomes just an ID.
Re: (Score:2)
you should replace disks as soon as they fail. A double disk failure is rare, but the longer you put off replacing a failed disk, the more likely it becomes.
That's what you have (cold or warm) spare drives for. The RAID can start rebuilding on the standby disk immediately if it has a spare.
Given the recent results showing that disk failure rate is higher than it was thought, that the bathtub curve doesn't apply, and that failures are correlated, this sounds more and more like you really have to have it.
Re: (Score:2)
Re: (Score:2, Insightful)
Re: FOR THE LAST FREAKIN' TIME... (Score:2)
Software RAID (Score:4, Insightful)
Re: (Score:2)
Backups are not a 'solution'. They are a 'backup solution' to the 'main solution'.
Of course one should keep backups, but I'm sick of it being called a solution to drive crashes.
I had a drive crash this morning (on a server that is fully backed up daily). And I've had to get another server started and serving DHCP and DNS simply because I needed the thing up and running FAST. The RAID system crashed. If a drive crashes and it is in a RAID system, the server will keep running. Now that's w
Re: (Score:2)
RAID is to keep the system running (except for that absurd RAID0 crap).
Backups are to mitigate data loss.
Now let's look at what the article was about. The title is "Recovering a Wrecked RAID". Why might you need to do this? To keep the system running? Not with what they're talking about. No, they're talking about recovering from a data loss where RAID is involved. Responding to this with, "Well, you should have kept backups.
Re: (Score:2)
Agreed (for home use), and ZFS's raidz is the easiest. ;-)
Unfortunately, Solaris's IDE controller support sux. :-( If they only supported PCI-based IDE controllers, it would be soooo easy to create and maintain a RAID array using old hardware.
Re: (Score:2)
I think software RAIDs are better than hardware RAIDs (for home use) due to their flexibility. You can mix different disk interfaces (IDE, SATA, SCSI,
Re: (Score:2)
Unless you were thinking you can't determine that it is
Gotta love Tom's articles (Score:5, Insightful)
They painstakingly
NEXT PAGE
pull data
NEXT PAGE
off the
NEXT PAGE
damaged drive
Printer Friendly (Score:4, Insightful)
I don't know why TH has printer friendly pages that they don't ever link to.
Re: (Score:2)
Re: (Score:2)
IntelliTXT too (Score:4, Insightful)
Re: (Score:3, Insightful)
Re: (Score:2)
Re: (Score:2)
I won't say whether or not that's giving THW too much credit or not.
Re: (Score:3, Informative)
Re: (Score:2)
I never understood the "next page" obsession that various websites have. I assume it's a way to fit more advertising in a given article, but why not, instead of splitting articles over multiple pages, simply insert more advertising on a single page? Are publishers afraid multiple ads will not load immediately? Surely loading an entire new page is worse than one more flash box? Do contracts require a given ad to have its own page? I'm curious.
Re: (Score:2)
Let's see...10 ads per page spread out over 10 pages? Or 50 ads per page spread out over 2 pages? Both are very annoying, but if I can't even find the article text in a sea of ads, I'll never visit the site again. Then again, Tom's crossed that threshold for me a long time ago...
Re: (Score:2)
Re: (Score:2)
(This isn't directed at you, just poor design) Readers may not like to scroll, but they hate waiting for page loads even more. At least that's what Edward Tufte implied at a conference. Also see discussion at Tufte's website [edwardtufte.com]. Some interesting points raised. I suspect that the real issue is the page loading speed, rather than the action required to get there. Once loaded, scrolling is instantaneous. Paging could be, but would require different formats (e.g. PDF) or cleverer browsers (I thought I heard
Re: (Score:2)
Which is to have to sell stuff at such a high markup no one buys them. HA HA!
Re: (Score:2)
This should be obvious, but if they annoy the readers too much, they won't be making any money.
Generally I just smack a few of the Guild around (Score:2)
Could have mentioned other options (Score:2, Informative)
Gibson the Hack (Score:4, Insightful)
Re: (Score:2)
Re: (Score:2)
Spinrite isn't bad, per se. It's just not in any way revolutionary or important. There are many better tools out there for doing low level copies.
Re: (Score:2)
Be sure to use the ddrescue version that works with a logfile, so that you don't have to write down the blocks that you could rescue.
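The idea behind copying with a logfile can be illustrated with a minimal sketch: copy block by block, zero-fill whatever cannot be read, and record the failed ranges so they can be retried later. This is not ddrescue's actual algorithm or logfile format. `rescue_copy` and `FlakySource` are hypothetical names, and the failing disk is simulated in memory.

```python
import io

class FlakySource(io.BytesIO):
    """Hypothetical stand-in for a failing disk: reads at bad offsets raise OSError."""

    def __init__(self, data, bad_offsets):
        super().__init__(data)
        self.bad_offsets = set(bad_offsets)

    def read(self, size=-1):
        if self.tell() in self.bad_offsets:
            raise OSError("simulated unreadable sector")
        return super().read(size)

def rescue_copy(src, dst, total_size, block_size=4096):
    """Copy src to dst block by block, zero-filling unreadable blocks.

    Returns a list of (offset, length) tuples for the blocks that failed,
    which plays the role of the logfile in this sketch.
    """
    bad_blocks = []
    offset = 0
    while offset < total_size:
        length = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(length)
            if len(data) != length:
                raise OSError("short read")
        except OSError:
            # Keep the image aligned with the source by writing zeros.
            data = b"\x00" * length
            bad_blocks.append((offset, length))
        dst.seek(offset)
        dst.write(data)
        offset += length
    return bad_blocks

if __name__ == "__main__":
    disk = FlakySource(bytes(range(256)) * 64, bad_offsets={4096})
    image = io.BytesIO()
    print("unreadable ranges:", rescue_copy(disk, image, total_size=16384))
```

Because the failed ranges are recorded rather than written down by hand, a later pass only has to revisit those offsets instead of rereading the whole drive.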
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Care to define magnetodynamics, then?
Admittedly I've only studied magnetostatics so far, but then I'm not very far through my degree. Since magnetostatics was defined as the study of static magnetic fields, I don't think I'd be making a particularly large logical leap to infer that magnetodynamics is the study of non-static magnetic fields. Since, according to Wikipedia, magnetostatics is a good approximation for all but rapidly alternating magnetic fields, I could even make a larger leap and infer that magnetostatics is generally used to s
Re: (Score:2)
Re: (Score:2)
I'm trolling, ignore me. (Score:2)
Re: (Score:2, Interesting)
and to use dd to take an image of the disk or ghost (but iirc ghost uses dd) ?
I have been able to successfully recover 99% of a crashed, broken, badly partitioned hard drive that way numerous times.
Of course I do not claim to have the expertise of Ontrack, but seeing as I've done this for quite
a few friends, and since not everybody can pay what they ask for their service, I can understand
why they get drives that have
Re: (Score:2)
Questionable advice from Tom's (Score:5, Interesting)
We assume that all hard drives will be handled with care, so they should be installed in suitable drive bays. If you use multiple drives, we recommend removable drive frame solutions, which help reduce vibration transfer onto the computer chassis and even back to individual hard drives. Make sure that your system has sufficient ventilation, so high speed hard drives won't overheat.
I've found the removable drive frames available for cheap consumer hardware to be total crap. The metal enclosure keeps heat close to the drive, and the tiny fans used don't move nearly as much air past the drive as when it's inside the case, being cooled by the airflow of the case fans. The drive temperature is therefore higher even under the best conditions. In addition, the smaller fans fill with gunk quickly and as a result wear out faster than larger ones, leading regularly to a drive trapped in an uncooled box.
I've used enclosures from Promise, Enermax, and several other companies whose products were so bad I tried to forget their names; all had fans that instantly became the least reliable part of the entire system once I installed the drive frame, and I wasn't happy with the drive's temperature from day one.
I don't think the person making this comment at Tom's ever keeps systems running long enough to realize the long-term issues that come with anything cheaper than server-grade drive enclosures for hard drives. I'd welcome suggestions for a better quality product in this category. It's a hard subject to cover, because by the time you've had several units set up for a year or two to gather useful data on how rugged they are, the product is obsolete; not something any review site I'm aware of is set up to cover.
And the alternative is? (Score:2)
Apple X-serve RAID? Cheapest - does it work reliably with Linux or Solaris? Word on the street is that it does, but I have not seen a demo yet.
We're actually going with recycling our ancient D-1000s and A-1000s with no-name 300 GB SCSI drives. Pretty old school, but reliable.
Re: (Score:2)
1) While separate, the power supplies in external drives aren't likely to be even as good as the one in your PC.
2) A surge sufficient to blow out things inside your PC could easily make its way out eSATA, USB, Firewire, etc, so I don't know how much the argument of an external power supply saving you appl
Re: (Score:2)
Is there such a thing as... (Score:2)
After all, we're supposed to replicate data 3 times, right?
Whatever you want, linux md RAID-1 can do it (Score:2)
Linux md RAID-1 allows you to replicate to n number of drives, PLUS set m more drives as spares that will be automatically substituted for failed drives without intervention. You can spread the drives among as many controllers as you want.
Of course you need off site backups too (fire, theft, lightning, human error).
Re: (Score:2)
The only downside would be that the hot-spare drive gets used and may suffer from wear-and-tear more than a hot-spare drive that spends its time powered down.
On the flip side... you don't have to sit and wonder if that hot-spare drive
Re: (Score:2)
With SVM, create three stripes, create a mirror with one stripe, newfs it, when the newfs is complete, add a stripe to the mirror, (and then add another stripe to the mirror).
The stuff in the brackets is the only difference between creating RAID 1+0. Oh, and you don't want to use the GUI...
FWIW, my standard rollout these days is software-triple-mirrored hardware-RAID5 enclosures with an independa
Re: (Score:2)
Backing up HDDs is very hard (Score:3, Interesting)
Optical discs are a joke - 4.3GB is just not enough. Larger formats exist but are relatively expensive. Tape is expensive per MB and slow, plus it isn't random access and not suited to anything but slow full backups. MO is too small and expensive.
It seems like the best bet is something like a Century Tower - basically a USB enclosure that can take up to 4/8 drives. Keep it totally disconnected when not in use, and use RAID 1 mirroring with drives from different manufacturers.
Re:Backing up HDDs is very hard (Score:4, Informative)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
mt -f
tar cvbf 512
Every now and then you could do an image backup with
dd if=/dev/hda of=/dev/nst0 bs=64k
so you're able to restore your drive quickly. Works fine even on a live filesystem if it's the journaling type.
And if you're on Windows, install cygwin first, or boot Knoppix for an image backup.
Oh, of course you'll need a tape drive. Yes, you can do the same with optical disks but I don't trus
Re: (Score:2)
If I was going to only need 40GB tapes, I wouldn't bother. I'd use a USB HDD. Much easier to work with. It's not like I need to archive my backups for years, I just need one that works when my main HDD breaks.
I was talking about a system with 1.2TB of data. That would need over 30 tapes to back up, not even counting incrementals. Not DVD rips either, renders and associated data which cannot be easily replaced except by d
Re: (Score:2)
you shouldn't rely on that. You are not reading it atomically. Data can change while you are reading the drive. The image you end up with could be in an inconsistent state. If you take the image and write it back to disk, I would expect the file system driver to replay the journal on the first mount and mark the file system clean. But just because the file system driver flips a flag to say the file system is good doesn't mean it has fixed the
raid data recovery (Score:2)
I had an LVM container that sat on a RAID-1 volume go bad.
the lvm tools couldn't reconstruct the container, so I effectively 'lost' my partitions.
There wasn't any program I could find which would scan the raid volume for the data partitions,
so I ended up cobbling one together on my own, out of the sources in the ext2-tools distro.
And yes, I did get my data back, and no, I'm no longer using LVM containers.
Cheap Solution (Score:2, Informative)
Re: (Score:2)
Re: (Score:2)
backups (Score:3)
I miss the old Bigfoot drives we had. Everyone said they had problems with them, but it was always (in our case) the board that died, NOT the disk. I saved a couple of those by swapping in a board for a 1 hour recovery.
If you buy several HDs for RAID or whatever, buy one more and stick it on a shelf for a rainy day. Along with a few utilities you can do 3/4 of what they do for $100 instead of $1000+
Lunch (Score:4, Interesting)
I attended a small conference where the Kroll VP of Data Recovery was speaking. He came in, his assistant set up his PowerPoint stuff, made sure the projector was right etc. He then gave a very interesting talk about what Kroll could pull off of a drive, despite what had been done to it. By way of example he showed a slide of a burnt and bent hard drive - that came out of the sky when the shuttle broke up. They recovered 99% of the data on that drive. He also mentioned that they do the data recovery for all of the spook organizations in D.C.
When we broke for lunch I got to sit at his table and we got to ask him all sorts of questions about their processes. He mentioned they have things they use that they have never patented because it would be too much of a leg up for both the competition and those that seek to destroy data. We tried to get him to tell us what we would have to do to a drive to make it unreadable. Mostly his answers to our "Surely this would make the data unreadable" queries were "You would think that would work wouldn't you?" Someone referenced his assistant who was sitting next to him and the VP said:
"Him? No, no, no. (laughs) He is not my assistant, in fact he doesn't work for me at all. He is a lawyer for the company and is here to make sure I don't say anything I am not supposed to." The assistant then gave us one of those 'I could eat you alive' lawyer smiles.
I walked out secure in the knowledge that short of melting the platters down the data can *always* be recovered.
Sera
Re: (Score:2)
I've got those thermite packs against the drives in my server for a reason damn it!
Actually, I suppose if I was really paranoid I could use the welding torch in the garage to melt the drives down, but I don't think I'd get as much for them on eBay...
Re: (Score:3)
Encrypt. I guarantee that even if the NSA can break AES, they won't do it for anything short of top secret cases that will never see the light of day. Breaking random drives encrypted with AES or any other modern cipher would disclose their ability to break that cipher and no one would use it anymore, removing their advantage.
Truly astonishing...but so simple (Score:2, Informative)
Data recovery is never an easy process (Score:3, Interesting)
Popped in a Helix disk, and checked what the MFT was doing. Lo and behold, no MFT, no boot sector, and a huge list of bad sectors. Basically, the crash had resulted in a bad sector in the bad sector table, and all over the first portion of the disk.
These were 200GB disks, but eventually I was able to get a sector repair program to read through and do a non-destructive repair. Data was safe, but was now corrupt. Next step was to repair the data, and I was finally able to just use chkdsk to repair.
Eventually, it was back to real data, and was able to push the data over to a new replacement hard drive.
Told the client to invest in RAID 1, but seriously doubt they would be willing to spend that $100 for the RAID. Instead, they prefer to pay $1000 for a repair.
BACKUPS. make lots of BACKUPS. RAID your stuff, and get those backups offsite. Do them regularly. Seriously, it would save your ass if something happens. For example, I have a LAN HD that is parked out in a shed in my backyard. Total cost $200, and has already saved my ass 2x.
RAID5 is good, not flawless (Score:3, Informative)
Data integrity and uptime are served by RAID5. If it's not good enough, then it should be backed with mirroring (RAID5+1) or some form of dual-parity RAID (RAID-DP from NetApp, etc.).
But data gets lost or corrupted, even without disk failures. Backups are the place where data recovery is done. DO YOUR BACKUPS!
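For readers who have not seen how single-parity RAID rebuilds a lost disk, here is a minimal sketch of XOR parity over one stripe. It is a toy model with tiny in-memory blocks, not any controller's implementation, and `xor_blocks` and `rebuild_missing` are hypothetical helper names.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(stripe, parity):
    """Rebuild the one missing data block (marked None) from the survivors and parity."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one missing block")
    survivors = [block for block in stripe if block is not None]
    # XOR of all surviving data plus parity cancels everything but the lost block.
    return xor_blocks(survivors + [parity])

if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]   # three data disks, one stripe
    parity = xor_blocks(data)            # what the parity disk would hold
    degraded = [data[0], None, data[2]]  # the second disk has failed
    print(rebuild_missing(degraded, parity))
```

With two blocks missing, the single parity equation has two unknowns and the rebuild is impossible from parity alone. Note also that parity faithfully reproduces whatever was written, including deleted or corrupted files, which is why it does not replace backups.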
What about ZFS? (Score:2)
http://docs.sun.com/app/docs/doc/819-5461/6n7ht6qt0?a=view [sun.com]
For example:
RAID is not backup! (Score:2)
-b.
Re:RAID5. (Score:4, Informative)
Re:RAID5. (Score:5, Insightful)
My home setup is a pair of 300 gig drives in a mirror, with another 1.6TB for other storage. Stuff that is important is on the mirror, and is differentially backed up to DVD regularly.
Stuff on the mass array is available in original form (my DVD and CD library that's been ripped) or is backed up whenever it changes, which is not often (my code library, for example). Active code and my wife's thesis are on the raid. Supporting documents for the thesis are on DVD and mass storage, as are old code projects that I may borrow from for functionality in a new project. The old project (and likely several versions of it) is off on DVD in a safe deposit box, with the rest of my backups.
Safe deposit boxes are awesome. I have one that can store 600 cds in cake boxes and it only costs $120/year. Dirt cheap for climate controlled fireproof storage.
-nB
Re: (Score:2)
I think a 4 disk RAID-5 is a better sweet spot, since you lose less storage to parity/recovery data and it seems cheaper to buy 4x smaller drives than 2x larger drives. 2 300GB disks looks around $200 and 4x 120GB disks is around $220, but you end up with 60GB more space for the $20.
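The arithmetic behind that comparison can be checked with a short sketch. The disk sizes and prices are the ones quoted in the comment above. `usable_gb` is a hypothetical helper that covers only a plain mirror and single-parity RAID 5.

```python
def usable_gb(level, disks, disk_gb):
    """Usable capacity for a simple mirror or a single-parity array."""
    if level == "raid1":
        return disk_gb                # every disk holds the same data
    if level == "raid5":
        return (disks - 1) * disk_gb  # one disk's worth goes to parity
    raise ValueError(f"unsupported level: {level}")

if __name__ == "__main__":
    setups = [
        ("2 x 300GB mirror", "raid1", 2, 300, 200),
        ("4 x 120GB RAID 5", "raid5", 4, 120, 220),
    ]
    for label, level, disks, size, price in setups:
        capacity = usable_gb(level, disks, size)
        print(f"{label}: {capacity} GB usable, ${price / capacity:.2f} per usable GB")
```

This compares disk cost and capacity only. It leaves out the controller price and the RAID 5 write penalty, both of which are raised in the replies.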
Re: (Score:2, Informative)
Re: (Score:2)
Plus the controller is cheaper, saving the 60 gig worth of money.
-nB
Re: (Score:2)
Re:RAID5. (Score:4, Informative)
Re: (Score:2)
The issue is that there are 2 writes that must be done, and doing them, even in parallel, is not as fast as reading from multiple platters. You don't get to cut the write in half and use the speed of the spinning platters to your advantage. In a RAID 1 read, you can read partial data from either or both drives.
Re: (Score:2)
It might be possible to read partial data from each mirror, but I've never seen a RAID controller do that. This one queues each read to the least busy mirror. Otherwise it has to process two SCSI commands instead of one.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re:RAID5. (Score:5, Insightful)
Re: (Score:2)
Some file storage doesn't need to be super fast but you need a lot of it and you need reliability. Hence, RAID 50. You'll learn that the first time you have a budget on a hardware deployment.
The easy way to fix a wrecked raid (Score:2, Funny)
Re: (Score:2)
In a Raid 50, it cannot.
Re: (Score:2)
Re:RAID5. (Score:5, Insightful)
To put it another way: What do you think your chances are of having a second drive failure in the few hours it takes you to replace the drive and rebuild it? Even if that does happen you just lose the data up until your last backup (a day at most).
Most professional installations I do are RAID1_0, because people are building the RAID array for the performance, not the cost. Since you're using crappy 80GB HDDs, I'm guessing you're going for cost, which makes it strange that you're thinking about a RAID6_0 solution at all (the controller alone won't be cheap for that). If you work the odds I think you'll find that it's just not worth it to build a RAID6_0, especially given the write penalty and complexity (complexity is your enemy with this, complexity means bugs, which can undermine your entire effort).
sorry, that's wrong... (Score:2, Informative)
http://storagemojo.com/?p=383 [storagemojo.com]
Short synopsis for those who don't want to read it: The rebuild process is intense enough to cause secondary failures in many more cases than you'd think. That you haven't seen it yet is not indicative of the overall population, and sysadmins are paid to be prepared.
The rest of your post is arguable, but it's more a matter of opinion and practice than anything else.
Re: (Score:2)
I've never seen the second drive die during a rebuild (although this is typically RAID1_0, not RAID5), but that may be more a factor of the drives
Re: (Score:2)
In other words, have backups.
Re: (Score:2)
http://www.usenix.org/events/fast07/tech/schroede
The chances of a double disk failure in a RAID 5 are significantly greater than we think. A friend of mine had a double failure just last week.
And the operative words being regular backups (Score:2)
But pretty much agree with everything else you've said... Though a hot spare is always nice.
Re: (Score:2)
Re: (Score:3, Interesting)
The other 2T
Re: (Score:3, Informative)
If you use a reputable controller (i.e. one that costs more than your entire motherboard), it will read the configuration off the disks instead of overwriting them.