Data Storage IT

Why Mirroring Is Not a Backup Solution

Posted by kdawson
from the pointed-lesson dept.
Craig writes "Journalspace.com has fallen and can't get up. The post on their site describes how their entire database was overwritten through either some inconceivable OS or application bug, or more likely a malicious act. Regardless of how the data was lost, their undoing appears to have been that they treated drive mirroring as a backup and have now paid the ultimate price for not having point-in-time backups of the data that was their business." The site had been in business since 2002 and had an Alexa page rank of 106,881. Quantcast said they had 14,000 monthly visitors recently. No word on how many thousands of bloggers' entire output has evaporated.
  • by wandazulu (265281) on Friday January 02, 2009 @12:29PM (#26301333)

    Mirroring, RAID, grid, whatever. At some point, you want your data safe and secure on something not physically attached to any power source.

  • by MBCook (132727) <foobarsoft@foobarsoft.com> on Friday January 02, 2009 @12:31PM (#26301369) Homepage

    It's really unfortunate that this happened. If they had simply had a backup snapshot of the DB they could have restored it. RAID only saves you from disk failures. It doesn't work on OS/user failures.

    Unfortunately this is the kind of thing you tend to learn from experience (either yours or someone else's). It's very easy to think "RAID 1 = disks are safe".

    Just like a database cluster wouldn't have saved them. A clustering database can save you from load, or you can swap servers if a disk goes bad. But when someone issues "DELETE FROM ..." the other cluster nodes start to happily run the same thing and now you have 2 (or 3 or 10 or...) empty database boxes.

    I hope those bloggers had a backup of some sort of their own.
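To make the point above concrete, here is a minimal sketch in Python. It is a toy model, not how any real RAID controller or database cluster is implemented: `MirroredStore` and `take_snapshot` are hypothetical names invented for illustration. The mirror survives losing one disk, but a delete is faithfully copied to every surviving replica, and only the separate point-in-time copy brings the data back.

```python
import copy


class MirroredStore:
    """Toy model of RAID 1 or a replicated cluster: every operation hits all copies."""

    def __init__(self, copies=2):
        self.copies = [{} for _ in range(copies)]

    def put(self, key, value):
        for c in self.copies:
            if c is not None:
                c[key] = value

    def delete_all(self):
        # The destructive operation is mirrored just as faithfully as any write.
        for c in self.copies:
            if c is not None:
                c.clear()

    def fail_disk(self, index):
        # Simulate a hardware failure of one copy.
        self.copies[index] = None

    def surviving(self):
        # Return the first copy that is still alive.
        return next(c for c in self.copies if c is not None)


def take_snapshot(store):
    """Point-in-time copy kept outside the mirror (hypothetical helper)."""
    return copy.deepcopy(store.surviving())


store = MirroredStore()
store.put("post-1", "hello world")
snapshot = take_snapshot(store)

store.fail_disk(0)                       # the mirror handles this case
after_disk_failure = dict(store.surviving())

store.delete_all()                       # the mirror cannot handle this case
after_delete = dict(store.surviving())

restored = copy.deepcopy(snapshot)       # only the snapshot still has the data
```

The snapshot here lives in the same process, which is only for brevity; the whole argument of the thread is that the real copy should live somewhere the production system cannot overwrite.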

  • by ergo98 (9391) on Friday January 02, 2009 @12:51PM (#26301691) Homepage Journal

    Unfortunately this is the kind of thing you tend to learn from experience

    Even a moment of thought would have made it abundantly clear that this was not a backup situation, and it certainly should not require loss to pound it into someone's head.

    They were clearly in way over their heads if they thought this protected them from anything other than a single drive failure. More likely they were entirely aware of the risk, but decided to wing it anyways.

  • by Nom du Keyboard (633989) on Friday January 02, 2009 @01:11PM (#26302025)

    They also purposely blocked archive.org via a robots.txt exclusion, so the bloggers can't use that to try and recover some of their blogs.

    This is just compound foolishness. I gather they did it in an attempt to control bandwidth costs since it's hard to imagine any other reason.

  • by postbigbang (761081) on Friday January 02, 2009 @01:12PM (#26302051)

    Nope.

    Mirrors are fine, just snapshot them and store them offsite regularly. Do delta backups as needed but close-in for fast restoration.

    There is no rational justification for tape anymore, what with the cost per TB stored on hard disks now under $130, total $$. Random accessibility, unless you're stalling a subpoena, is just mandatory on backup media.

  • by PearsSoap (1384741) on Friday January 02, 2009 @01:13PM (#26302057)
    Since when is Slashdot in the habit of disagreeing with Linus Torvalds?

    Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it ;)

  • Double Duh! (Score:5, Interesting)

    by Roger W Moore (538166) on Friday January 02, 2009 @01:22PM (#26302205) Journal
    Since they apparently used OSX Server this is particularly bad. All they needed was a large enough USB attached disk and then to turn on Time Machine. Might not be the best solution for their needs but it is hard to imagine one which requires less effort.
  • Re:Ouch (Score:3, Interesting)

    by slugstone (307678) on Friday January 02, 2009 @01:32PM (#26302377) Journal

    Having worked at several hosting places, I would say you are correct. Never trust a hosting service backup. I always told our customers to never trust our backup. Sometimes backups just never happened. They are not high on the list of things to keep working.

  • by LordSnooty (853791) on Friday January 02, 2009 @01:34PM (#26302407)
    I hope affected users are looking into this. I just did a search of a random JS blog and 2,000 entries were returned, all cached, it would seem. So many people might be able to recover their work in a very painstaking manner.
  • by mzito (5482) on Friday January 02, 2009 @01:41PM (#26302523) Homepage

    Ah, it totally depends on the type of database cluster. For example, with Oracle, if you're using Oracle Data Guard, even in synchronous replication mode you can define an "apply delay" - basically, "Don't acknowledge this commit until it is written locally, and copied and acknowledged on the remote side, but don't actually apply the transaction for two hours."

    That way, if someone does a "DELETE FROM blogs;", it will be reflected immediately on production, but you've got a nice window to sort it out.

    Plus, if you've got database flashback turned on, you can simply say, "Flash my database back to what it looked like before someone was an idiot", and all your data comes back.

    These features are expensive in Oracle, but they can be very useful when you actually need them.
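The delayed-apply idea described above can be sketched as a small Python simulation. This is a toy model under stated assumptions, not Oracle Data Guard's actual interface: `DelayedReplica`, `receive`, `apply_until`, and `discard_from` are hypothetical names, and time is just a count of minutes.

```python
class DelayedReplica:
    """Toy standby that receives changes immediately but applies them late."""

    def __init__(self, apply_delay):
        self.apply_delay = apply_delay   # minutes to wait before applying
        self.pending = []                # list of (timestamp, operation)
        self.data = {}

    def receive(self, timestamp, operation):
        # The change is safely stored on the standby right away...
        self.pending.append((timestamp, operation))

    def apply_until(self, now):
        # ...but only changes older than the delay are actually applied.
        ready = [(t, op) for t, op in self.pending if t + self.apply_delay <= now]
        self.pending = [(t, op) for t, op in self.pending if t + self.apply_delay > now]
        for _, op in ready:
            op(self.data)

    def discard_from(self, timestamp):
        # Throw away the bad change and everything received after it.
        self.pending = [(t, op) for t, op in self.pending if t < timestamp]


standby = DelayedReplica(apply_delay=120)

# t=0: a normal write arrives; by t=130 it has been applied.
standby.receive(0, lambda d: d.update({"post-1": "hello world"}))
standby.apply_until(130)

# t=200: someone wipes the table on production; the standby only queues it.
standby.receive(200, lambda d: d.clear())

# t=230: the mistake is noticed inside the two-hour window.
standby.apply_until(230)
standby.discard_from(200)

# Much later, the discarded delete is never applied.
standby.apply_until(1000)
standby_data = dict(standby.data)
```

The window only helps if someone notices the damage before the delay expires, so this complements point-in-time backups rather than replacing them.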

  • Re:El Oh El (Score:2, Interesting)

    by Chris Pimlott (16212) on Friday January 02, 2009 @01:57PM (#26302793)

    Their post said that only the task-specific server for data was hosed. If Journalspace offered paid services, then their billing system should still have all their customers' details.

  • by nabsltd (1313397) on Friday January 02, 2009 @02:14PM (#26303025)

    The best way I have found to test the backup is to nuke the data and restore.

    Seriously, if you know what files store the data (and that you are backing up), just stop services and rename a directory or two so the data is "gone". Then, restore from backup, start the service, and see how things look. Another good way is to restore the data to a VM that runs the same software as the production server. You can sandbox a simulation of the entire Internet inside a few VMs if you want, and test what happens.

    I just did something similar when I upgraded the OS on a VM that runs a MySQL server:

    1. Create and configure new VM
    2. Stop services on old VM
    3. Run backup on old VM
    4. Stop old VM
    5. Reconfigure new VM with correct IP, etc., and restart
    6. Restore data to new VM from backup
    7. Test

    Basically, if things had gone poorly, I could just stop the new VM and revert back to the old one.
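The "nuke the data and restore" drill described above can be sketched as follows. This is a minimal Python illustration that assumes the data lives in a plain directory and that services are already stopped; `tree_digest` and `restore_drill` are hypothetical helpers, and a real setup would use its own backup tool rather than `shutil.copytree`.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Hash every file name and file body under root, in a stable order."""
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        h.update(path.relative_to(root).as_posix().encode())
        h.update(path.read_bytes())
    return h.hexdigest()


def restore_drill(data_dir: Path, backup_dir: Path) -> bool:
    """Back up, make the live data 'gone' by renaming it, restore, and compare."""
    expected = tree_digest(data_dir)
    shutil.copytree(data_dir, backup_dir)            # take the backup

    gone = data_dir.with_name(data_dir.name + ".gone")
    data_dir.rename(gone)                            # data is now "gone"

    shutil.copytree(backup_dir, data_dir)            # restore from backup
    ok = tree_digest(data_dir) == expected

    if ok:
        shutil.rmtree(gone)                          # drill passed, drop the old copy
    return ok                                        # on failure, the .gone copy remains


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data"
    data.mkdir()
    (data / "blog.db").write_text("post-1: hello world\n")
    drill_passed = restore_drill(data, Path(tmp) / "backup")
```

Renaming instead of deleting mirrors the safety net in the comment: if the restore looks wrong, the original directory is still there to revert to.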

  • by Jon_E (148226) on Friday January 02, 2009 @02:37PM (#26303289)

    that's where trixter needs to zfs send/recv the snapshots to an offsite location (and probably roll snapshots more frequently)
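As a rough sketch of what that might look like, the Python helper below only builds the shell pipeline for an incremental `zfs send -i` piped over `ssh` into `zfs recv`; it does not run anything. The dataset names, snapshot names, and host are made-up placeholders, and `zfs_replication_command` is a hypothetical helper.

```python
import shlex


def zfs_replication_command(dataset, prev_snap, new_snap, remote_host, remote_dataset):
    """Build (but do not execute) an incremental send/receive pipeline."""
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    recv = ["ssh", remote_host, "zfs", "recv", remote_dataset]
    quote = lambda args: " ".join(shlex.quote(a) for a in args)
    return quote(send) + " | " + quote(recv)


# All names below are placeholders for illustration only.
command = zfs_replication_command(
    dataset="tank/blogs",
    prev_snap="2009-01-01",
    new_snap="2009-01-02",
    remote_host="backup.example.com",
    remote_dataset="vault/blogs",
)
print(command)
```

Sending only the delta between two snapshots is what makes frequent offsite copies cheap enough to do routinely.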

  • OS X Server (Score:3, Interesting)

    by DTemp (1086779) on Friday January 02, 2009 @02:56PM (#26303469)

    The site was run on OS X Server... I think this may be indicative of the level of IT effort at the company. Look, *I* run an OS X Server... but *I* am a Biology major who knows approximately dick about the UNIX command line, and use it to run a server that I probably wouldn't be able to run any other way. I also have it back up nightly to a cheap NAS, archiving old backups, and I've tested a restore to make sure it works.

    This is probably just a couple guys who ran a website in their spare time... not a huge IT effort that failed.

  • Re:Just give up? (Score:3, Interesting)

    by imsabbel (611519) on Friday January 02, 2009 @03:10PM (#26303607)

    That's bullshit, and has been for decades.
    It's a myth. Just learn about it. Even if we use our newest AFM, or XMCD microscopy, you won't see an overwritten byte in any drive of the last 5 years. And even the last decade is very doubtful (basically, since GMR drives have been around).
    There IS NO SPACE between tracks anymore. Bits are right next to each other. If you overwrite, nothing above the superparamagnetic limit is left.

    Not even the NSA could get anything useful out of a single overwrite with zeros (well, except relocated sectors and other specialities that might compromise security, but that doesn't help with a backup).

  • by troll8901 (1397145) <troll8901@gmail.com> on Friday January 02, 2009 @03:14PM (#26303657) Journal

    Even better if the drive goes home with the admin at night.

    Would the admin be tempted to look at other people's data?

  • by teg (97890) on Friday January 02, 2009 @03:17PM (#26303711) Homepage

    Never underestimate the beancounter's desire to save every cent possible.

    That's contrary to my experience. Other expenses have been skimped on occasionally, but just mention the word "backup" and the funding was there.

  • by bodland (522967) on Friday January 02, 2009 @03:24PM (#26303789) Homepage
    I'm just sayin'
  • Re:Double Duh! (Score:3, Interesting)

    by Feyr (449684) on Friday January 02, 2009 @05:05PM (#26304979) Journal

    did they ever fix their multithreading issues, where the performance went to shit as soon as you got more than 4 threads going at once?

    THAT is what puts me off os x server, not teh pretties

  • by dbIII (701233) on Friday January 02, 2009 @08:35PM (#26307565)
    I'm not talking in the abstract here - the tapes were on nine-inch reels with the data in SEGD format, and the data is now on the system. Nice set of assumptions you have there, but I was using a real example.

    Disks are still crap for archival storage and for most of us they are still crap for daily offsite backups. It would be nice to have a second site and a really fast, cheap way to get terabytes of data to it - but for me the expense of that is a lot more than taking a box of tapes to a storage shed. Eventually those daily tapes become archival tapes when projects finish. To cap it all off, most of the incoming data where I work comes in on tape. DVDs are utter crap when you want to be sure a lot of data can actually be read at the endpoint and only hold 4.5GB anyway. Portable USB drives are replacing tapes for transport but they are just as slow to read and fragile to transport, and even then you get the time consuming crap of spending twenty minutes going around the building to find someone with the same sort of USB drive because it came without cables.

    The final point is I really do not want to have to maintain the systems to have a couple of hundred terabytes of storage when the working set is well under 5 terabytes and there would be a lot of repetition in that few hundred terabytes of recovered tapes. It then also raises security problems to ensure client data is not easily accessible by other clients. Disks are not the best for archival storage. One of the reasons I was using tapes from 1982 is that nobody was interested in the area the data came from between that date and this year. Disk space is still too expensive to keep two and a bit copies of something big for more than twenty years :)

    Tapes are often annoying and I sometimes wish we could be rid of them but there are many circumstances where they are useful - particularly archives, transport and backups. While I've been happily using rsync to take daily snapshots of important stuff for years and distributing them to machines in different buildings, it still cannot always replace tape. I've had too many dead hard drives or unreadable DVDs (plus they are too small) to seriously consider them as archival storage. As for transport - why are USB hard drives so horribly slow? The ease of random access is irrelevant when you want a lot of files from the disk so LTO and SDLT still win there.

  • Re:Double Duh! (Score:3, Interesting)

    by tomknight (190939) on Friday January 02, 2009 @08:39PM (#26307623) Homepage Journal
    Here's what I've found being said on the topic:

    "I think I can reveal this much: an EX IT guy didn't do something he was supposed to have done. This wasn't discovered until AFTER the disks crashed. So, there were probably other reasons for this guy being an EX, too. Anyway, the crash happened, the mistake was discovered but now too late to fix."

    From: http://dorrie.de/F1/viewtopic.php?t=194&start=375&sid=a65b1d9b0fbcc3c3e8df874fc167d495 [dorrie.de]
