Data Storage IT

Why Mirroring Is Not a Backup Solution (711 comments)

Craig writes "Journalspace.com has fallen and can't get up. The post on their site describes how their entire database was overwritten through either some inconceivable OS or application bug, or more likely a malicious act. Regardless of how the data was lost, their undoing appears to have been that they treated drive mirroring as a backup and have now paid the ultimate price for not having point-in-time backups of the data that was their business." The site had been in business since 2002 and had an Alexa page rank of 106,881. Quantcast said they had 14,000 monthly visitors recently. No word on how many thousands of bloggers' entire output has evaporated.
This discussion has been archived. No new comments can be posted.

  • DUH! (Score:5, Insightful)

    by Anonymous Coward on Friday January 02, 2009 @12:27PM (#26301311)

    DUH!

    • Re:DUH! (Score:5, Funny)

      by djupedal ( 584558 ) on Friday January 02, 2009 @01:07PM (#26301965)
      As if millions of voices suddenly cried out in terror, and were suddenly silenced.
    • Double Duh! (Score:5, Interesting)

      by Roger W Moore ( 538166 ) on Friday January 02, 2009 @01:22PM (#26302205) Journal
      Since they apparently used OSX Server this is particularly bad. All they needed was a large enough USB attached disk and then to turn on Time Machine. Might not be the best solution for their needs but it is hard to imagine one which requires less effort.
      • Re:Double Duh! (Score:4, Insightful)

        by azav ( 469988 ) on Friday January 02, 2009 @01:38PM (#26302461) Homepage Journal

        Or attach a 4 TB Drobo to it and then use Time Machine.

        Then make a backup and test the restore.

        Their admin is criminally incompetent.

      • Re: (Score:3, Informative)

        by CarpetShark ( 865376 )

        All they needed was a large enough USB attached disk

        Correction: all they needed was a large enough, functional, external disk.

        Finding functional external drive products isn't so easy, I've discovered.

      • Re:Double Duh! (Score:5, Informative)

        by MarkRose ( 820682 ) on Friday January 02, 2009 @02:22PM (#26303129) Homepage

        Not quite. Backing up a live database can be a bit tricky. By the time you finish copying part of the database, the first bit can change again. So you have to create a snapshot of some kind. And that has to be supported in the database setup (at the application or server level) in order for the backup to be in a consistent state. And you don't want your backup process to degrade site performance, either. So a simple file copy is totally inadequate.

        A common solution is replication. Backup is then performed by creating a replication point on the slave database machine, then taking a snapshot and copying that while the master database machine continues serving at full speed. Replication can then catch up when the backup is complete. Another advantage to having replication is duplication on the machine level -- if the master fails, go live to the slave with minimal to no downtime. Set both machines up in a master-master configuration and you can swap back and forth as needed, allowing live maintenance and backup with no performance degradation.
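
        A minimal sketch of the snapshot idea, assuming SQLite as a stand-in for whatever database the site actually ran (the thread does not say). Python's sqlite3 `Connection.backup` copies a consistent image while the live connection stays usable; the `posts` table and the file names are made up for illustration.

```python
import os
import sqlite3
import tempfile


def snapshot(live, dest_path):
    """Copy a consistent point-in-time image of the live database to dest_path."""
    dest = sqlite3.connect(dest_path)
    try:
        live.backup(dest)  # online backup; the live connection is not shut down
    finally:
        dest.close()


def count_posts(conn):
    return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]


def demo():
    """Return (rows in live db, rows in snapshot) after a destructive delete."""
    with tempfile.TemporaryDirectory() as workdir:
        live = sqlite3.connect(os.path.join(workdir, "live.db"))
        live.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
        live.executemany(
            "INSERT INTO posts (body) VALUES (?)", [("first",), ("second",)]
        )
        live.commit()

        snap_path = os.path.join(workdir, "snapshot.db")
        snapshot(live, snap_path)

        # The kind of mistake a mirror would faithfully copy.
        live.execute("DELETE FROM posts")
        live.commit()

        snap = sqlite3.connect(snap_path)
        result = (count_posts(live), count_posts(snap))
        snap.close()
        live.close()
        return result
```

        Here `demo()` returns `(0, 2)`: the live table is empty after the delete, while the snapshot still holds both rows. A production setup would take the snapshot on the replica rather than the master, as described above.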

        • Re: (Score:3, Informative)

          Except if somebody issued a drop table... then the replicants get dropped too... nice faithful mirroring!!! Been there, done that.

          A good solution is to use mirroring like this, and then take the replicant offline to do real, full backups without taking down the production box. Then you have a live copy if drives or processors go bad to bring up immediately, and a backup tape to cover boo boos like this one. I believe that's what parent is getting at.

        • by DamnStupidElf ( 649844 ) <Fingolfin@linuxmail.org> on Friday January 02, 2009 @05:07PM (#26304999)

          ACID compliant databases use a log, much like a filesystem journal, that contains all the changes made to the database before those changes were actually written out to the main database storage. When you back up the raw database, you back up all the logs since at least the time you started backing up the raw files until the time the backup was finished, and when you need to restore the database you put the raw data back and then let the database replay the logs.
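
          The restore-then-replay step can be sketched as a toy model. This is not any real database's log format; the `("set" | "delete", key, value)` entries and the dict-based store are assumptions made purely for illustration.

```python
def apply_entry(state, entry):
    """Apply one logged change to the in-memory state."""
    op, key, value = entry
    if op == "set":
        state[key] = value
    elif op == "delete":
        state.pop(key, None)
    else:
        raise ValueError(f"unknown log operation: {op!r}")


def restore(base_backup, log_entries):
    """Put the raw backup back, then replay the logged changes in order."""
    state = dict(base_backup)  # copy so the backup itself is never modified
    for entry in log_entries:
        apply_entry(state, entry)
    return state


# Raw backup taken earlier, plus every change logged since then.
base_backup = {"post:1": "hello"}
log = [
    ("set", "post:2", "second post"),
    ("set", "post:1", "hello, edited"),
    ("delete", "post:1", None),  # the accident starts here
    ("delete", "post:2", None),
]

fully_replayed = restore(base_backup, log)
before_accident = restore(base_backup, log[:2])  # stop replay before the deletes
```

          Replaying the whole log reproduces the current (empty) state, while stopping the replay just before the bad entries gives a point-in-time restore. That second option is exactly what a mirror cannot offer.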

    • Re:DUH! (Score:5, Funny)

      by NickFitz ( 5849 ) <slashdot AT nickfitz DOT co DOT uk> on Friday January 02, 2009 @03:51PM (#26304083) Homepage

      What about archive.org?

      Ah, apparently not... [archive.org] :-D

  • by Anonymous Coward on Friday January 02, 2009 @12:27PM (#26301313)

    While this mirrors previous comments, it's not really a backup solution.

  • by wandazulu ( 265281 ) on Friday January 02, 2009 @12:29PM (#26301333)

    Mirroring, RAID, grid, whatever. At some point, you want your data safe and secure on something not physically attached to any power source.

    • by Anonymous Coward on Friday January 02, 2009 @12:37PM (#26301465)
      Incremental backups to tape every night, full backup at the weekend. Tapes must be stored off-site at a proper storage location. Got lots of data and a small backup window? Get a faster tape drive and a tape robot. It costs money, but your data costs more.

      This is at a minimum people. Come on!
      • Re: (Score:3, Informative)

        by z_gringo ( 452163 )
        nightly dumps of the database and rsync of the data directories to servers in different locations should be adequate. If you have lots of data, I don't see how tapes are really going to do the daily backup jobs.

        backing up nightly to a large mirrored NAS and a periodic copy to a removable device seems like a good way to go these days. I haven't used tapes for years.
      • by postbigbang ( 761081 ) on Friday January 02, 2009 @01:12PM (#26302051)

        Nope.

        Mirrors are fine, just snapshot them and store them offsite regularly. Do delta backups as needed but close-in for fast restoration.

        There is no rational justification for tape anymore, what with the cost per TB stored on hard disks now under $130, total $$. Random accessibility, unless you're stalling a subpoena, is just mandatory on backup media.

        • by Wdomburg ( 141264 ) on Friday January 02, 2009 @01:52PM (#26302705)

          Even accepting your price that's a cost of about 12.7 cents per gigabyte and you can get 800GB native LTO-4 tapes for about $50, which comes out to about 6.3 cents per gigabyte.
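
          The arithmetic behind those two figures, as a quick check. It assumes 1 TB is counted as 1024 GB, which is what reproduces the 12.7 cent number (with 1000 GB it would be 13 cents); the prices are the ones quoted in this thread.

```python
def cents_per_gb(price_dollars, capacity_gb):
    """Cost of one gigabyte in cents."""
    return price_dollars * 100 / capacity_gb


# $130 per terabyte of hard disk, counting 1 TB as 1024 GB (an assumption).
disk_cents = cents_per_gb(130, 1024)

# $50 for an 800 GB (native) LTO-4 tape.
tape_cents = cents_per_gb(50, 800)

print(f"disk: {disk_cents:.2f} cents/GB, tape: {tape_cents:.2f} cents/GB")
```

          That comes to roughly 12.70 and 6.25 cents per gigabyte, so on media cost alone tape is about half the price of disk before any of the extra hardware is counted.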

          But quoting costs for desktop grade SATA drives severely understates the true cost. For any non-trivial site installation you're talking near-line rated drives, drive caddies, storage shelves and additional SAN fabric. Then price out the additional power, cooling and rack space. Then price offsite shipping and storage for the bulkier, heavier and more delicate disk option.

          Mirroring has its place. Snapshotting has its place. And backups to stable media still have their place too.

    • by uncledrax ( 112438 ) on Friday January 02, 2009 @12:41PM (#26301521) Homepage

      It's more an issue that some people think that HA == DR.. which obviously this story reminds us that it is not the same thing.

      Mirroring / RAID == HA.. if one of your HDDs lets the smoke out, you still don't incur downtime. If you have a hot-spare, you're even better.. all it does is let you have a little time to correct the issue (ie: "It can wait until morning").

      Also, one other very important thing.. mirroring doesn't prevent/restore data corruption. If you're mirroring your rm -rf (as pointed out by Corsec67 below), your RAID will happily do what it does.. and span your command to all your disks.... Congrats, you just successfully gave yourself HA to your disk erasing! :]

      Backups are DR.. If your RAID croaks.. you're SOL if you don't have off-machine backups. If you accidentally nuke your disks with an rm or something, you can still go back and restore data.. sure you'll likely lose -some- data, but -some- is better than all in this case.
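
      The HA-versus-DR difference can be shown with a toy model. This is only a sketch: the `MirroredStore` class and its two dict "disks" are invented for illustration and do not model any real RAID implementation.

```python
import copy


class MirroredStore:
    """Toy RAID-1: every write and every delete lands on all healthy disks."""

    def __init__(self):
        self.disks = [{}, {}]

    def _healthy(self):
        return [disk for disk in self.disks if disk is not None]

    def write(self, key, value):
        for disk in self._healthy():
            disk[key] = value

    def wipe(self):
        # The mirror faithfully repeats the destructive command on every disk.
        for disk in self._healthy():
            disk.clear()

    def fail_disk(self, index):
        self.disks[index] = None

    def has(self, key):
        return key in self._healthy()[0]

    def read(self, key):
        return self._healthy()[0][key]

    def snapshot(self):
        # Point-in-time copy, meant to be stored off the machine.
        return copy.deepcopy(self._healthy()[0])


def demo():
    store = MirroredStore()
    store.write("post:1", "hello")
    backup = store.snapshot()

    store.fail_disk(0)  # hardware failure: the mirror covers this
    survived_disk_failure = store.read("post:1") == "hello"

    store.wipe()  # the rm -rf case: the mirror does not cover this
    survived_wipe = store.has("post:1")

    return survived_disk_failure, survived_wipe, backup.get("post:1")
```

      `demo()` returns `(True, False, "hello")`: the data survives the dead disk, does not survive the wipe, and is still sitting in the separate snapshot.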

  • by yttrstein ( 891553 ) on Friday January 02, 2009 @12:29PM (#26301337) Homepage
    And that's why your IT department actually needs funding. Sleep tight.
  • rm -rf / (Score:5, Informative)

    by corsec67 ( 627446 ) on Friday January 02, 2009 @12:29PM (#26301341) Homepage Journal

    rm -rf /

    That is one reason why mirroring isn't a backup, and why backups should ideally be off-line.

    • Re:rm -rf / (Score:5, Funny)

      by Piranhaa ( 672441 ) on Friday January 02, 2009 @01:06PM (#26301941)

      C:\>rm -rf /
      'rm' is not recognized as an internal or external command,
      operable program or batch file.

      Everything's still running here...

      • Re: (Score:3, Funny)

        by dfdashh ( 1060546 )
        Judging by your OS, not for long
  • Ouch (Score:4, Informative)

    by scubamage ( 727538 ) on Friday January 02, 2009 @12:30PM (#26301357)
    We do data hosting, and I can't imagine how catastrophic that would be. Jebus. Let this be an ultimate example of why numerous backups are needed. Always. Without question.
    • Re:Ouch (Score:5, Insightful)

      by conureman ( 748753 ) on Friday January 02, 2009 @12:45PM (#26301577)

      Or even one, stale, backup.

    • Re:Ouch (Score:4, Insightful)

      by jabithew ( 1340853 ) on Friday January 02, 2009 @01:02PM (#26301879)

      This story put the fear of god into me. The first thing I did since reading it is to back up the website I admin (for my dad) locally. I'd always assumed our host would have good backup, but that seems naïve now.

      • Re: (Score:3, Interesting)

        by slugstone ( 307678 )

        Having worked at several hosting places, I would say you are correct. Never trust a hosting service backup. I always told our customers to never trust our backup. Sometimes backups just never happened. They are not high on the list of things to keep working.

    • by blowdart ( 31458 ) on Friday January 02, 2009 @01:11PM (#26302031) Homepage
      You don't just need backups. You need to TEST them. Having a backup run every night is nice and all; but if the tapes are unreadable and no error was reported, or if you're doing it wrong and the backup is corrupted and you only find out when you come to restore ....
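
      One rough way to automate that check is to restore into a scratch directory and compare checksums against the digest recorded when the backup was made. In this sketch the "restore" is just a file copy standing in for the real restore step, and `dump.sql` and the function names are made up for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large dumps do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(backup_file, expected_digest):
    """Restore the backup into a scratch directory and compare checksums."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored"
        shutil.copyfile(backup_file, restored)  # stand-in for the real restore
        return sha256_of(restored) == expected_digest


def self_check():
    """Return (good backup verifies, corrupted backup verifies)."""
    with tempfile.TemporaryDirectory() as workdir:
        dump = Path(workdir) / "dump.sql"
        dump.write_bytes(b"INSERT INTO posts VALUES (1, 'hello');\n")
        digest = sha256_of(dump)  # recorded at backup time

        good = restore_matches(dump, digest)

        dump.write_bytes(b"unreadable tape")  # simulate silent corruption
        bad = restore_matches(dump, digest)
        return good, bad
```

      `self_check()` returns `(True, False)`. A checksum only proves the bytes came back intact; loading the restored dump into a spare database is still the real test.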
      • by mortonda ( 5175 ) on Friday January 02, 2009 @02:17PM (#26303065)

        Backups must be:

        1) Automated - if you need human intervention, it will fail

        2) Point-in-time - the system must be able to provide restores for a set of times, as fitting for the turn around on your data. A good default is: daily backups for a week, weekly for a month, and monthly for a year

        3) TESTED: You must fully test the restoration process (if this can be automated, even better). Backups that you can't restore from a bare machine are worthless.

        For better disaster recovery, backups should be:

        4) offsite - if a fire or tornado hits, is the backup somewhere else?

        5) easily accessible - how long will it take to get the restore going?
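
        The default schedule in point 2 (daily for a week, weekly for a month, monthly for a year) can be sketched as a pruning rule. The exact cutoffs here (7, 31 and 365 days, ISO weeks, newest copy per bucket) are assumptions chosen for illustration, not something the list above specifies.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today):
    """Return the set of backup dates that the retention schedule keeps."""
    keep = set()
    newest_in_week = {}
    newest_in_month = {}
    for day in sorted(backup_dates):
        age = (today - day).days
        if age < 0:
            continue  # ignore anything dated in the future
        if age < 7:
            keep.add(day)  # every daily backup from the last week
        if age < 31:
            # Sorted input means later dates overwrite earlier ones.
            newest_in_week[tuple(day.isocalendar()[:2])] = day
        if age < 365:
            newest_in_month[(day.year, day.month)] = day
    keep.update(newest_in_week.values())
    keep.update(newest_in_month.values())
    return keep


# 400 nightly backups ending on the day of this thread.
today = date(2009, 1, 2)
nightly = [today - timedelta(days=offset) for offset in range(400)]
kept = backups_to_keep(nightly, today)
```

        With these cutoffs, more than a year of nightly backups collapses to around twenty kept copies, and everything else can be recycled.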

  • Excellent! (Score:5, Funny)

    by GravityStar ( 1209738 ) on Friday January 02, 2009 @12:30PM (#26301361)
    Excellent! We can use their demise as yet another cautionary tale.
  • by zaibazu ( 976612 ) on Friday January 02, 2009 @12:31PM (#26301367)
    It is inexpensive protection against a total hard disk failure, and it is effective at that. But it can't help when software goes rogue or a user deletes the wrong files.
  • by MBCook ( 132727 ) <foobarsoft@foobarsoft.com> on Friday January 02, 2009 @12:31PM (#26301369) Homepage

    It's really unfortunate that this happened. If they had simply had a backup snapshot of the DB they could have restored it. RAID only saves you from disk failures. It doesn't work on OS/user failures.

    Unfortunately this is the kind of thing you tend to learn from experience (either yours or someone else's). It's very easy to think "RAID 1 = disks are safe".

    Just like a database cluster wouldn't have saved them. A clustering database can save you from load, or you can swap servers if a disk goes bad. But when someone issues "DELETE * FROM..." the other cluster nodes start to happily run the same thing and now you have 2 (or 3 or 10 or...) empty database boxes.

    I hope those bloggers had a backup of some sort of their own.

    • by mzito ( 5482 ) on Friday January 02, 2009 @01:41PM (#26302523) Homepage

      Ah, it totally depends on the type of database cluster. For example, with Oracle, if you're using Oracle DataGuard, even in synchronous replication mode you can define an "apply delay" - basically, "Don't acknowledge this commit until it is written locally, and copied and acknowledged on the remote side, but don't actually apply the transaction for two hours"

      That way, if someone does a delete * from blogs;, it will be reflected immediately on the production, but you've got a nice window to sort it out.
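
      The apply-delay idea can be modelled in a few lines. To be clear, this is a toy sketch and not Oracle Data Guard's interface: the `DelayedReplica` class, its method names and the explicit `now` timestamps (in seconds) are all invented for illustration.

```python
from collections import deque


class DelayedReplica:
    """Receives changes immediately but applies them only after a delay."""

    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self.pending = deque()  # (commit_time, op, key, value), oldest first
        self.data = {}

    def receive(self, commit_time, op, key, value=None):
        # The change is safely stored on the replica, but not yet applied.
        self.pending.append((commit_time, op, key, value))

    def apply_due(self, now):
        # Apply only the changes that are at least `delay` seconds old.
        while self.pending and now - self.pending[0][0] >= self.delay:
            _, op, key, value = self.pending.popleft()
            if op == "set":
                self.data[key] = value
            elif op == "delete":
                self.data.pop(key, None)

    def discard_pending_from(self, bad_time):
        # Throw away everything committed at or after the mistake.
        self.pending = deque(e for e in self.pending if e[0] < bad_time)


def demo():
    replica = DelayedReplica(delay_seconds=2 * 60 * 60)
    replica.receive(0, "set", "post:1", "hello")
    replica.apply_due(now=7200)
    applied_after_delay = dict(replica.data)

    replica.receive(8000, "delete", "post:1")  # the accidental delete
    replica.apply_due(now=9000)  # still inside the two-hour window
    held_back = dict(replica.data)

    replica.discard_pending_from(8000)  # someone notices in time
    replica.apply_due(now=20000)
    return applied_after_delay, held_back, dict(replica.data)
```

      All three values returned by `demo()` are `{"post:1": "hello"}`: the delete reached the replica but was never applied, because it was discarded inside the delay window.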

      Plus, if you've got database flashback turned on, you can simply say, "Flash my database back to what it looked like before someone was an idiot", and all your data comes back.

      These features are expensive in Oracle, but they can be very useful when you actually need them.

  • El Oh El (Score:4, Insightful)

    by greymond ( 539980 ) on Friday January 02, 2009 @12:33PM (#26301393) Homepage Journal

    That's all I can say at this. I'm really surprised that with all the users they had, they are so quick to say "everything is gone and we're giving up" instead of just starting over and maybe implementing a protocol that would make sure this doesn't happen again.

    • Re:El Oh El (Score:5, Insightful)

      by kurtmckee ( 870398 ) on Friday January 02, 2009 @12:45PM (#26301575) Homepage

      I'm really surprised that with all the users they had, they are so quick to say "everything is gone and we're giving up"

      Considering how complete and unrecoverable the loss is, they have no idea who their users are. The accounts would have to be recreated from scratch, but who would try? Their users have no reason to ever trust them again. Journalspace would have a difficult time wooing back their original users, and no new user would seriously consider using them.

      Bowing out is the only recourse, but I'm glad they're considering releasing their source code.

    • Re:El Oh El (Score:4, Funny)

      by spuke4000 ( 587845 ) on Friday January 02, 2009 @01:06PM (#26301933)
      Indeed. Everyone knows that when you drive your company into the ground through incompetence you don't give up! You go to Washington to get your bailout. That's the American way.
  • Thank you (Score:3, Funny)

    by ari_j ( 90255 ) on Friday January 02, 2009 @12:36PM (#26301445)
    This is fascinating and altogether newsworthy. I had never before thought of this. I am very pleased, indeed, that kdawson engaged his most finely-honed editorial faculties to post this article to the front page, as it is not only stunning and fascinating in substance but also rather eloquently written.
  • by LSD-OBS ( 183415 ) on Friday January 02, 2009 @12:38PM (#26301477)

    I do not think it means what you think it means.

  • by computersareevil ( 244846 ) on Friday January 02, 2009 @12:39PM (#26301497)

    Mirroring: High availability
    Backups: High reliability

  • The rules of backups (Score:5, Informative)

    by Anonymous Coward on Friday January 02, 2009 @12:40PM (#26301511)

    The rules of backups:

    1. Backup all your data
    2. Backup frequently
    3. Take some backups off-site
    4. Keep some old backups
    5. Test your backups
    6. Secure your backups
    7. Perform integrity checking

    • Re: (Score:3, Insightful)

      by Anonymous Coward

      1. Backup all your data
      2. Test your backups
      3. Backup frequently
      4. Test your backups
      5. Take some backups off-site
      6. Test your backups
      7. Keep some old backups
      8. Test your backups
      9. Secure your backups
      10. Test your backups
      11. Perform integrity checking
      12. Test your backups

      Every company I've worked at has had a backup plan. Exactly zero have had a recovery plan.

  • Only 2 drives? (Score:4, Insightful)

    by lalena ( 1221394 ) on Friday January 02, 2009 @12:42PM (#26301527) Homepage
    Maybe I could understand that there might be issues with backing up live databases, and they didn't want to deal with it. Still not an excuse.
    BUT, according to the site "the server which held the journalspace data had two large drives in a RAID configuration". Only TWO drives.
    All they had to do was pull one of the drives, replace it, and lock up the original off site. In a couple of hours the drives would have been mirrored again.
  • by squeegee_boy ( 319210 ) on Friday January 02, 2009 @12:42PM (#26301533)
    Important note: don't hire the IT dude with Journalspace.com on his resume.
  • by gzipped_tar ( 1151931 ) on Friday January 02, 2009 @12:46PM (#26301595) Journal

    No doubt this incident is the admin's fault. He confused mirroring with backup and carried on with the mistake until it was too late, as pointed out in other comments.

    Now what about a user's angle? The moral is that you can never assume your data is safer when it's "in the cloud". If you value your blog and your readers, you *should* save a copy of your work as well as the readers' info, *locally*, somewhere you have control over.

    There's no place like $HOME.

  • by computersareevil ( 244846 ) on Friday January 02, 2009 @12:52PM (#26301709)

    They also purposely blocked archive.org via a robots.txt exclusion, so the bloggers can't use that to try and recover some of their blogs.

  • by hwyhobo ( 1420503 ) on Friday January 02, 2009 @12:52PM (#26301715)

    In today's world, where primary storage and protection storage are well-defined and an entire industry has grown up around them (examples: NetApp, Data Domain), one is hard-pressed to understand the reason for such a debacle. Reading the note referred to in the article [journalspace.com] leads me to believe, unfortunately, that Journalspace's IT department did not understand the difference.

    It is sometimes considered bad form to say something bad about fellow techies. We prefer to look for 'outside' causes. Still, to learn and avoid the same problems in the future, one has to admit one's mistakes first. This paragraph from Journalspace's page:

    The value of such a setup is that if one drive fails, the server keeps running, using the remaining drive. Since the remaining drive has a copy of the data on the other drive, the data is intact. The administrator simply replaces the drive that's gone bad, and the server is back to operating with two redundant drives.

    makes me believe there is denial going on.

    • Re: (Score:3, Insightful)

      by Lost Race ( 681080 )

      It is sometimes considered bad form to say something bad about fellow techies.

      Yeah, right. If there's anything professionals love to do, it's talk trash about their peers. What's the first thing a computer guy says when you bring him in to fix a broken system? "My god, what idiot spec'd/built/installed/configured this piece of garbage? It's a miracle it ever worked at all!" Ditto every other kind of professional, from plumber to surgeon to architect to accountant.

      (As such a professional, I often discover

  • by spitek ( 942062 ) on Friday January 02, 2009 @12:53PM (#26301729) Homepage
    You pay your infrastructure people to maintain business continuity. I mean, the title of this post made me go, "Really, no shit." That's like systems admin 101! If the admin was aware, then the manager who didn't listen needs to be fired. If the manager listened and they are just run by incompetents, then they got what they deserve. You'd think 17,000 visitors a month would be worth enough to do it right, in ad revenue alone. The cost of a consumer machine running Linux with a few TB of SATA space: $1,200. How much the company paid to have a systems admin play video games all day: $50,000. The cost of a 17,000-visitor-a-month site going down because they had no database backups: priceless.
  • Mirroring (Score:5, Insightful)

    by jav1231 ( 539129 ) on Friday January 02, 2009 @01:04PM (#26301917)
    See mirroring is like...well a mirror. If you stand before one and stick a fork in your eye your mirror-image does the same. In real time. Analogies are there for a reason.
    • by gEvil (beta) ( 945888 ) on Friday January 02, 2009 @03:30PM (#26303871)
      See mirroring is like...well a mirror. If you stand before one and stick a fork in your eye your mirror-image does the same. In real time. Analogies are there for a reason.

      There's a major flaw in your analogy. See, if I stick a fork in my right eye, the mirror image will stick a fork in his left eye. Between the two of us, however, we still have one good left AND right eye. So ipso fatso, I have a complete backup.
  • by RevWaldo ( 1186281 ) on Friday January 02, 2009 @01:29PM (#26302333)
    This is why users should be able to easily back up their own data for any online service. If a service entrusted with your data provides no straightforward way to drop a copy of it onto your own hard drive, don't trust it. I'd go as far as to say that any service that doesn't strongly recommend you keep your own backups shouldn't be trusted.

    Do the big kahunas of the "Web 2.0" world give users that option? Gmail, Myspace, Facebook, Twitter etcetera ad nauseam?
  • OS X Server (Score:3, Interesting)

    by DTemp ( 1086779 ) on Friday January 02, 2009 @02:56PM (#26303469)

    The site was run on OS X Server... I think this may be indicative of the level of IT effort with the company. Look, *I* run an OS X Server... but *I* am a Biology major that knows approximately dick about the UNIX command line, and use it to run a server that I probably wouldn't be able to run any other way. I also have it backup nightly to a cheap NAS, archiving old backups, and I've tested a restore to make sure it works.

    This is probably just a couple guys who ran a website in their spare time... not a huge IT effort that failed.

  • by ZiggyM ( 238243 ) * on Friday January 02, 2009 @04:45PM (#26304713)
    Are there Darwin awards for websites?