Tech Magazine Loses June Issue, No Backup
Gareth writes "Business 2.0, a magazine published by Time, has been warning their readers against the hazards of not taking backups of computer files. So much so that in an article published by them in 2003, they 'likened backups to flossing — everyone knows it's important, but few devote enough thought or energy to it.' Last week, Business 2.0 got caught forgetting to floss as the magazine's editorial system crashed, wiping out all the work that had been done for its June issue. The backup server failed to back up."
Re:How does this actually happen? (Score:3, Informative)
Backups and fault-tolerant hardware cost money. You can talk about potential losses and risks until you're blue in the face, but until it *actually* costs the company money, nobody will listen. What's going to happen here more than likely is the person who asked for the RAID will get fired, as they're probably the same person in charge of the backups. This will also provide a scapegoat for that person's manager, since obviously if they got fired for it there need be no further repercussions or changes in behavior.
The only way they deserve to get fired is if they didn't advocate as hard as possible for enough backup hardware/software to allow for verification of backed up data and recovery in case of a mechanical hard drive failure. If they did, and were denied, then they did everything they could. (Which doesn't mean they won't get fired, it's just less deserved at that point. However, the thought there is that if they didn't want to get fired for incompetence, they should have tried to become a manager...)
Re:err... (Score:5, Informative)
Honestly though, talking management into backup solutions is like pulling teeth, then they blame you for not having it in place when the failure does happen.
Last place I worked at we were using 4-year-old DLT tapes because management was too stupid and cheap to buy new ones.
"We will buy new when those fail" is what we were told.
Re:err... (Score:4, Informative)
/grabs hammer...
*bang* *bang* *bang*
Oops, it looks like a couple of those DLT drives are running into problems. We need replacements. Did you see what happened to Business 2.0?
Better article (Score:3, Informative)
Re:How does this actually happen? (Score:5, Informative)
Doesn't it? [ufl.edu]
Link to original article (Score:3, Informative)
RAID != BACKUP (Score:3, Informative)
Re:Wrong problem (Score:4, Informative)
Nice story, though. Reminds me of the sysadmin in my first company who automatically backed up our server every day. Only problem was: the process put a copy of the backup on a drive that was being backed up. You can imagine what happened after a few weeks (it failed, disk full). He only noticed a few months later when we asked him to restore some files.
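The failure described above (a backup written into the tree it is backing up, so each run swallows the previous one) can be caught with a pre-flight check. This is only a minimal sketch: the function names `destination_inside_source` and `check_backup_paths` and the example paths are hypothetical, not taken from any particular backup tool.

```python
from pathlib import Path


def destination_inside_source(source: str, destination: str) -> bool:
    """Return True if the backup destination lies within the tree being backed up."""
    src = Path(source).resolve()
    dst = Path(destination).resolve()
    # The destination is a problem if it is the source itself or any descendant of it.
    return dst == src or src in dst.parents


def check_backup_paths(source: str, destination: str) -> None:
    """Refuse to run a backup that would include its own output (hypothetical guard)."""
    if destination_inside_source(source, destination):
        raise ValueError(
            f"backup destination {destination!r} is inside source {source!r}; "
            "each run would back up the previous backups until the disk fills"
        )
```

A job would call `check_backup_paths("/srv/data", "/mnt/backup")` before copying anything. A check like this only covers the path layout; it does not replace monitoring for failed runs or full disks, which is what went unnoticed for months in the story.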
I wonder if they run DR on a regular basis. (Score:3, Informative)
For one of our server apps we actually have two laptops configured with all of the required software, and we restore production data from backups on a regular basis because we use it for system testing on projects. This happens several times a year, so we know that the backup and restore procedures truly work. It is also very cool walking in to the client site, plugging in the laptop, and showing them that in an emergency they have a working machine very quickly. Not as fast as a server, but it gets them a working machine until the replacement server arrives.
Re:Why isn't this a default (Score:4, Informative)
Wait for OS X 10.5 and "Time Machine".
Re:We've all been there. Don't be too pious, here. (Score:4, Informative)
The topic here is backups, not RAID.
Say it again with me everyone "RAID IS NOT A BACKUP"
RAID increases uptime by decreasing/eliminating the downtimes needed to do restores when an individual drive bites it. It is *NOT* a backup.
RAID does not save you if someone accidentally deletes a needed file.
RAID does not save you if your machine gets nailed by a virus/unpatched exploit.
RAID does not save you if the drive power supply fries taking out attached hardware.
RAID does not save you if a burglar steals your machine.
RAID IS NOT A BACKUP.
Re:After the swearing stopped. (Score:5, Informative)
The problem is that tech magazines are in the advertising business, not the tech business. I write content for the Web site of a tech radio show, and it's just a bunch of us in cubicles looking stuff up on Google. No tech people involved.
Word Police (Score:3, Informative)
Re:Wrong problem (Score:1, Informative)
Same reason that some of us prefer software RAID - giving us even more flexibility in what we use to rebuild after a disaster. I could use an HP controller, a 3Ware controller, or some other controller.
Backup stories (Score:3, Informative)
Story #1. Fortune 500 company. Lost some source. Big brouhaha. Edict went out: all files are to be backed up to diskettes and the diskettes sent to offsite storage which the management had contracted for with an outside firm. It took a lot of extra time, but people did it. After about two years, an important server with source code for a major product crashed. Developers tried to get the source back from offsite storage. It turns out that nobody at any point had taken any responsibility for cataloging, identifying, or indexing the diskettes. The diskettes might as well have not been labelled: the developers couldn't identify what diskettes were needed, and the offsite storage firm couldn't have retrieved them if they had.
Story #2. Medium-size scientific research organization with a Digital 11/70 running RSTS. Enlightened manager pays operator overtime pay to stay late three nights a week and do backups. Backups are performed with the "verify" option enabled. Tapes are placed in a fire-resistant tape vault every night. But no actual restores are performed. Database (Oracle, in the days when Oracle Corporation's name was still Relational Software, Inc.) is corrupted. A restore is attempted. It transpires that this version of Oracle uses the maximum record length for its files, which happens to be 65,536 bytes, and the Digital-supplied backup-restore utility... you guessed it... has a bug with records of that length. Yep. Writes 0 bytes, verifies 0 bytes.
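The lesson of Story #2 is that a "verify" pass which only compares the backup with itself proves nothing; the check has to go through an actual restore and compare against the original data. A minimal sketch of that idea follows, assuming the backup is a tar archive whose member paths are relative to the source directory. The function names `sha256_of` and `verify_by_restore` are made up for this illustration and are not any vendor's utility.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_by_restore(source_dir: str, archive_path: str) -> List[str]:
    """Restore the archive into a scratch directory and compare it with the source.

    Returns a list of problems; an empty list means every source file was
    restored and its contents matched. Assumes archive member paths are
    relative to source_dir.
    """
    source = Path(source_dir)
    problems: List[str] = []
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive_path) as archive:
            archive.extractall(scratch)
        originals = [p for p in source.rglob("*") if p.is_file()]
        if not originals:
            # An empty comparison must not be reported as success.
            problems.append("source contains no files; nothing was verified")
        for original in originals:
            restored = Path(scratch) / original.relative_to(source)
            if not restored.is_file():
                problems.append(f"missing after restore: {original}")
            elif sha256_of(restored) != sha256_of(original):
                problems.append(f"content differs: {original}")
    return problems
```

Because the comparison is driven by the source files rather than by what the archive claims to contain, a backup that wrote nothing shows up as a list of missing files instead of a clean "0 bytes written, 0 bytes verified".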
Story #3. I worked at a place that recommended that individual developers perform individual backups using a cartridge tape system and some standard PC software. I set it up. There were two "verify" options. One used the cartridge system's read-after-write feature to read every block as it was written. The second performed the entire backup, then verified the entire backup in a second pass. Took twice as long, of course. I opted for the second method. The problem was: more than half the time, the verify would report one or two errors. And for some reason, probably efficiency of use of the tape, it didn't write file by file, it munged them into blocks. And it didn't even report the names of the files affected. Just "2 errors were encountered" or something like that. So, when that happened, I didn't see that a rational person had any alternative except to perform the whole backup again. And more than half the time, it would report a couple of errors the second time, and...
When I asked colleagues about this, it turned out that I was the only one ever to have picked the second verify option. Everyone else had picked the read-after-write-verify option, "because it was faster."
And told me not to fuss because "if it was only a couple of errors, the chances they were on files you needed to recover were too small to worry about."
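The colleagues' argument can be put into numbers. The sketch below assumes, purely for illustration, that each error damages one distinct file chosen uniformly at random; the story gives no file counts, so the figures in the usage note are hypothetical.

```python
from math import comb


def chance_needed_file_damaged(total_files: int, damaged_files: int, needed_files: int) -> float:
    """Probability that at least one of the files you need is among the damaged ones.

    Assumes the damaged files are a uniformly random subset of the backup.
    """
    # Number of ways to place the damage entirely outside the needed files,
    # divided by the number of ways to place it anywhere.
    safe_ways = comb(total_files - needed_files, damaged_files)
    return 1 - safe_ways / comb(total_files, damaged_files)
```

With 10,000 files and 2 errors, `chance_needed_file_damaged(10000, 2, 1)` is tiny if you only ever need a single file back. The same call with `needed_files=10000`, which is the case after a dead disk, returns 1.0: a full restore is certain to include the damaged files, and since the software never reported their names you would not know which ones they were.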
Re:Why isn't this a default (Score:2, Informative)