Slashdot
Tech Magazine Loses June Issue, No Backup
Posted by
CmdrTaco
on Wed May 02, 2007 07:43 AM
from the happens-to-everyone dept.
Gareth writes "Business 2.0, a magazine published by Time, has been warning their readers against the hazards of not taking backups of computer files. So much so that in an article published by them in 2003, they 'likened backups to flossing — everyone knows it's important, but few devote enough thought or energy to it.' Last week, Business 2.0 got caught forgetting to floss as the magazine's editorial system crashed, wiping out all the work that had been done for its June issue. The backup server failed to back up."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
After the swearing stopped. (Score:5, Funny)
Then the swearing started again.
Re:After the swearing stopped. (Score:5, Insightful)
Honestly, they CAN'T have competent IT. The FIRST thing you do in the morning is check the backups.
I have an HP sdat jukebox here and I STILL check the backup logs to make sure the backup and verify succeeded last night. If they didn't, I mirror the important files right away and then run a manual backup so I don't lose the last 24 hours of backup.
I hope that Business 2.0 learned that paying top $$$ for competent IT is a good idea, and they should run an article about it.
Re:After the swearing stopped. (Score:5, Insightful)
Not a week goes by without me getting an issue from one of our regular analysts with a question about how the customer can salvage their data because they don't have a backup. My standard answer is that we may be able to save some data, but it's going to cost a lot of $$$. And I also say: "When you don't have a backup, you have either deemed that you can easily recreate the data or that it is not important to the company."
And these are not mom&pop companies but big multi-million/billion-dollar companies.
Re:After the swearing stopped. (Score:4, Insightful)
HP DAT? You'd better do more than check the logs. A test restore (if your users don't already test for you by deleting files) at least a few times a week might save your butt one day. Actually DAT or not, test restores are a must. Logs lie.
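For anyone wondering what a minimal test restore check might look like: restore last night's backup to a scratch directory with whatever tool you use, then compare it against the live files by checksum. The sketch below is only an illustration; `compare_trees` and `sha256_of` are made-up names, and it assumes the source files have not changed since the backup ran.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source: Path, restored: Path) -> list[str]:
    """List files under `source` that are missing or different under `restored`."""
    problems = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        restored_file = restored / relative
        if not restored_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(src_file) != sha256_of(restored_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every source file came back byte for byte. Unlike the backup log, this tells you which files are bad.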
Re: (Score:3, Funny)
Re: (Score:3, Insightful)
First, no one really understands best practices for backup, and a lot of systems that are backed up "successfully" can't be restored anyway (in fact, most commonly this is Microsoft Exchange, the most important system in most companies!). Second, tape sucks! You MUST have disk-to-disk backups to have any true recoverability in today's world. Third, check your logs EVERY day, there's no excuse! Fixing a
Re:After the swearing stopped. (Score:5, Informative)
The problem is that tech magazines are in the advertising business, not the tech business. I write content for the Web site of a tech radio show, and it's just a bunch of us in cubicles looking stuff up on Google. No tech people involved.
Paging Jerry Seinfeld (Score:5, Funny)
IT: We have your backup, we just can't restore it.
Jerry: But the backup keeps the data here, that's why you have the backup!
IT: I think I know why we have backups.
Jerry: I don't think you do. You see, you know how to MAKE the backup, you just don't know how to RESTORE the backup. And that's really the most important part of the backup: the restoring. Anybody can just make them.
With this much free advertising (Score:5, Funny)
We've all been there. Don't be too pious, here. (Score:3, Insightful)
Re:We've all been there. Don't be too pious, here. (Score:4, Informative)
The topic here is backups, not RAID.
Say it again with me, everyone: "RAID IS NOT A BACKUP"
RAID increases uptime by decreasing/eliminating the downtimes needed to do restores when an individual drive bites it. It is *NOT* a backup.
RAID does not save you if someone accidentally deletes a needed file.
RAID does not save you if your machine gets nailed by a virus/unpatched exploit.
RAID does not save you if the drive power supply fries, taking out attached hardware.
RAID does not save you if a burglar steals your machine.
RAID IS NOT A BACKUP.
Re:We've all been there. Don't be too pious, here. (Score:5, Funny)
Nelson Muntz (Score:5, Funny)
Ha-ha!
err... (Score:3, Insightful)
*coughs*
TFA:
Sorry, their MAIN problem is not in any way a dysfunctional backup system. Ever heard of verifying backed-up data?
Re:err... (Score:5, Informative)
Honestly though, talking management into backup solutions is like pulling teeth, then they blame you for not having it in place when the failure does happen.
Last place I worked at, we were using 4-year-old DLT tapes because management was too stupid and cheap to buy new ones.
"we will buy new when those fail" is what we were told.
Re:err... (Score:4, Informative)
/grabs hammer...
*bang* *bang* *bang*
Oops, it looks like a couple of those DLT drives are running into problems. We need replacements. Did you see what happened to Business 2.0?
Re:err... (Score:4, Insightful)
"we will buy new when those fail" is what we were told
"Your successor will buy new when these fail." is the correct response to this.
Re:err... (Score:5, Funny)
Errr...uhh....umm...'verifying'? Uh, I'll be right back!
Re:err... (Score:5, Funny)
I'm sure they've heard of it, in a conversation that went something like this:
IT Guy: We need a system for verifying our backups.
Suit: How come? Don't the backups work?
IT Guy: We need to be sure that if there is a failure, the backups will be ok.
Suit: But they're just copies, aren't they? I copy files all the time and it never goes wrong.
IT Guy: This is a little more complicated than that.
Suit: How hard can it be?
IT Guy: Well, I was thinking we might need to hire a part-timer just to take care of backups and verification.
Suit: But we've never had a failure! Sounds like empire building to me. I know that's what I'd be doing in your position. Nice try. We'll keep the backup system the way it is, thanks.
IT Guy: But..!
Suit: Moving on to the next item on the agenda... ok, Executive Bonuses!
They probably still have most of it (Score:5, Insightful)
Re:They probably still have most of it (Score:4, Funny)
I think we can all relax and rest assured that the June issue of Business 2.0 will have all its intended advertising.
HAHAHA (Score:2, Funny)
Rag (Score:2, Funny)
That's OK, nobody reads Business 2.0 anyway.
Re:Rag (Score:5, Insightful)
I wish. I wish people didn't read Time, either (the publisher), but they do. Time's writing style is the dumbed down, try-to-be-hip crap I wouldn't have gotten away with in sixth grade. Seriously. Like I said before [slashdot.org], to understand why its writing is like fingernails on a blackboard for me, consider how the same information would be conveyed by two sources:
8-year-old: "6 divided by 3 is 2."
Time magazine: "Okay, imagine you've got a half-dozen widgets, churned out of the ol' Widget Factory on Fifth and Main. Now, say you've gotta divvy 'em up into little chunklets -- a doable three, let's say -- and each chunklet has the same number that math professor Gregory Beckens at Overinflated Ego University calls a 'quotient'. The so-called 'quotient' in this case? Dos."
Based on how that post got modded, I'm not alone in this.
What was the nature of the crash? (Score:2, Insightful)
But whatever the case - there is a useful lesson here. Make sure your backups are backing something up.
Re: (Score:3, Insightful)
Usually that is the case, but it has happened to me: one of my backups failed one night and someone needed a file restored from the previous day. If that company never checked its backups and never configured some kind of notification upon failure or success, then they are very lame.
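A notification check does not have to be fancy. As a rough sketch (the function names, the 24-hour window, and the one-byte minimum are all assumptions for illustration), something like this could run every morning against the newest backup file and feed whatever mail or paging system you already have:

```python
import time
from pathlib import Path


def check_backup(archive: Path, max_age_hours: float = 24.0, min_bytes: int = 1) -> list[str]:
    """Return a list of problems with the latest backup file (empty means OK)."""
    if not archive.is_file():
        return [f"{archive.name}: backup file does not exist"]
    problems = []
    info = archive.stat()
    # A backup that "succeeded" but wrote nothing is still a failure.
    if info.st_size < min_bytes:
        problems.append(f"{archive.name}: only {info.st_size} bytes")
    # A stale file means last night's job never ran or never finished.
    age_hours = (time.time() - info.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"{archive.name}: last written {age_hours:.0f} hours ago")
    return problems


def report(archive: Path) -> str:
    """Build a one-line status message for both outcomes, success and failure."""
    problems = check_backup(archive)
    if problems:
        return "BACKUP FAILED: " + "; ".join(problems)
    return f"BACKUP OK: {archive.name}"
```

Reporting success as well as failure matters: if the daily "OK" message stops arriving, you know the check itself has died.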
High profile SNAFUs (Score:5, Insightful)
I don't know about you people, but after reading this (and giving it the "haha" tag) I'm going home and catching up on a couple of backups I've been slacking off on for a while.
Re:High profile SNAFUs (Score:5, Insightful)
Re:Didn't you read the article? (Score:4, Funny)
How does this actually happen? (Score:5, Interesting)
There aren't a lot of ways for a machine to "crash" that loses all its data. Even a lightning-fried hard drive can have its platters removed by a data recovery lab and many files can be pulled off. A mechanical failure doesn't grind the platters into sand. As a network server it really should have a RAID too. So how exactly can "the server crash" so spectacularly that the RAID, backups, and widely available data recovery services all fail? Did the building blow up?
Re: (Score:3, Informative)
Backups and fault-tolerant hardware cost money. You can talk about potential losses and risks until you're blue in the face, but until it *actually* costs the company money, nobody will listen. What's going to happen here more than likely is the person who asked for the RAID will ge
Re:How does this actually happen? (Score:4, Insightful)
Re: (Score:3, Insightful)
Perhaps, although my experience is that IT people are incredibly bad at framing business cases in terms more compelling than my daughter's request for a mobile phone for her birthday: a few vague reasons, followed by a sulk when asked for specifics.
I keep 20TB on RAID5, and replicate it daily to a RAID5 array that has no components or software above the spindle level in common (Solaris/EMC and Pillar Data). The data we really care
RAID != BACKUP (Score:3, Informative)
Re:How does this actually happen? (Score:5, Informative)
Doesn't it? [ufl.edu]
Re: (Score:3, Funny)
Me: You're home early; not enough work to do?
Roommate: No, the server burned out
Me: Oh, that's no big deal; you just wait for them to get replacement parts and then you get back to it
Roommate: No, seriously, it's burned out. The air conditioning unit failed, the entire server room heated up to the point of spontaneous combustion and the entire server room caught fire
Lesson learned, keep your backups somewhere far, far away from the servers.
At this exact moment across the world (Score:3, Funny)
Wrong problem (Score:5, Insightful)
The problem was, as always, not the backup. I've rarely seen problems resulting from the backup process. The troublesome process is the restore. Or as a friend put it once:
Nobody wants backups, what everybody wants is a restore.
In my twenty years of IT I've seen several companies making backups like a well-oiled machine. The backup process was well documented and everyone was trained to the degree that they could do it with their eyes closed. But everything fell apart in the critical moment, because all they had planned was making the backup. Nobody ever imagined or tried a restore on the grand scale. So they ended up with a big stack of tapes with unusable data.
Backup is the means, not the goal.
Regards, Martin
Re:Wrong problem (Score:5, Interesting)
I heard a story about a LAN admin who was doing backups every night. The tapes would go into a safe, then would go offsite, then be used again.
Everything worked well(?) until they needed to do a restore. The tape in the safe was corrupt. The tape at the offsite storage was corrupt. No tape was good.
It seems that the LAN admin made tea every morning. The electric kettle sat on top of the steel safe.
So the backup tape was placed into the safe, then the kettle was started, magnetizing the safe, and erasing the tape.
Not ONCE did anyone try to do a test restore to prove the system.
Re:Wrong problem (Score:4, Informative)
Nice story, though. Reminds me of the sysadmin in my first company who automatically backed up our server every day. Only problem was: the process put a copy of the backup on a drive that was itself being backed up. You can imagine what happened after a few weeks (it failed, disk full). He only noticed a few months later when we asked him to restore some files.
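That particular trap is cheap to guard against: skip anything that lives inside the backup destination when building the file list. A minimal sketch, assuming a plain directory walk (the name `files_to_back_up` is made up for illustration, not taken from any real backup tool):

```python
from pathlib import Path


def files_to_back_up(source: Path, destination: Path) -> list[Path]:
    """List files under `source`, skipping anything inside `destination`."""
    source = source.resolve()
    destination = destination.resolve()
    selected = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        if path == destination or destination in path.parents:
            continue  # never back up the backups themselves
        selected.append(path)
    return selected
```

Without the exclusion, each night's archive contains the previous night's archive, so the backup grows every run until the disk fills.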
Re:Wrong problem (Score:4, Interesting)
Yes and No:
- Mirrored drives are a good protection against drive failures and (usually) offer an easy restore process. If you mirror a drive and put the copy away (e.g. into a safe) this is a real and widely used backup method. As always, you should at least once try to boot the system with the primary disk removed. Sometimes RAID controllers have some quirks too.
- This method usually depends on the availability of specific hardware. If you cannot get a new mainboard or RAID controller of the same type, the mirrored disk contains data you may have trouble getting at. You may ignore this issue if you have the same hardware at a safe location again.
Regards, Martin
Better article (Score:3, Informative)
Link to original article (Score:3, Informative)
I wonder if they run DR on a regular basis. (Score:3, Informative)
For one of our server apps we actually have two laptops configured with all of the required software and we do restore production data from backups on a regular basis as we use that for our system testing on projects. This happens several times a year so we know that the backup and restore procedures truly work. It is also very cool walking into the client site, plugging in the laptop, and showing them that in an emergency they have a working machine very quickly. Not as fast as a server, but it gets them a working machine until the replacement server arrives.
We can't back up, it's too expensive. (Score:5, Insightful)
Backup stories (Score:3, Informative)
Story #1. Fortune 500 company. Lost some source. Big brouhaha. Edict went out: all files are to be backed up to diskettes and the diskettes sent to offsite storage which the management had contracted for with an outside firm. It took a lot of extra time, but people did it. After about two years, an important server with source code for a major product crashed. Developers tried to get the source back from offsite storage. It turns out that nobody at any point had taken any responsibility for cataloging, identifying, or indexing the diskettes. The diskettes might as well have not been labelled: the developers couldn't identify what diskettes were needed, and the offsite storage firm couldn't have retrieved them if they had.
Story #2. Medium-size scientific research organization with a Digital 11/70 running RSTS. Enlightened manager pays operator overtime pay to stay late three nights a week and do backups. Backups are performed with the "verify" option enabled. Tapes are placed in a fire-resistant tape vault every night. But no actual restores are performed. Database (Oracle, in the days when Oracle Corporation's name was still Relational Software, Inc.) is corrupted. A restore is attempted. It transpires that this version of Oracle uses the maximum record length for its files, which happens to be 65,536 bytes, and the Digital-supplied backup-restore utility... you guessed it... has a bug with records of that length. Yep. Writes 0 bytes, verifies 0 bytes.
Story #3. I worked at a place that recommended that individual developers perform individual backups using a cartridge tape system and some standard PC software. I set it up. There were two "verify" options. One used the cartridge system's read-after-write feature to read every block as it was written. The second performed the entire backup, then verified the entire backup in a second pass. Took twice as long, of course. I opted for the second method. The problem was: more than half the time, the verify would report one or two errors. And for some reason, probably efficiency of use of the tape, it didn't write file by file, it munged them into blocks. And it didn't even report the names of the files affected. Just "2 errors were encountered" or something like that. So, when that happened, I didn't see that a rational person had any alternative except to perform the whole backup again. And more than half the time, it would report a couple of errors the second time, and...
When I asked colleagues about this, it turned out that I was the only one ever to have picked the second verify option. Everyone else had picked the read-after-write-verify option, "because it was faster."
And told me not to fuss because "if it was only a couple of errors, the chances they were on files you needed to recover was too small to worry about."
Re:Why isn't this a default (Score:4, Informative)
Wait for OS X 10.5 and "Time Machine".
Re: (Score:3, Funny)
Re: (Score:3, Funny)
Ha ha!
Word Police (Score:3, Informative)