Why Power Failures Can Always Lead To Data Loss
bigsmoke writes "So, all your servers run on RAID. You back up religiously. You're even sure that your backups are recoverable. But do you also need a UPS? According to Halfgaar (on Slashdot before to promote better Linux backup practices), yes, usually you do. He argues that despite technological advancements such as file system journaling, power failures can still cause data loss in most setups."
Well no shit, Sherlock (Score:5, Insightful)
Power losses can cause data loss? Gee, you mean that my system that relies on electricity for everything it does can be adversely affected by power outages even if I take precautions? That's some good admin work there, Lou -- if only there were some sort of law that covered the tendency of things that can go wrong to go wrong...
Next week: Fires can make things warm, floods can make things wet.
Duh! (Score:5, Insightful)
I remember a discussion on the PostgreSQL hackers list about recoverability and transaction logs.
You can't make a system that will not lose data; you can only make a system that knows the last save point of 100% integrity.
There are too many variables and too much randomness on a cold hard power failure. You absolutely need a UPS that gives you time to shut down cleanly.
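The "last save point of 100% integrity" idea can be sketched for a single file. This is a minimal illustration in Python, not how PostgreSQL does it: the function name `save_atomically` is hypothetical, and the whole thing assumes the drive actually honours flush requests rather than acknowledging writes that are still sitting in its cache.

```python
import os
import tempfile


def save_atomically(path, data):
    """Replace the file at path so that it holds either the old or the new bytes.

    Hypothetical helper for illustration. The new contents go to a temporary
    file in the same directory, are flushed to disk, and are then renamed
    over the target in a single step.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # push Python's buffer to the OS
            os.fsync(tmp.fileno())   # ask the OS to push it to the disk
        os.replace(tmp_path, path)   # atomic rename on POSIX file systems
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # On POSIX, also flush the directory entry so the rename itself survives.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

A power cut before the rename leaves the old file intact, and a cut after it leaves the new one. Whatever was written since the previous save point is still lost, which is the point being made above.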
What this really points out... (Score:3, Insightful)
is a weak spot in the design of most computers.
Computer power supplies should be built with enough spare capacitance to run things long enough for the computer to save critical data, and operating systems and critical apps should be able to handle an emergency shutdown and save critical data in very short order.
This is old hat in embedded systems.
It happened to someone (Score:5, Insightful)
Don't forget to test, people. TEST! (Score:5, Insightful)
We recently had a test night where all we did was test the UPS system and shutdown procedures, and there were a couple of gotchas. Interestingly, the APC PowerChute app we were using defaulted to shutting down the UPS completely after the [first] server went down - not good. This was buried fairly deeply in the configuration.
Just as important as any protection measure, be it RAID, power protection, or whatever, is testing!
Get a UPS (Score:4, Insightful)
I really can't understand people who don't have a UPS. Don't you care about your data? At all? The UPS is not very expensive (My BackUPS 900 is very nice and only $100), and will last a long time (you just replace the batteries now and then). Once you are on UPS, you can stop worrying about any power issues, journalling file systems, crash recovery, and all that. The computer will never fail due to power. If you run Linux, it will also never fail due to the OS. If you are a normal user, that means your computer will never fail, period. Seriously, there is no excuse for not having a UPS. Go and get one right now!
Re:What this really points out... (Score:5, Insightful)
Who the hell is talking about 5 minutes!? I'm saying you should be able to get a clean shutdown in 5 seconds if you prioritize it correctly.
Re:What this really points out... (Score:4, Insightful)
Why 5 minutes? It usually takes less than a second to run a sync on the disks depending on how active they are. A couple seconds of runtime should be enough to do an "emergency shutdown" and avoid data corruption.
####@johncash:~$ time sync
real 0m0.004s
user 0m0.004s
sys 0m0.000s
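How long a flush takes depends on how much unwritten data is pending, so a timing on an idle machine says little. A rough sketch in Python for measuring it with a known amount of dirty data; `time_flush` is a hypothetical name, and the figure will vary with the disk, the file system, and whatever else is writing at the time:

```python
import os
import tempfile
import time


def time_flush(num_bytes):
    """Return the seconds spent in fsync after writing num_bytes of new data."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"\0" * num_bytes)
        f.flush()                    # hand the data to the OS page cache
        start = time.perf_counter()
        os.fsync(f.fileno())         # wait until the OS reports it is on disk
        return time.perf_counter() - start


if __name__ == "__main__":
    for size in (1024, 1024 * 1024, 64 * 1024 * 1024):
        print(f"{size:>10} bytes: {time_flush(size):.4f} s")
```

This times one file only. A system-wide sync on a busy server has to flush everything pending, so it can take noticeably longer than either this or the idle-machine figure above.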
Re:Duh! (Score:4, Insightful)
You're still hosed if your server's power supply goes titsup. Or if your hard drive crashes. Or if the building burns down.
Gotta love these slashvertisements, I wonder whose UPSes they're pimping? It's not like we don't all know you need a UPS. What's next, a FA about how you need fire insurance?
Re:Well no shit, Sherlock (Score:4, Insightful)
That is, until the 10,000-volt spike, when the power company improperly brings the grid back up, bakes the RAM, the battery, the RAID controller, and the hard drives.
Re:Get a UPS (Score:2, Insightful)
Re:What this really points out... (Score:3, Insightful)
You mean, the battery location on my laptop is not appropriate?
I know laptops and servers are very different, but still, if my laptop can run 2 or 3 hours on a battery (including the LCD), it should not be that difficult to use the same technology to power a server for 5 minutes (with no screen needed).
Re:That's what I always say sometimes (Score:1, Insightful)
WTF is wrong with your power installations, guys? Flickering lights, brownouts whenever the printer warms up, voltage spikes every day? Is your electricity produced by hamsters in wheels and delivered through bell wire? Perhaps you should stop sprinkling the place with UPSs and pay someone to redo your electrical installation instead.
He forgot UPS-triggered shutdown (Score:5, Insightful)
If you're not at the machine, or don't know how to shut down without a CRT, the disk can get messed up when the UPS runs out of power. Unless you only have a desktop machine with no network applications writing to disk (no BitTorrent); then you might be OK if you just walk away from your keyboard and let the system become quiescent before it loses power.
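The decision a UPS-triggered shutdown has to make is small. A minimal sketch in Python, with every name and the default margin being hypothetical: in a real setup the battery status would come from a monitoring daemon such as apcupsd or NUT, and the time needed to shut down would be measured on the actual machine.

```python
def should_shut_down(on_battery, runtime_left_s, shutdown_needs_s, safety_margin_s=60):
    """Decide whether to start a clean shutdown now.

    on_battery       -- True if the UPS reports that mains power is gone
    runtime_left_s   -- battery runtime the UPS estimates, in seconds
    shutdown_needs_s -- measured time a clean shutdown takes, in seconds
    safety_margin_s  -- extra slack, because runtime estimates are optimistic
    """
    if not on_battery:
        return False
    return runtime_left_s <= shutdown_needs_s + safety_margin_s


# Mains is fine: never shut down, whatever the battery says.
print(should_shut_down(False, 30, 120))    # False
# On battery with plenty of runtime left: keep running.
print(should_shut_down(True, 900, 120))    # False
# On battery and close to the limit: shut down while there is still time.
print(should_shut_down(True, 150, 120))    # True
```

The margin matters because an aged battery can fall well short of the runtime it reports, so the shutdown should start before the estimate says it has to.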
Re:Well no shit, Sherlock (Score:3, Insightful)
Ok, people who don't just read the executive summary knew this all along, but perhaps it's necessary that someone spells it out for the rest: Journaling and RAID do not prevent data loss in case of a power outage (and many more circumstances). If you know why, just skip the article. If you're wondering how you can lose data if you write everything to two disks and your filesystem guarantees its own consistency, then perhaps this is the wake up call that you need.
Any Server Admin who didn't realize that isn't really a server admin. And the rest of the world probably doesn't care or need to know.
Just a thought... ;-)
Re:Well no shit, Sherlock (Score:1, Insightful)
Seriously, get over yourself. Commodity servers are powering the internet as we know it, and your IBM dinosaurs are filling landfills.
I don't get it (Score:3, Insightful)
1) You build a RAID5 array
2) You back up
3) You test your backups
4) You plug your server DIRECTLY INTO THE WALL?!?!
Ummm DUH! Of course you need a UPS - what kind of yutz does 1-3 and then powers the server off of unconditioned wall power?
Re:Well no shit, Sherlock (Score:2, Insightful)
I lost an entire RAID 5 disk array due to bad RAM. It was running 64-bit Windows Server 2003, and one day I turned the screen on and noticed some artifacts and a completely locked-up machine. To be sure it wasn't some freeze-up in the GUI, I tried accessing the shares, which didn't respond.
So I was like, OK, great... time to do a hard shutdown and reboot. Well, when it came back up I noticed my RAID array was no longer showing up in the shares or in Disk Manager. I was like... aww, crap. I tried to rebuild the array via the built-in tools of the RAID controller and it didn't work. Somehow it totally FUBARed the disk array tables, to the point that everything on my five 320 GB disks was trashed. Good thing the OS runs on a separate non-RAID hard disk right off the motherboard's disk controller.
There was nothing wrong with the RAID controller or the drives. Just at the point of writing stuff to the drives, the RAM had to take a dump and totally freeze the server.
Needless to say, I swapped out the RAM modules for known-good ones and have never had a problem since. Luckily, I regularly make backups of my critical stuff to another set of hard drives elsewhere.
I follow this moral code as my second religion, "Don't put all your eggs in one basket!"
Re:What this really points out... (Score:2, Insightful)
Apples and oranges. I'll use a car analogy since they are always appropriate.
If my car can run 6 or 7 hours at 70 MPH, it should not be that difficult to use the same car to run at 420 MPH for 1 hour.
Re:Well no shit, Sherlock (Score:3, Insightful)
Re:Voltage Spikes (Score:3, Insightful)
I'm not convinced that whole-house protection helps much either. A few years ago, there was some event during a thunderstorm - we never quite figured out what - that fried two TiVo modems, a garage door opener (the circuit board was visibly burned and light bulb shattered), a few Wirsbo hot-water thermostats (not even connected to the mains power, just low-voltage from the boiler), a few Vantage whole-house dimmer modules, an intercom, and a printer.
The house was, at the time, "protected" with two Cutler-Hammer CHSP suppressors (MOV). After the incident, their "protection working fine" LED was still lit! The only room with no damage was my recording studio, which had Equitech balanced-power panels; the ginormous-hunk-of-iron transformer probably saved me there. The power company had no reports of direct lightning strikes, other than one hit that took out a transformer (and since my power didn't go out, I apparently wasn't on that circuit).
I recall doing some reading about lightning arrestors, ground grids, and such, and eventually came to the conclusion that (a) surge suppressors are fairly useless, because they don't always present the quickest path to ground, and (b) it would be 10x cheaper to let stuff die and replace it than to set up a proper lightning protection system.
Re:Well no shit, Sherlock (Score:3, Insightful)
Any Server Admin who didn't realize that isn't really a server admin. And the rest of the world probably doesn't care or need to know.
Just a thought... ;-)
The fact that they're not really server admins doesn't stop them from running servers, though!