Slashdot
Why Power Failures Can Always Lead To Data Loss
Posted by
timothy
on Wednesday July 23, @01:03PM
from the when-velcro-snags-shoelaces dept.
bigsmoke writes "So, all your servers run on RAID. You back up religiously. You're even sure that your backups are recoverable. But do you also need a UPS? According to Halfgaar (on Slashdot before to promote better Linux backup practices), yes, usually you do. He argues that despite technological advancements such as file system journaling, power failures can still cause data loss in most setups."
Related Stories
Backing up a Linux (or Other *nix) System 134 comments
bigsmoke writes "My buddy Halfgaar finally got sick of all the helpful users on forums and mailing lists who keep suggesting backup methods and strategies to others which simply don't, won't and can't work. According to him, this indicates that most of the backups made by *nix users simply won't help you recover, while you'd think that disaster recovery is the whole point of doing backups. So, now he explains to the world once and for all what's involved in backing up *nix systems."
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

Well no shit, Sherlock (Score:5, Insightful)
Power losses can cause data loss? Gee, you mean that my system that relies on electricity for everything it does can be adversely affected by power outages even if I take precautions? That's some good admin work there, Lou -- if only there were some sort of law that covered the tendency of things that can go wrong to go wrong...
Next week: Fires can make things warm, floods can make things wet.
Re:Well no shit, Sherlock (Score:5, Funny)
I don't know about you, but my servers run on the power of cotton candy and happy thoughts.
Re:Well no shit, Sherlock (Score:5, Funny)
I don't know about you, but my servers run on the power of cotton candy and happy thoughts.
As a former sysadmin, I would think that any machine reliant on 'happy thoughts' would be the most crash-prone system in the history of computing.
Re:Well no shit, Sherlock (Score:5, Funny)
I can offer you a Happy Thought UPS. It's a box of puppies. Be careful though, it only has 500 puppy Amps of capacity.
Re:Well no shit, Sherlock (Score:5, Funny)
Except the server that runs http://youporn.com/ [youporn.com]
Re:Well no shit, Sherlock (Score:5, Funny)
My servers run on electricity, but the RAID controller has battery-backed RAM, so any cached data will survive a power failure, and the disks are in write-through mode.
I like this setup, but please, tell me more about this cotton candy technology. Is it superior?
Re:Well no shit, Sherlock (Score:5, Funny)
Your mom loves you and pays for the electricity. That doesn't mean that your servers run on love.
Re:Well no shit, Sherlock (Score:5, Informative)
OK, people who don't just read the executive summary knew this all along, but perhaps it's necessary that someone spell it out for the rest: journaling and RAID do not prevent data loss in case of a power outage (or in many other circumstances). If you know why, just skip the article. If you're wondering how you can lose data if you write everything to two disks and your filesystem guarantees its own consistency, then perhaps this is the wake-up call that you need.
Re:Well no shit, Sherlock (Score:5, Funny)
One of the first things that will happen, is that the memory DIMMs will no longer be refreshed properly (DRAM needs to be refreshed constantly otherwise it will loose it's data) and very rapidly, the memory will contain only garbage. The hard drives and DMA controller however, will run a bit longer; so if data is being written to disk, the DMA controller will keep reading data from memory, but it has no idea that this data is corrupted.
However, we've recently seen that RAM holds state well enough to preserve crypto keys thru a power cycle [hackaday.com]. This has very scary implications: the RAM knows what's happening, and behaves differently (loses data immediately on power-off or remembers it for several seconds) in order to cause the most difficulty for the owner of the machine.
Not only are computer components intelligent and self-aware, they're also out to get us!
Re:no, that's not the scary thing (Score:5, Funny)
its not worth loosing you're cool about grammer misteaks and etc.
Illiteracy (Score:5, Funny)
From TFA:
(DRAM needs to be refreshed constantly otherwise it will loose it's data)
Fly, little data! Be free!
can always lead to data loss? (Score:5, Funny)
Duh! (Score:5, Insightful)
I remember a discussion on the PostgreSQL hackers' list about recoverability and transaction logs.
You can't make a system that will not lose data, you can only make a system that knows the last save point of 100% integrity.
There are too many variables and too much randomness in a cold, hard power failure. You absolutely need a UPS that gives you time to shut down cleanly.
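For anyone who wants the "last save point" idea in concrete form, here is a minimal sketch in Python. It is not how PostgreSQL does it; the record layout and the names `append_record` and `recover` are made up for illustration. Each record carries its length and a CRC32, and recovery stops at the first record that is short or fails its checksum, so everything returned is known-good and anything after that point is treated as lost.

```python
import os
import struct
import zlib

# Hypothetical record layout: 4-byte payload length, 4-byte CRC32, then payload.
HEADER = struct.Struct("<II")


def append_record(path, payload):
    """Append one checksummed record and ask the OS to flush it to disk."""
    record = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    with open(path, "ab") as f:
        f.write(record)
        f.flush()
        # fsync only asks the OS to push data to the device; a drive with a
        # volatile write cache may still drop it when the power goes.
        os.fsync(f.fileno())


def recover(path):
    """Return every record up to the last one that is complete and intact."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return []

    records = []
    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        # A torn write shows up as a short payload or a checksum mismatch.
        if len(payload) < length or zlib.crc32(payload) != crc:
            break
        records.append(payload)
        offset = start + length
    return records
```

Calling `append_record("wal.log", b"some change")` for each change and `recover("wal.log")` after a crash gives back the prefix that was written in full. It does not bring back the record that was in flight when the lights went out, which is the poster's point.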
Well of course you need UPSs, but (Score:5, Informative)
It happened to someone (Score:5, Insightful)
Re:It happened to someone (Score:5, Funny)
Yes. My first reaction upon reading the summary was... "Duh?" What, did they have it plugged into the wall before that? A UPS becomes MORE critical, not less, as the cost of hardware (RAID arrays are expensive) goes up.
Don't forget to test, people. TEST! (Score:5, Insightful)
We recently had a test night where all we did was test the UPS system and shutdown procedures, and there were a couple of gotchas. Interestingly, the APC PowerChute app we were using defaulted to shutting down the UPS completely after the [first] server went down - not good. This was buried fairly deeply in the configuration.
Just as important as any protection measure, be it RAID, power protection, or whatever, is testing!
Is this bring your kid to work day? (Score:5, Funny)
OK, now everyone has something to give to their kid for the sysadmin-in-training class.
For the rest of us... back to work, nothing here you didn't learn your first year.
For the poster... Shame shame... Turn in your card.
Re:What this really points out... (Score:5, Informative)
Computer power supplies should be built with enough spare capacitance to run things long enough for the computer to save critical data
Here's a question for you: Calculate the size of the capacitor needed that can hold enough power to run a 200W load for 5 minutes and maintain a voltage level within a specific usable range.
Hint: it's BIG. Batteries are more space-efficient, but the chemicals and outgassing make them inappropriate for placement INSIDE the computer box.
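A back-of-the-envelope version of that calculation, as a Python sketch. The 200 W and 5 minutes are from the question above; the 12 V starting voltage and 11 V minimum are assumptions picked only to have a "usable range" to plug in, and the capacitor is treated as ideal (no losses, constant-power load). Usable energy between two voltages is E = 0.5 * C * (V_start^2 - V_min^2), so C = 2 * P * t / (V_start^2 - V_min^2).

```python
def required_capacitance(power_w, seconds, v_start, v_min):
    """Farads needed to supply power_w for seconds while the capacitor
    voltage sags from v_start to v_min (ideal capacitor, no losses)."""
    energy_j = power_w * seconds  # joules the load consumes
    return 2.0 * energy_j / (v_start ** 2 - v_min ** 2)


# 200 W for 5 minutes; the 12 V -> 11 V window is an assumed example.
farads = required_capacitance(200.0, 5 * 60, 12.0, 11.0)
print(f"{farads:.0f} F")  # prints "5217 F"
```

Under those assumptions the answer is on the order of five thousand farads, which is why the hint says BIG.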
Re:What this really points out... (Score:5, Insightful)
Who the hell is talking about 5 minutes!? I'm saying you should be able to get a clean shutdown in 5 seconds if you prioritize it correctly.
Re:What this really points out... (Score:5, Interesting)
This is old hat in embedded systems.
Yes, but embedded systems usually have lower power requirements, or at the very least, a smaller range of power requirements. You can't add 3 PCIe cards, a few extra drives, and a few more GB of RAM to most embedded systems.
I worked on the design of an embedded system a few years ago that had a holdup spec - I think it was supposed to survive for 50 ms with no power. So a 50 ms power interruption would result in continued operation, while an outage longer than that was allowed to reset the board. However, the power draw on the board was around 200 Watts; being able to supply that much power for that long in a fairly compact form factor was a huge hurdle. It also caused airflow problems, because the giant capacitors would prevent air from getting to other components on the board, like the CPU. In the next version of the spec, I believe the holdup requirement was eliminated - apparently we weren't the only ones having trouble meeting that requirement.
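To get a feel for why a 50 ms holdup at 200 W is hard, here is a rough Python sketch that goes from a capacitor bank to the holdup time it buys. Only the 200 W and 50 ms figures are from the comment; the 48 V bus, the 36 V dropout voltage, and the 20,000 µF bank are hypothetical values chosen for illustration, and the capacitor is treated as ideal.

```python
def holdup_seconds(capacitance_f, power_w, v_start, v_min):
    """Seconds an ideal capacitor bank can feed a constant-power load
    before its voltage falls from v_start to v_min."""
    usable_energy_j = 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)
    return usable_energy_j / power_w


# Assumed example: 20,000 uF on a 48 V bus, usable down to 36 V, 200 W load.
t = holdup_seconds(0.02, 200.0, 48.0, 36.0)
print(f"{t * 1000:.1f} ms")
```

With those assumed numbers the bank only just clears the 50 ms target, and 20,000 µF rated for a 48 V bus is a lot of board space, which fits the airflow complaint.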
Our Tandem (Score:5, Interesting)
We spent most of the day getting our servers back up from various states of disrepair (confirming the article, power loss is superbad). It turns out that our main medical software ran on a Tandem. Though the drives and such lost power, the CPU had a backup of D-batteries and survived the power loss just fine. Needless to say, we stopped making fun of their seemingly primitive emergency backup power.
Re:Not me! (Score:5, Funny)
If there's clouds in your server room, your server's probably been slashdotted and is on fire!
Re:That's what I always say sometimes (Score:5, Interesting)
Rule #1.
NEVER plug a laser printer into a UPS. The power that the fuser draws is WAY too much.
Look at some of the cheap office units: they have little pictures on them, and the printer icon is on the surge-only side, NOT the battery/surge side.
If the power goes out, you should NOT be trying to print.
http://articles.techrepublic.com.com/5100-10878_11-6085460.html [com.com] See #6
http://arstechnica.com/guides/other/ups.ars/3 [arstechnica.com]
http://www.jetcafe.org/npc/doc/ups-faq.html#0405 [jetcafe.org] see 04.05
Would you put a space heater on a UPS? Shredder? Vacuum? Table Saw? If you put a laser printer on it, you may as well.
Re:UPS - more than just a backup. (Score:5, Informative)
Yup, the 3 major types of battery UPS I know of:
Offline - Relay or simple failover. (APC Back-UPS)
Line Interactive - Can correct line over/under voltage to a point. (APC Smart-UPS)
Online - Full AC -> DC -> AC conversion. (APC Symmetra, Liebert, anything that doesn't suck)
Basically, outside of home use, you want an online-type UPS.
There are other systems like motor/generator flywheel types, but they need a very fast backup generator to sustain anything more than 30 seconds of outage. But they're great for smoothing out some types of line issues.