Data Storage

Why Power Failures Can Always Lead To Data Loss

bigsmoke writes "So, all your servers run on RAID. You back up religiously. You're even sure that your backups are recoverable. But do you also need a UPS? According to Halfgaar (on Slashdot before to promote better Linux backup practices), yes, usually you do. He argues that despite technological advancements such as file system journaling, power failures can still cause data loss in most setups."
This discussion has been archived. No new comments can be posted.

  • by Zebadias ( 861722 ) on Wednesday July 23, 2008 @01:07PM (#24307011)
    A UPS smooths out all those nasty spikes as well as stopping your servers from going down during a 1-second power cut.

    A UPS does more than just save your data.

  • by Anonymous Coward on Wednesday July 23, 2008 @01:11PM (#24307091)

    Ok, people who don't just read the executive summary knew this all along, but perhaps it's necessary that someone spells it out for the rest: Journaling and RAID do not prevent data loss in case of a power outage (and many more circumstances). If you know why, just skip the article. If you're wondering how you can lose data if you write everything to two disks and your filesystem guarantees its own consistency, then perhaps this is the wake up call that you need.

  • by pembo13 ( 770295 ) on Wednesday July 23, 2008 @01:12PM (#24307103) Homepage
    APC is the only UPS maker on the market that has at least spent some small effort so that their UPSs can be properly integrated with a Linux machine. I made the mistake of purchasing an Ultra UPS as it was cheaper than the APC.
  • by linuxpyro ( 680927 ) on Wednesday July 23, 2008 @01:17PM (#24307173)

    It's also important to get a decent UPS if you're using it for something like a server. I think the cheapy ones basically just use a transfer relay, whereas the higher-end ones actually run the hardware off of the battery via the inverter all the time. While I would think that with the former (called "standby" UPSs maybe?) the transfer time wouldn't be enough to cause too many problems, you still don't have the buffer that you'd get with a true uninterruptible power supply.

    I think a lot of the cheaper ones don't put out a true sine wave either, though for their intended purpose of letting you shut down your desktop cleanly they're probably fine.

  • by mlwmohawk ( 801821 ) on Wednesday July 23, 2008 @01:19PM (#24307219)

    Computer power supplies should be built with enough spare capacitance to run things long enough for the computer to save critical data

    Here's a question for you: Calculate the size of the capacitor needed that can hold enough power to run a 200W load for 5 minutes and maintain a voltage level within a specific usable range.

    Hint: it's BIG. Batteries are more space efficient, but the chemicals and outgassing make them inappropriate for placement INSIDE the computer box.
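As a rough check on the question above, here is a minimal sketch of the arithmetic. The 200 W and 5 minutes come from the post; the 12 V rail sagging to 11 V is only an assumed example of "a specific usable range", and converter losses are ignored.

```python
def capacitance_needed(load_w, seconds, v_start, v_min):
    """Farads needed to supply load_w for `seconds` while the
    capacitor voltage sags from v_start to v_min.

    Usable energy of a capacitor between two voltages:
        E = 0.5 * C * (v_start**2 - v_min**2)
    """
    if v_min >= v_start:
        raise ValueError("v_min must be below v_start")
    energy_j = load_w * seconds          # joules the load consumes
    return 2.0 * energy_j / (v_start**2 - v_min**2)

# Figures from the post: 200 W for 5 minutes.
# Assumed (not from the post): a 12 V rail allowed to sag to 11 V.
farads = capacitance_needed(200, 5 * 60, 12.0, 11.0)
print(f"{farads:.0f} F")
```

Under these assumptions the answer is on the order of five thousand farads, which is why the post calls it BIG. Widening the allowed voltage range shrinks the figure, but not by enough to change the conclusion.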

  • by Anonymous Coward on Wednesday July 23, 2008 @01:27PM (#24307335)

    >A UPS smooths out all those nasty spikes as well as stopping your servers from going down during a 1-second power cut.

    A true UPS smooths out the spikes. Most of today's UPSes (at least consumer models) are off-line [wikipedia.org] supplies. The batteries don't kick in unless the power is out. Worse than that, the cheap ones don't output sine waves, they output square waves. These UPSes also take some time to switch to batteries, leaving your computer without power for that time.

    Now, some of those UPSes have filtering technology like you find in expensive powerbars, sure. But it isn't the same as an always-on UPS at all.

  • by bgat ( 123664 ) on Wednesday July 23, 2008 @01:32PM (#24307453) Homepage

    Yes, quite. It can't handle the substantial inrush current needed by the laser printer.

    The "click" you hear in the UPS when the laser printer warms up is the UPS noting the drops on the power mains, which gives you some idea just how much current that printer needs.

    I have a Samsung ML2150, and have noticed the same thing. Lights flicker, etc. whenever I submit a print job and the printer transitions from standby to active. The various UPSes in my office sense that, and respond with clicks and beeps.

    Take the laser printer off the UPS. If you really need printer capability during a power failure, switch to an ink jet.

  • by Firehed ( 942385 ) on Wednesday July 23, 2008 @01:45PM (#24307705) Homepage

    Other than the lack of communication at present between the PSU and the rest of the system (on a hardware and software level), what you're describing really seems to be the computer equivalent of throwing your hands in front of your nuts as you spot the incoming baseball. It helps the immediate problem of data (or testicle) loss, but it's really just a small amount of damage control.

    This is why a proper UPS that can trigger a full system shutdown once you hit a certain power remaining threshold is far preferable. Granted I'd rather have a controlled crash than the risky nonsense that would come from the power cord being yanked, but (right now) computers can only go so far to help themselves in a couple-second window.
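The threshold-triggered shutdown described above can be sketched as a small decision function. The `UpsStatus` fields and the threshold values are hypothetical and not taken from any real UPS daemon; a real setup would read these values from whatever monitoring software the UPS supports.

```python
from dataclasses import dataclass

@dataclass
class UpsStatus:
    """Hypothetical snapshot of what a UPS reports; field names are
    illustrative, not taken from any real UPS daemon."""
    on_battery: bool
    charge_pct: float       # remaining battery charge, 0-100
    runtime_min: float      # estimated minutes left at current load

def should_shut_down(status, min_charge_pct=20.0, min_runtime_min=5.0):
    """Return True once the UPS is on battery and either threshold
    has been crossed. Thresholds are assumed example values."""
    if not status.on_battery:
        return False
    return (status.charge_pct <= min_charge_pct
            or status.runtime_min <= min_runtime_min)

# A short outage with plenty of battery left: keep running.
print(should_shut_down(UpsStatus(True, 85.0, 40.0)))
# A long outage that has drained the battery: shut down cleanly.
print(should_shut_down(UpsStatus(True, 15.0, 4.0)))
```

Checking both charge and estimated runtime is a design choice: the runtime estimate reacts to load changes, while the charge level is a fallback if the estimate is unreliable.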

  • by SuperQ ( 431 ) * on Wednesday July 23, 2008 @01:45PM (#24307709) Homepage

    Yup, the 3 major types of battery UPSs I know of:

    Offline - Relay or simple failover. (APC Back-UPS)

    Line Interactive - Can correct line over/under voltage to a point (APC Smart-UPS)

    Online - Full AC -> DC -> AC conversion. (APC Symmetra, Liebert, anything that doesn't suck)

    Basically, outside of home use, you want an online-type UPS.

    There are other systems like motor/generator flywheel types, but they need a very fast backup generator to sustain anything more than 30 seconds of outage. But they're great for smoothing out some types of line issues.

  • ZFS (Score:1, Informative)

    by Anonymous Coward on Wednesday July 23, 2008 @01:48PM (#24307783)

    Always? Maybe if you are using Linux. Not if you are using an OS that runs ZFS filesystems.

    --AC

  • by jalet ( 36114 ) <alet@librelogiciel.com> on Wednesday July 23, 2008 @01:58PM (#24308007) Homepage

    This morning we had a planned shutdown of 100 servers for electrical work; all were on the same 40 kVA UPS. All went fine: we shut down all servers to be safe, kept some stuff online for monitoring and the like, and then main power was shut off. The UPS gladly took the load, with an estimated battery life of 75 minutes, more than what was needed for the electrical work. Once this was done, the electrician put the main power back on, and... the UPS shut down!

    Since all servers were stopped already we didn't lose anything, but we had to put the UPS in bypass mode for a while, then back on, and now we hope for the best waiting for the UPS to be repaired, crossing most of our fingers because of the holidays...

    In summary: testing that the UPS can handle the power coming back is as important as testing for it to be able to handle the power shutting down.

  • by v1 ( 525388 ) on Wednesday July 23, 2008 @02:09PM (#24308215) Homepage Journal

    Last night we had a power outage. I shut down the desktop and was able to continue working for almost 2 hours on the laptop because with the Desktop down the UPS was only carrying the DSL router and the WiFi box.

    good uptime for a laptop. got a second battery? (I know I do)

    Inverters for the Servers. (DC PSUs are available for some of the servers we use but at so high a premium that the inverters are cheaper.)

    that's because it just has to invert it before it can step it up or down. If you supply DC you are actually introducing another necessary step. It gets hard to cram 2x the electronics into the PS. Inverters are definitely the way to go.

    We can handle a dozen Power cuts in a day with no service interruption or data loss ("Tested" 2 weeks ago) and we can stay up without external power for more than a week. After that we have to start trucking in additional diesel.

    Yep. That's right. With sufficient fuel we can be online indefinitely. Which we will have to do if we get hit by a major hurricane.

    Might want to rethink how easy it is to get a truck in during a hurricane. ;) Unless it's more of a boat, think Katrina.

    Imagine a server where UPS #2 is down for repairs and UPS #1 fails during a power cut. When everything comes back up, we find 2 failed hard drives in the RAID 5 on the email server. Despite previous testing and confirmation that the backups work, the most recent tapes fail to read.

    um, ouch?

    Best advice? Memorize all your important data. That way if you lose your mind, you are not responsible for the lost data (or anything else).

    Was going to say, all of the above is moot if an EF5 rolls through town. Better add "offsite backup" to your list if it's not already there. With the EF5 that ran through here last month, some people got their backups turned into "offsite" backups. (maintenance guy was here last week, said they are still looking for their dump truck )

  • by bravecanadian ( 638315 ) on Wednesday July 23, 2008 @02:12PM (#24308253)

    Any professional server or data center setup that does not include a UPS for a graceful shutdown... is almost by definition NOT professional.

  • Voltage Spikes (Score:5, Informative)

    by natoochtoniket ( 763630 ) on Wednesday July 23, 2008 @02:12PM (#24308259)

    The typical small UPS system has some amount of surge protection built in, but it's typically only good for at most a couple thousand joules. And if you get a spike that is big enough to blow a varistor, you also get to buy a new UPS.

    A better solution is to put a "whole house" surge protector on the circuit-breaker panel. It protects everything, with a much higher number of joules. Five or six pounds of varistors can absorb a lot more shock than one ounce of varistors. They cost about $100, and can be found at most big hardware stores or electrical supply houses. That doesn't eliminate the need for a UPS. It does protect the UPS, along with the other equipment, from most voltage spikes.

    Last year, lightning hit the power pole 20 feet from my house. We know where it hit because the pole caught fire. My next-door neighbors on both sides lost every single piece of electrical equipment -- not just computers, TV's, and stereos, but also fridge, microwave, water heater, and range. All of it was damaged beyond repair. We barely noticed the hit, except for the bright flash of light, and had no damage at all.

  • by Richard Steiner ( 1585 ) <rsteiner@visi.com> on Wednesday July 23, 2008 @02:16PM (#24308299) Homepage Journal

    Real text editors will recover gracefully from such situations. :-)

    (I'm thinking along the lines of @UEDIT on OS2200, which saves its entire virtual memory state to disk periodically and can recover it with ease at the next startup, or the old EDT editor on VMS, which saved the commands one entered and could replay them when a recovery was specified).

    I'm surprised more text editors don't have a similar feature. I think vim does, tho...?
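The command-replay style of recovery mentioned above can be sketched in a few lines. This is a toy illustration, not how EDT or any real editor implements it: the two-command language (`append`, `delete`) and the journal file name are invented for the example.

```python
import os
import tempfile

def log_command(journal_path, command):
    """Append one editing command to the journal and force it to disk
    before returning, so it survives a crash right afterwards."""
    with open(journal_path, "a", encoding="utf-8") as f:
        f.write(command + "\n")
        f.flush()
        os.fsync(f.fileno())

def apply_command(lines, command):
    """Toy command set (hypothetical): 'append <text>' or 'delete <n>'."""
    op, _, arg = command.partition(" ")
    if op == "append":
        lines.append(arg)
    elif op == "delete":
        del lines[int(arg)]
    return lines

def recover(original_lines, journal_path):
    """Rebuild the buffer by replaying every journaled command."""
    lines = list(original_lines)
    with open(journal_path, encoding="utf-8") as f:
        for command in f.read().splitlines():
            lines = apply_command(lines, command)
    return lines

journal = os.path.join(tempfile.mkdtemp(), "session.journal")
for cmd in ["append hello", "append world", "delete 0"]:
    log_command(journal, cmd)
print(recover([], journal))
```

The journal stays small because it records intent rather than state, at the cost of a replay step on startup. A real implementation would also need to cope with a final command that was only partially written when the power went out.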

  • by supersat ( 639745 ) on Wednesday July 23, 2008 @02:55PM (#24309025)
    Are you sure your disks are in write-through mode? Have you checked [livejournal.com]? Brad Fitzpatrick (of LiveJournal, memcache, OpenID, etc. fame) discovered that many disks lie about being in write-through mode, and wrote a utility to check it.
  • by jimicus ( 737525 ) on Wednesday July 23, 2008 @03:09PM (#24309241)

    Why 5 minutes? It usually takes less than a second to run a sync on the disks depending on how active they are. A couple seconds of runtime should be enough to do an "emergency shutdown" and avoid data corruption.

    ####@johncash:~$ time sync

    real 0m0.004s
    user 0m0.004s
    sys 0m0.000s

    That will sync the disks, but it won't stop the database from accepting incoming data. It won't stop cron jobs which might be just about to trigger. It won't deal with tasks that are in the middle of a big operation which involves a lot of writing to disk.
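Put differently, `sync` only flushes what happens to be cached at that instant. One common way for an application to protect an individual write is to flush it itself and swap it into place with a rename. The sketch below assumes a POSIX system and a drive that honours flush requests; `durable_write` and the file name are hypothetical.

```python
import os
import tempfile

def durable_write(path, data):
    """Write `data` to `path` so that, on filesystems with atomic
    rename, a crash leaves either the old contents or the new ones."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())      # push file contents to the device
        os.replace(tmp_path, path)      # atomic rename over the target
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Flush the directory entry too (POSIX only; assumed Linux/Unix).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

target = os.path.join(tempfile.mkdtemp(), "state.db")
durable_write(target, b"committed record\n")
print(open(target, "rb").read())
```

This protects one file at a time. It does nothing for the cron jobs or long-running writers mentioned above, which still need an orderly shutdown.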

  • Wrong, I think. (Score:3, Informative)

    by Spazmania ( 174582 ) on Wednesday July 23, 2008 @04:13PM (#24310151) Homepage

    The hard drives and DMA controller however, will run a bit longer; so if data is being written to disk, the DMA controller will keep reading data from memory, but it has no idea that this data is corrupted.

    Pretty sure that's wrong. It used to be (20 years ago) that hard drives losing power in this way had a chance of the heads crashing against the platters (the fabled "hard drive crash"). To solve this, modern drives are very sensitive to the power input. As soon as power fails the drives extract power from the spinning platters to move the heads over to the parked position. Regardless of what the DMA controller thinks it should be doing, the hard drive is busy parking the heads.

  • Re:Voltage Spikes (Score:5, Informative)

    by natoochtoniket ( 763630 ) on Wednesday July 23, 2008 @04:26PM (#24310331)

    The path-to-ground is really important, as is the quality of the ground. The length of the path is the reason why whole-house devices are installed at the service entrance panel. But, that assumes that your service-entrance ground is a good ground.

    If your ground is not good, shorting to ground won't do much good. A lot of houses around here are grounded to plumbing pipe that is buried just 12" deep. During a dry spell a few years ago, I detected variable voltage where it shouldn't have been. The voltage problems cleared up after I added an 8-foot vertical ground rod to the system.

    The thing that kills a surge protector is too many amps for too long. If it shorts the power to ground (low-resistance), but the ground is not really well-grounded, then the whole thing can float close to line-voltage. In that case, that voltage can destroy your other devices, while the surge unit never gets enough current to burn the varistors.

  • by Frank T. Lofaro Jr. ( 142215 ) on Wednesday July 23, 2008 @05:39PM (#24311397) Homepage

    Actually when power drops the "power good" line from the power supply goes low, which causes a system reset and locks everything up.

    This is also how the computer knows how long to keep the reset line engaged on startup, it stays asserted until the power supply says the power is good, and everything has proper voltage.

  • Re:Voltage Spikes (Score:3, Informative)

    by russotto ( 537200 ) on Wednesday July 23, 2008 @05:39PM (#24311401) Journal

    I'm not convinced that whole-house protection helps much either. A few years ago, there was some event during a thunderstorm - we never quite figured out what - that fried two TiVo modems, a garage door opener (the circuit board was visibly burned and light bulb shattered), a few Wirsbo hot-water thermostats (not even connected to the mains power, just low-voltage from the boiler), a few Vantage whole-house dimmer modules, an intercom, and a printer.

    Common-mode spike. The power line was fine, but your ground got knocked up to a few kilovolts by a nearby strike.

  • by pslam ( 97660 ) on Wednesday July 23, 2008 @06:02PM (#24311699) Homepage Journal

    I don't think this has been true since... maybe 8-10 years now? Definitely since MR drives came on the market (ages ago).

    Modern drives have:

    • A capacitor that stores enough charge to "emergency park" the heads.
    • Low voltage detection that kicks in, disables the head, and dumps the capacitor into the seek coil.

    It does NOT go writing crap all over whatever's between your data and the parked position, unless the drive is a defective design. The emergency park is a fairly brutal affair, and you'll typically see the datasheet list a maximum number that's notably lower than the max power cycles.

    It's also essential these days because:

    • The head should (of course) never touch the platter.
    • The drive can't actually spin up if the head is resting on the platter.
    • So the drive is designed with the assumption the head NEVER touches the platter in its lifetime.

    Normally that holds true. I've seen some drives (1.0" and 1.8" miniature ones) which suffered from head-on-platter, but that was due to misdesign in the power supply feeding it (e.g. voltage rails going slightly negative, draining the cap early).

    But anyway, the worst you'll get with the power going out is a partially written sector, which will then be marked bad, probably permanently. Or maybe a bunch of sectors. Or maybe bad in a different order to what the OS sent, due to caching.

    If you had a drive and/or RAID fail due to power outage, you should get a refund. You might lose a tiny amount of data, not the whole lot.

  • by v1 ( 525388 ) on Wednesday July 23, 2008 @08:13PM (#24313003) Homepage Journal

    What I have is a Tripp-Lite SB-2000, which is an oldie but a goodie. Only link I can find now is here [vancebaldwin.com]. It runs on 24v external power, so I just set two car batteries on top of it. Picked it up years ago for a song on ebay.

    That unit though really is meant to have massive batteries on it. (looks like 24v golf cart batteries maybe, it has large binding posts on it for the external battery, there is no internal battery)

    You can't just hook a car battery up to some old APC you have sitting around. It may run on it, but there are two factors to keep in mind:

    1) UPS's are designed with cooling in mind. Sure you can put a monster battery on it so it has a runtime (at max output) of an hour instead of 10 minutes, but is it going to catch on fire or just plain overheat and shut down at 30 minutes in?

    2) if it runs off the batteries, it has to charge them back up. The charge circuit faces the same limitations as the inverter in terms of capacity and cooling. Your UPS may run fine for 45 minutes, but then when power comes back, the charge circuit may fry after an hour of continuous load trying to bring the battery back up to full.

    and of course 3) installing a larger battery doesn't affect your maximum output (watts), it only affects your maximum uptime (watt-hours)

    I suppose also 4) is worth considering... not all hardware LIKES to run off a UPS. The power tends to be kinda nasty. I don't even want to know what my old tripp-lite puts out for power but I'm pretty sure it's very dirty. Fortunately all the hardware that's on it doesn't seem to mind. (yet) The longer you run something on a UPS, the more likely you are to damage it if it's not tolerant. I once tried placing a harmonic filter on my tripp-lite. Worked like a charm, put out a nearly perfect and clean sine wave. For about 6 minutes. Then it smoked. The power was simply too nasty for it to filter. Newer UPSs of course do better here. They usually advertise a "modified sine wave", same as you see stamped on inverters.

    Final note: no, you cannot stack UPS's. The line filters on modern UPS's don't like the power coming from a UPS and will switch on when the upstream UPS turns on.
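Point 3 in the list above (a bigger battery buys watt-hours, not watts) can be sketched with rough arithmetic. All figures here are hypothetical, including the efficiency and usable-fraction defaults; none of them are specifications of the Tripp-Lite unit or any other UPS.

```python
def runtime_minutes(battery_v, battery_ah, load_w, efficiency=0.85,
                    usable_fraction=0.5):
    """Rough runtime estimate: usable stored energy divided by load.

    efficiency and usable_fraction are assumed example values
    (inverter losses, and not draining lead-acid cells flat)."""
    stored_wh = battery_v * battery_ah
    usable_wh = stored_wh * usable_fraction * efficiency
    return 60.0 * usable_wh / load_w

def can_carry(load_w, max_output_w):
    """The inverter rating, not the battery, limits the load."""
    return load_w <= max_output_w

# Hypothetical figures: a 24 V bank, 300 W load, 1000 W inverter.
small = runtime_minutes(24, 7, 300)     # small internal-style battery
large = runtime_minutes(24, 50, 300)    # two car-sized batteries in series
print(f"{small:.1f} min vs {large:.1f} min")
print(can_carry(1500, 1000))            # False regardless of battery size
```

The estimate says nothing about points 1 and 2: whether the inverter and charge circuit can survive the longer duty cycle is a thermal question that this arithmetic does not capture.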

  • Re:Voltage Spikes (Score:3, Informative)

    by natoochtoniket ( 763630 ) on Thursday July 24, 2008 @10:42AM (#24318527)

    Neutral and ground are supposed to be bonded at the service entrance panel, and not anywhere else. If the ground is actually grounded, with a big copper wire to a big copper spike that goes deeper than the water table, that will normally provide the path of least resistance for the electricity to follow.

    A lot of houses don't have a good ground connection. Most building codes (and the NEC) allow 25 ohms resistance on the ground connection. But it's hard to measure, so the building inspectors don't measure it. In order to measure it, you have to install an additional 8-foot spike ten feet away from the ground connection you want to measure.

    Plumbing systems used to be metal pipe, so a connection to plumbing was an adequate ground. But, now, most new plumbing is plastic, an insulator. A few years ago they tore up the streets in my neighborhood to install new water pipes (plastic of course). After they did that, the only ground on my house was the short length of metal pipe that ran from the house to the meter. And that pipe was less than 12 inches deep, in dry sandy soil.

    The easy way to be sure that you have a good ground is to install two new 8-foot spikes, at least 10 feet apart (from each other, and from any existing ground spike). Measure the ohms between them to be sure you have less than 25 ohms. Then bond BOTH of them to the existing ground at your service-entrance panel using bronze clamps and 6-gauge or larger copper wire. Costs less than $100, and can be done in just an hour or two.
