Data Storage

Why Power Failures Can Always Lead To Data Loss (456 comments)

bigsmoke writes "So, all your servers run on RAID. You back up religiously. You're even sure that your backups are recoverable. But do you also need a UPS? According to Halfgaar (on Slashdot before to promote better Linux backup practices), yes, usually you do. He argues that despite technological advancements such as file system journaling, power failures can still cause data loss in most setups."
This discussion has been archived. No new comments can be posted.

  • On the other hand.. (Score:3, Interesting)

    by m0i ( 192134 ) on Wednesday July 23, 2008 @01:16PM (#24307161) Homepage

    You can recover your RAM minutes after losing power.. no kidding! http://citp.princeton.edu/memory/ [princeton.edu]

  • by Macman408 ( 1308925 ) on Wednesday July 23, 2008 @01:27PM (#24307321)

    This is old hat in embedded systems.

    Yes, but embedded systems usually have lower power requirements, or at the very least, a smaller range of power requirements. You can't add 3 PCIe cards, a few extra drives, and a few more GB of RAM to most embedded systems.

    I worked on the design of an embedded system a few years ago that had a holdup spec - I think it was supposed to survive for 50 ms with no power. So a 50 ms power interruption would result in continued operation, while an outage longer than that was allowed to reset the board. However, the power draw on the board was around 200 Watts; being able to supply that much power for that long in a fairly compact form factor was a huge hurdle. It also caused airflow problems, because the giant capacitors would prevent air from getting to other components on the board, like the CPU. In the next version of the spec, I believe the holdup requirement was eliminated - apparently we weren't the only ones having trouble meeting that requirement.
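To put rough numbers on that holdup spec, here is a minimal sketch of the arithmetic. The 200 W and 50 ms figures come from the comment; the 12 V rail and the 10.8 V minimum are hypothetical values chosen only for illustration, and converter losses are ignored.

```python
def holdup_energy_joules(load_watts, holdup_seconds):
    """Energy the board draws during the interruption (E = P * t)."""
    return load_watts * holdup_seconds


def required_capacitance_farads(energy_joules, v_start, v_min):
    """Capacitance whose usable energy between v_start and v_min covers the draw.

    A capacitor discharging from v_start to v_min releases
    0.5 * C * (v_start**2 - v_min**2), so solve that for C.
    """
    return 2.0 * energy_joules / (v_start ** 2 - v_min ** 2)


# From the comment: about 200 W for 50 ms.
energy = holdup_energy_joules(200.0, 0.050)

# Hypothetical rail: 12 V nominal, allowed to sag to 10.8 V (not from the comment).
capacitance = required_capacitance_farads(energy, 12.0, 10.8)

print(f"energy to ride through: {energy:.1f} J")
print(f"bulk capacitance needed: {capacitance:.2f} F")
```

Under those assumptions the energy is modest, but the capacitance needed to deliver it inside a narrow voltage window is large, which fits the complaint about giant capacitors blocking airflow.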

  • by alta ( 1263 ) on Wednesday July 23, 2008 @01:30PM (#24307397) Homepage Journal

    Rule #1.

    NEVER plug a laser printer into a UPS. The power that the fuser draws is WAY too much.

    Look at some of the cheap office units, they show little pictures on them, notice the printer icon is on the surge side, NOT battery/surge side.

    If the power goes out, you should NOT be trying to print.

    http://articles.techrepublic.com.com/5100-10878_11-6085460.html [com.com] See #6

    http://arstechnica.com/guides/other/ups.ars/3 [arstechnica.com]

    http://www.jetcafe.org/npc/doc/ups-faq.html#0405 [jetcafe.org] see 04.05

    Would you put a space heater on a UPS? Shredder? Vacuum? Table Saw? If you put a laser printer on it, you may as well.

  • by raddan ( 519638 ) on Wednesday July 23, 2008 @01:30PM (#24307401)
    Actually, UPS devices are useful for other kinds of things as well. Need to distribute load more evenly across your circuits? If you have the machine plugged into a UPS, you simply unplug the UPS and plug it into the other circuit. Heck, you could even do something really dumb like physically move the machine while it's running if you had it connected to a UPS.
  • by Darkk ( 1296127 ) on Wednesday July 23, 2008 @01:31PM (#24307419)

    I 100% agree with the idea of testing under controlled conditions. The oops you guys discovered is a good thing to catch early on. I can imagine the look on your support team's faces when the UPS suddenly turned itself off while the remaining servers were still trying to perform a safe shutdown. I'm sure the secondary UPS was left running as a precaution until the test was successful.

    I have seen a screw-up where somebody cut into a live power cord, thinking it was a tie wrap, and caused a major short in the PDU. The guy thought he was safe until he discovered that whoever installed the servers hadn't double-checked the power connections and loads, so the short set off a cascade failure in several racks and lost several tons of data. Recovery took a while.

    Needless to say, it was not a good day.

  • by E-Lad ( 1262 ) on Wednesday July 23, 2008 @01:31PM (#24307425)

    ...by design. TFA doesn't delve into too much detail, but a sudden power loss on such software RAID systems is a condition that ZFS accounts for. Its copy-on-write (COW) and write-length striping strategy prevents things such as the RAID5 write hole [sun.com], a condition that is most likely to occur when power is lost.
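For readers who have not run into the write hole, here is a toy illustration with XOR parity. It models a generic RAID5-style stripe in memory, not ZFS or any real RAID implementation, and the block contents are made up.

```python
from functools import reduce


def xor_parity(blocks):
    """Bytewise XOR of equal-length blocks (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the one missing block from the survivors plus parity."""
    return xor_parity(list(surviving_blocks) + [parity])


# A consistent stripe: three data blocks and their parity.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_parity([d0, d1, d2])

# Sanity check: with consistent parity, a lost d0 is recoverable.
recovered_ok = reconstruct([d1, d2], parity)

# The write hole: d1 is overwritten in place, then power fails
# before the matching parity update reaches disk.
d1_new = b"XXXX"
stale_parity = parity

# Later the disk holding d0 dies and is rebuilt from stale parity.
rebuilt_d0 = reconstruct([d1_new, d2], stale_parity)

print("recovered with good parity:", recovered_ok)
print("recovered with stale parity:", rebuilt_d0)
```

The second rebuild silently returns garbage for a block that was never being written. A copy-on-write design sidesteps this because new data and parity go to fresh locations and the old stripe stays valid until the new one is complete.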

  • by Forge ( 2456 ) <kevinforge@@@gmail...com> on Wednesday July 23, 2008 @01:33PM (#24307473) Homepage Journal

    Last night we had a power outage. I shut down the desktop and was able to continue working for almost 2 hours on the laptop because with the Desktop down the UPS was only carrying the DSL router and the WiFi box.

    At work. Power is a whole enterprise within the company I work for.

    Dual gas powered Generators at each location, Rooms full of Batteries for the Telecoms gear (most is straight DC) and Inverters for the Servers. (DC PSUs are available for some of the servers we use but at so high a premium that the inverters are cheaper.)

    We can handle a dozen Power cuts in a day with no service interruption or data loss ("Tested" 2 weeks ago) and we can stay up without external power for more than a week. After that we have to start trucking in additional diesel.

    Yep. That's right. With sufficient fuel we can be online indefinitely. Which we will have to do if we get hit by a major hurricane.

    Which means the phone network is a lot more reliable than the Power grid where I live.

    As for Data loss. I have over the years done a lot of recovery work. "Murphy" of "Murphy's Law" fame isn't a guy or a girl. He is a demon from the darkest pits of hell sent to torment the souls of IT workers everywhere.

    Imagine a server where UPS #2 is down for repairs and UPS #1 fails during a power cut. When everything comes back up we find 2 failed hard drives in the RAID 5 on the email server.

    Despite previous testing and confirmation that the backups work, the most recent tapes failed to read.

    Eventually we sent the failed drives off to a Data recovery company in Florida because

    #1. The customer can afford it.
    #2. Simply "skipping" a few days of Email is not an option for a bank (hence the ability to afford data recovery).

    So yeah. A UPS is essential. Just like RAID, Clustering and Backups but in the end it can all fail.

    Best advice? Memorize all your important data. That way if you lose your mind, you are not responsible for the lost Data (or anything else).

  • by rwa2 ( 4391 ) * on Wednesday July 23, 2008 @01:34PM (#24307481) Homepage Journal

    UPS units are relatively cheap; it's well worthwhile to invest in one, and not just to protect against data loss:

    * Hardware loss: I've seen a lot of hardware blown up from power interruptions. Do you trust your power company that much to provide clean power to you? Sure, surge protectors help a bit, but a decent UPS costs maybe twice as much as a good surge protector.

    * Time lost restoring your session after blackouts / brownouts: OK, maybe you're used to restarting your computer every morning anyway. But I like to leave things open and return to my desktop just the way I left it arranged.

    * Stats: Using NUT and Munin, you get to monitor and log your power, so you can see things like exactly when your electricity went out and for how long, what load your PC is drawing after that last upgrade, etc. e.g.: http://hairball.bumba.net/cgi-bin/nut/upsstats.cgi?host=apc@localhost [bumba.net]

    * Graceful shutdown: you have a chance to tell your buddies that your power just went out, and you'll be coming back once it's restored.

    Frankly, I'm a little surprised a backup battery isn't built into PC power supplies already, so they'd work a bit more like laptops. Same with networking gear.
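On the stats point in the list above: NUT's `upsc` client prints one `variable: value` pair per line, so the readings are easy to feed into a logger. The sketch below parses that format from a hard-coded sample; the sample values are hypothetical and no real UPS is queried.

```python
# Hypothetical sample in the "variable: value" shape that upsc prints.
SAMPLE_UPSC_OUTPUT = """\
battery.charge: 100
battery.runtime: 1620
ups.load: 23
ups.status: OL
"""


def parse_upsc(text):
    """Turn 'name: value' lines into a dict of strings."""
    readings = {}
    for line in text.splitlines():
        name, separator, value = line.partition(":")
        if separator:  # skip lines without a colon
            readings[name.strip()] = value.strip()
    return readings


def on_battery(readings):
    """True if the status flags include OB (on battery)."""
    return "OB" in readings.get("ups.status", "").split()


readings = parse_upsc(SAMPLE_UPSC_OUTPUT)
print("load %:", readings["ups.load"])
print("on battery:", on_battery(readings))
```

In practice the text would come from running `upsc` against your own UPS name and host, and a scheduler or Munin plugin would do the logging.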

  • by bruceg ( 14365 ) on Wednesday July 23, 2008 @01:36PM (#24307537) Homepage

    Been there, done that! We recently moved a few servers this way. Just be careful, and go slow.

  • Our Tandem (Score:5, Interesting)

    by PIPBoy3000 ( 619296 ) on Wednesday July 23, 2008 @01:37PM (#24307541)
    This reminds me of my favorite power loss story. The facility was doing a generator test, where we were supposed to switch over from city power to the generator. Unfortunately it didn't happen smoothly and the UPS kicked in. Sadly it turned out that so many servers had been added since the original design that the UPS was really only good for fifteen minutes or so. The final problem was that our operator didn't notice the issue quickly enough, and so the next thing everyone in IT knew was that our main data center had just lost power.

    We spent most of the day getting our servers back up from various states of disrepair (confirming the article, power loss is superbad). It turns out that our main medical software ran on a Tandem. Though the drives and such lost power, the CPU had a backup of D-batteries and survived the power loss just fine. Needless to say, we stopped making fun of their seemingly primitive emergency backup power.
  • by natoochtoniket ( 763630 ) on Wednesday July 23, 2008 @01:40PM (#24307599)

    The problem is that different application systems have different amounts of state that must be saved. An RT app usually only has a memory buffer that can be written in a small number of I/Os. Many business apps have relatively large amounts of data, in non-contiguous buffers, that require hundreds of I/Os to store. Many business systems have hundreds of such apps running in the machine at the same time. Some systems can have gigs of data, in thousands of buffers, in their write-behind cache. And some businesses have systems that must not shut down, except for actual emergencies like fire or flood.

    How does the hardware designer of a general-purpose computer guess what kinds of apps will run in that machine? He/she cannot.

    The external power supply (aka, the UPS) can be configured to accommodate the needs of the application. An application that needs lots of power for a long time can be configured with a big UPS. And, an app that doesn't need it, doesn't have to pay for it.

  • by Darkk ( 1296127 ) on Wednesday July 23, 2008 @01:51PM (#24307837)

    It's human nature. We tend to not think about the future until "oh shit" happens THEN something is done about it. It happens with everything these days.

    Years ago a UPS was a very expensive item, and it was not the norm for a home user to own one. Now they're becoming more affordable, but the same users who couldn't afford a UPS back then think, "Well, I've been without one for years, so why should I need one now?" The same logic applies to what I said above.

  • by JesseL ( 107722 ) on Wednesday July 23, 2008 @01:53PM (#24307889) Homepage Journal

    I think you're making it more complicated than it needs to be.

    If the system gets a signal that power is going away very, very soon, drops everything else, and just devotes its last seconds to getting things in order - it should be doable in a few seconds and be vastly preferable to the alternative of just having power go away without warning.

    Obviously a UPS is an even better option, but not every place that could use a UPS is ever going to get one, and it would be good if we could work on the problem from the other end too. Most PCs and casual servers are way more vulnerable to momentary power outages than they ought to be. 10-20 Farads worth of 5V caps and some thoughtful programming would make things a lot less delicate.
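For scale, E = ½CV² puts 10-20 F at 5 V at roughly 125-250 J of stored energy, not all of it usable. The "thoughtful programming" half could look something like the sketch below: a routine that a power-fail notification would call to push buffered writes out before the lights go off. How that notification arrives is left open, and `emergency_flush` is a hypothetical name rather than an existing API.

```python
import os
import tempfile


def emergency_flush(open_files):
    """Push buffered writes toward stable storage; return how many files were flushed."""
    flushed = 0
    for f in open_files:
        if f.closed:
            continue
        f.flush()             # Python's userspace buffer -> operating system
        os.fsync(f.fileno())  # OS cache -> device (the drive's own cache may still lag)
        flushed += 1
    return flushed


# Small demonstration against a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "journal.log")
    journal = open(path, "w")
    journal.write("pending record\n")  # likely still sitting in a buffer

    flushed_count = emergency_flush([journal])

    with open(path) as check:
        on_disk = check.read()
    journal.close()

print("files flushed:", flushed_count)
```

This only covers data the application has already handed to a file object. State that lives in scattered in-memory buffers still needs its own plan.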

  • by Anonymous Coward on Wednesday July 23, 2008 @02:00PM (#24308043)

    I worked for a respectable insurance company. The other day a "well-known" H/W maker came to our place to upgrade the hardware for a mainframe, in our computer room.

    They unscrewed the mainframe's panels and put them aside, on the large thingy right beside it.

    That thingy aside happened to be the UPS, which started to heat up, having its vents blocked by the panels. At some point, it gave up, sending a massive "shutdown now" command to all connected computers, including most of the web infrastructure...

    It's been more than 2 days now, and we are still struggling to bring all the pieces together...

  • Re:Get a UPS (Score:3, Interesting)

    by GuldKalle ( 1065310 ) on Wednesday July 23, 2008 @02:14PM (#24308285)
    Depends on where you live. Here in Denmark I've only experienced two power outages in my lifetime. One was in a house in the middle of nowhere, during a winter storm; the other was due to an unpaid bill. Under those circumstances I've got a lot of other stuff to spend $100 on.
    If we were talking about a datacenter, then yes, UPS on everything important. But for home use, nah.
  • by Evro ( 18923 ) * <evandhoffman AT gmail DOT com> on Wednesday July 23, 2008 @02:35PM (#24308691) Homepage Journal

    That's why any datacenter worth putting your servers in pipes its power through a flywheel or some other electricity "cleaner". A 1-ton lead ball spinning at 10,000 RPMs isn't going to speed up that much on a spike like that.
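Taking the lead ball at face value, the back-of-the-envelope physics looks like this. The assumptions are mine, not the poster's: a metric ton, a solid sphere, lead at about 11,340 kg/m³, and a hypothetical 1 kJ surge absorbed entirely as kinetic energy. Real flywheel systems are built quite differently.

```python
import math

LEAD_DENSITY_KG_M3 = 11340.0  # approximate density of lead


def solid_sphere_radius_m(mass_kg, density_kg_m3):
    """Radius of a solid sphere with the given mass and density."""
    volume = mass_kg / density_kg_m3
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


def rotational_energy_joules(mass_kg, radius_m, rpm):
    """Kinetic energy of a spinning solid sphere: E = 0.5 * I * omega^2."""
    inertia = 0.4 * mass_kg * radius_m ** 2  # solid sphere: I = (2/5) m r^2
    omega = rpm * 2.0 * math.pi / 60.0       # rev/min -> rad/s
    return 0.5 * inertia * omega ** 2


mass = 1000.0        # assumption: "1 ton" read as a metric ton
base_rpm = 10_000.0
radius = solid_sphere_radius_m(mass, LEAD_DENSITY_KG_M3)
stored = rotational_energy_joules(mass, radius, base_rpm)

spike = 1000.0       # hypothetical 1 kJ surge
# Energy scales with rpm squared, so the new speed is rpm * sqrt(E_new / E_old).
rpm_after = base_rpm * math.sqrt((stored + spike) / stored)

print(f"stored energy: {stored / 1e6:.1f} MJ")
print(f"speed after absorbing the spike: {rpm_after:.2f} RPM")
```

With tens of megajoules already stored, a surge of that size nudges the speed by a fraction of an RPM, which is the point being made.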

  • Mmmm! Puppies!!! (Score:5, Interesting)

    by Gription ( 1006467 ) on Wednesday July 23, 2008 @03:20PM (#24309381)
    Less filling but tastes great!


    Ok back on subject
    A UPS isn't even a panacea... I had a server lose 3 out of 4 HDs in a 4 hour period. (The 3rd drive went at 4:57 PM Thursday Dec 11th 1997. Not that I would remember...) When I looked at the service history on it, it had been losing drives for 8 months at an accelerating rate.

    Turns out that the 3000 VA rack mount wonder UPS from that big, well known vendor was the problem. The switching unit in it was sending spikes into the equipment.

    They wouldn't warranty it, so I ended up putting a Tripp Lite Isobar surge suppressor between it and the server in our test environment and it was in service for years after that.

    Never trust any piece of equipment...
  • Re:Ah, that's easy (Score:4, Interesting)

    by rcw-work ( 30090 ) on Wednesday July 23, 2008 @04:01PM (#24309959)

    All you need to do is have the grid power feed some high wattage light bulbs. And near the light bulbs is some solar cells.

    You now have a 1% efficient power supply.

    A slightly more practical option (with better isolation than a standard electromagnetic transformer, but unfortunately also some inductive effects) would be to couple two motors with an insulative shaft.

  • Re:Mmmm! Puppies!!! (Score:3, Interesting)

    by Gription ( 1006467 ) on Wednesday July 23, 2008 @09:44PM (#24313709)
    I have had hundreds of APC UPSes that never had a problem. The one that ate my server just happened to be the one for the core database running a mail order company... 14 days before Christmas. At that point we were doing $90k a day.

    The reason I remember the exact minute it failed was I had my bag in hand and was walking toward the door when the server alarm went off.

    18 hours later I found out from the backup software vendor that there was a bug in the software that meant it wouldn't restore any rights information so the server configuration was totally lost.

    The backup that saved us was a DOS batch file that copied everything down to a PC. 43 hours later I was able to actually go home.

    After the blowup they finally approved the request for a secondary server.
  • by Bender0x7D1 ( 536254 ) on Wednesday July 23, 2008 @10:09PM (#24313917)

    I agree with you.

    My point was that just because a battery can power a laptop for several hours doesn't mean a single battery can supply a server for 5 minutes. So, the GP was claiming that because: (laptop power consumption) * (2-3 hours) == (server power consumption) * (5 minutes) it shouldn't be hard for the same battery to power both. The point I was trying to make is that a device that provides a certain range of performance, (in this case the car at 70 MPH), doesn't mean it is easy for it to perform well outside that range, (operating at 420 MPH).
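The same point in numbers: equal energy does not mean equal deliverable power. The 30 W laptop draw and 2.5 hour runtime below are hypothetical figures picked to match the GP's equation, and the discharge rate is expressed as load divided by capacity (a "C-rate").

```python
def server_watts_for_same_energy(laptop_watts, laptop_hours, server_minutes):
    """Load that drains the same energy in server_minutes as the laptop does in laptop_hours."""
    return laptop_watts * (laptop_hours * 60.0) / server_minutes


def c_rate(load_watts, capacity_wh):
    """Discharge rate relative to capacity: 1.0 means empty in one hour."""
    return load_watts / capacity_wh


laptop_watts = 30.0   # hypothetical laptop draw
laptop_hours = 2.5    # middle of the "2-3 hours" in the comment
capacity_wh = laptop_watts * laptop_hours

server_watts = server_watts_for_same_energy(laptop_watts, laptop_hours, 5.0)

laptop_rate = c_rate(laptop_watts, capacity_wh)
server_rate = c_rate(server_watts, capacity_wh)

print(f"battery capacity: {capacity_wh:.0f} Wh")
print(f"server load for the same energy in 5 minutes: {server_watts:.0f} W")
print(f"discharge rate: laptop {laptop_rate:.1f}C vs server {server_rate:.1f}C")
```

The energy budget is identical, but the second case asks the same pack to discharge thirty times faster, which is well outside what a pack sized for the first case is designed to do.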

