Topics: Transportation, Power, IT, Technology

British Airways Says IT Collapse Came After Servers Damaged By Power Problem (reuters.com) 189

A huge IT failure that stranded 75,000 British Airways passengers followed damage to servers that were overwhelmed when the power returned after an outage, the airline said on Wednesday. From a report: BA is seeking to limit the damage to its reputation and has apologised to customers after hundreds of flights were canceled over a long holiday weekend. The airline provided a few more details of the incident in its latest statement on Wednesday. While there was a power failure at a data center near London's Heathrow airport, the damage was caused by an overwhelming surge once the electricity was restored, it said. "There was a total loss of power at the data center. The power then returned in an uncontrolled way causing physical damage to the IT servers," BA said in a statement. "It was not an IT issue, it was a power issue."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Wednesday May 31, 2017 @11:03AM (#54518371)

    Pretty sure UPSes and backup power supplies kinda do fall under that...

    • Redundant System (Score:5, Insightful)

      by Roger W Moore ( 538166 ) on Wednesday May 31, 2017 @11:15AM (#54518497) Journal
      Even if UPS and surge protection do not count, having a redundant system in a different data centre ready to take over regardless of the cause of the outage definitely does fall under IT. It is insane that a major company like BA did not have any such redundancy for such an important, mission-critical application. It would have cost far less than the £100 million estimated cost of this incident, not to mention avoiding the appalling publicity.
      • by tysonedwards ( 969693 ) on Wednesday May 31, 2017 @11:24AM (#54518573)
        Come on... It's apparent, the power surge was so severe it crossed the VPN Tunnels when they re-opened and traveled into another city and damaged those systems too!
      • Re:Redundant System (Score:5, Interesting)

        by GameboyRMH ( 1153867 ) <gameboyrmh@@@gmail...com> on Wednesday May 31, 2017 @12:41PM (#54519307) Journal

        This. The BA outage is the second most hilariously inept cause of an outage I've ever seen, after a local government office that was down for over a week because one rackmount server was dropped in transit.

        • What about the RBS banking outage?

          https://en.m.wikipedia.org/wik... [wikipedia.org]

        • by gweihir ( 88907 )

          "most hilariously inept" covers it well.

        • I know a utility that saved a little money on the power that ran the electric natural gas pumps for their biggest generator (the pumps were in another utility's area) by putting the pumps on a curtailable contract (like 'peak corps' and other programs that cut off your AC power when demand is highest).

          So that means, when the demand was highest, that utility shut down the supply pumps to PG&^H^H^H a large utility's biggest generator. Brilliant!

      • I would give 10:1 odds that they had a voltage dip, transferred to generator, and failed coming back because their batteries were no good. It's unclear if they lost power once or twice, or if it was the servers auto-restarting, but the kind of damage they allude to typically happens when you are 7 years into your "10-year" VRLA batteries. Also known as cost cutting...
        • Doesn't everyone use closed-transition (make-before-break) transfer switches these days? Failing that, even with shit batteries I'd think a break-before-make transfer should be absorbable by weak batteries on a double-conversion UPS, or by the power supplies on the computer hardware running with the UPS in bypass (or a standby UPS).

          I have seen the following oddities with emergency power devices (oddly all Eaton):
          ~10 years ago, an Eaton (IIRC) closed-transition transfer switch with a firmware bug. There

        • The powercos in the area have come out and categorically stated there was no form of power hit, dip or other problem on the public side of the meters.

      • by Afty0r ( 263037 )

        It would have cost far less than the £100 million estimated cost of this incident

        I agree that they should do it, but it is unlikely that the one-off cost of implementing always-on redundant systems would be this cheap; the scale and scope of the IT systems involved in the airline industry is enormous and it's likely it would cost significantly more than that. There are also ongoing costs to consider. Source: Work in software development, have seen projects in organisations way smaller and simpler t

      • Ticketing and scheduling systems are not life-safety critical, therefore they don't get the budget for double redundancy. It's aerospace-think come to the company comptroller's office, imposed on IT, that made this failure happen.

        Also, that "£100 million estimated cost of this incident" is less than the development, rollout and ten years of additional maintenance costs of a fully double-redundant, geographically diverse scheduling system. For an industry that can't even make a baggage sorting system wo

        • "Ticketing and scheduling systems are not life-safety critical"

          Loading, aircraft balancing (centre of gravity) and fuel load calculations ARE.

          All of these were affected. Plus BA's entire VOIP system.

      • by gweihir ( 88907 )

        It is pretty clear that BA leadership screwed up massively here and yes, it is most decidedly an IT problem. The described power-outage scenario is a completely standard one, and competent planning prepares for it. Now they are trying to misdirect (i.e. lie) in order to make it appear like this was a natural disaster that, of course, they could not have done anything about. Dishonorable and untrue, but it nicely demonstrates the defective characters of the people in power at BA.

        The only right thing to do is kick

    • by Tailhook ( 98486 ) on Wednesday May 31, 2017 @11:17AM (#54518513)

      Not to mention fail over to alternative sites.

      These are transparent lies. The real issue is well known now, but it's uncomfortable for all involved, so they're making stuff up.

    • by sycodon ( 149926 ) on Wednesday May 31, 2017 @11:25AM (#54518585)

      Well, India has a notoriously unreliable electrical grid.

    • In other words: "We used $10 MILLION WORTH OF EXPENSIVE SERVERS like a CHILD would use a PAPER CLIP IN AN ELECTRIC SOCKET."
    • When they _fire_ the CEO, CTO and Director of IT, they should publicly announce 'It wasn't a management issue, it was power.'

    • by Joe_Dragon ( 2206452 ) on Wednesday May 31, 2017 @11:51AM (#54518875)

      it's not our DC, so we don't deal with the power part; it's the DC that we outsourced to that does the power part.

    • That was my first thought. A setup that size would have to have UPS backups upon backups. Baloney.
    • I am pretty sure it was the lack thereof in this case. Even if a power surge happens, PDUs/UPSes pretty much handle any power-related issues. This sounds more like someone was dinking around in the data center and pulled/shorted the wrong wire(s). Even if that did not happen, PDU/UPS equipment is designed to prevent what happened, so yea, it WAS AN IT PROBLEM.

  • by mfh ( 56 ) on Wednesday May 31, 2017 @11:04AM (#54518377) Homepage Journal

    We all know that this outage was caused by bad faith outsourcing to unqualified persons. Who are they kidding?

    https://www.theguardian.com/bu... [theguardian.com]

    Oh yeah, power surges are to blame! haha no.

    • Pound?
    • by jellomizer ( 103300 ) on Wednesday May 31, 2017 @11:13AM (#54518481)

      A proper IT staff would have built in safeguards against power outages and power surges.
      For a company the size of British Airways I would expect that they would have a hot failover in a different country, or at least a different geographic location.

      In short, they cheaped out on IT and now they are paying for it.

    • by Maxo-Texas ( 864189 ) on Wednesday May 31, 2017 @11:26AM (#54518599)

      An ill-considered plan to save a few dimes has cost them several dollars.

      The CEO should have foreseen this and should be let go. As should other executives who approved the offshoring plan.

      Offshoring can work, but excessive staffing cuts to save a few extra dollars are begging for something like this to happen.

      Infrastructure people should be located on site with the hardware, and there should be multiple hardware systems *with* failover testing on a monthly basis. (Not quarterly; that fails. Only monthly is often enough that the failover is seamless, and there is a good argument for doing a daily failover.)

      • Big DC power systems are not really an IT-guy thing, more infrastructure / electricians, and some of that stuff is not easy to swap, even more so if a fail-safe tripped and killed all power.

        • by Maxo-Texas ( 864189 ) on Wednesday May 31, 2017 @03:14PM (#54520489)

          It is if it is set up and administered right.

          We did monthly failovers between different physical sites. A blown DC at one site wouldn't have made a difference.

          Our failovers involved a couple of hours of on-call for about 150 staff. Most of the time only a half dozen were working, but a couple of times a year it would involve most of the staff (and a lot of IT people) for part of that. A database would be out of sync or messed up and that would fall to the IT staff to fix. It became less common over time.

          Did you miss that they fixed the power problems and then the IT systems were messed up for a long time afterwards, indicating poor disaster planning and low staff skill?

          A company as big as BA should have had a separate failover site and been doing regular failovers.

      • by rikkards ( 98006 )

        This is what happens when you outsource: you lose control of what is outside your grasp and you take them at face value.
        A CEO who did the right thing would have been pushed out before this ever happened, because he was costing the shareholders too much money.

    • Why do you think it took them this long to come out with an explanation?

  • by Anonymous Coward

    "It was not an IT issue, it was a power issue."

    Assuming it was not a lightning strike, it's still your fuckup if "power issues" can damage/take down your IT.

    • by TWX ( 665546 ) on Wednesday May 31, 2017 @11:19AM (#54518531)

      Yep.

      We have a Caterpillar generator the size of a schoolbus (and given its coloring I've had to restrain myself from sticking a stop-sign on the side as a prank) and a sophisticated transfer switch with power monitoring. When we lose power the batteries hold the DC over until the generator kicks in, and then when power is restored we do not switch back to grid immediately. I am not the person that deals with the power, but as I understand it, the generator and transfer switch monitors the grid for some time before switching back to grid, and there are power conditioners in between. On top of that, the system monitors grid power continuously and will intentionally island the system if there's a significant enough fault.

      This is not for something as critical as an airline's control system either. I do not find any reasonable excuse to blame power; you're supposed to assume that power is dirty and unreliable and to work around it.
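
      To make the described behaviour concrete, here is a toy sketch of that transfer-switch logic in Python. It is purely illustrative: the state names, the holdoff period and the sampling interface are assumptions for the sake of the example, not any vendor's actual control code.

          import time

          # Toy model of the behaviour described above: batteries carry the load,
          # the generator takes over, and the switch only returns to grid after
          # the grid has looked clean for a holdoff period. All names and timings
          # are assumptions for illustration only.
          GRID, ON_BATTERY, ON_GENERATOR = "GRID", "ON_BATTERY", "ON_GENERATOR"
          RETURN_HOLDOFF_SECONDS = 600  # assumed "watch the grid before retransfer" window

          class ToyTransferSwitch:
              def __init__(self):
                  self.state = GRID
                  self.grid_good_since = None

              def on_generator_ready(self):
                  if self.state == ON_BATTERY:
                      self.state = ON_GENERATOR

              def on_grid_sample(self, grid_ok, now=None):
                  """Feed in periodic grid-monitoring results."""
                  now = time.monotonic() if now is None else now
                  if not grid_ok:
                      self.grid_good_since = None
                      if self.state == GRID:
                          self.state = ON_BATTERY  # batteries hold the load instantly
                      return
                  if self.state in (ON_BATTERY, ON_GENERATOR):
                      if self.grid_good_since is None:
                          self.grid_good_since = now
                      elif now - self.grid_good_since >= RETURN_HOLDOFF_SECONDS:
                          self.state = GRID  # retransfer only after sustained clean grid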

      • by Anonymous Coward on Wednesday May 31, 2017 @11:40AM (#54518775)

        Sounds great... when it works. I bet you've never looked at the code that controls a big automated transfer switch. I have. It's a mess. It's so bad that on the very first install Eaton did with our new model, which was in Digital Forest in Tukwila, WA, near Seattle, we had three failures in the first ninety days due to bad software. It shut an entire data center down even though utility power was not down, battery power was good, and the generator was working. The guy we dispatched the third time had spent two years in Uganda so he was experienced with bad power. He claimed that power from Seattle City Light was worse than Uganda. The power was so bad that the software in the ATS decided to disconnect everything.

        The second time power was restored, the bad software switched to generator power before the generator was fully running. The voltage dropped, took out quite a few older pieces of equipment, and stalled the engine. In other words, the opposite of the problem BA had.

        • The guy we dispatched the third time had spent two years in Uganda so he was experienced with bad power. He claimed that power from Seattle City Light was worse than Uganda. The power was so bad that the software in the ATS decided to disconnect everything.

          Probably true. When the first grid-tie inverters were invented, they kept shutting themselves off because, as it turned out, the utilities were totally incapable of producing power as clean as they claimed, and as clean as they were demanding the inverters provide. Making better power than the utilities in the US is trivial.

        • by phorm ( 591458 )

          Strange, in the last place I worked with a big DC, they regularly tested the generator (I think monthly, and even from floors away you could *hear* it), and UPS systems. In my five years there, I'd not heard of an outage due to any of the many power failures in our area.

      • by Anonymous Coward on Wednesday May 31, 2017 @12:45PM (#54519349)

        I worked in a center that had a big diesel-powered UPS unit the size of a shipping container. It was there about 3 years before we had a power outage. It detected it and spun up, engaged the clutch and ... the drive belt snapped. Oops. Undervoltage. So rev faster. Still undervoltage, so MOAR revs. Now, in addition to the power outage, we've got a big UPS that's on fire.

        • Sounds like 365 Main: the problem was multiple small blips, each too small to start the engine, but which in aggregate depleted the flywheel below the minimum needed to start the engine.
        • by TWX ( 665546 )

          We test monthly. It's also a way to replenish the fuel before it becomes nonviable.

          • by dbIII ( 701233 )
            Indeed - good idea.
            Sometimes Murphy is still against you.
            A power station I did some work at had a 20MW emergency generator (old jet engine) to kick things off (conveyors and crushers require a lot of juice) and it was tested monthly for around 25 years and maintained carefully. The only time it was needed (due to a fairly rare set of circumstances) it didn't work. A second one was installed later as a backup to the backup but neither was needed again for the remaining life of the power station.
            I think tho
      • by citylivin ( 1250770 ) on Wednesday May 31, 2017 @01:06PM (#54519561)

        Until your voltage regulator starts dying and only gives your equipment 80 volts, and no one notices the undervoltage condition during normal maintenance and testing of the generator.

        The facilities maintenance people test the generators monthly, but it was not standard practice to test the voltage every single time the generator was tested.

        It is now.

        But the point is that systems fail in all sorts of fun ways in the real world. You learn, you change, you adapt, as I'm sure BA is doing. All it takes is one major incident to stop people from dragging their feet. I'm sure that is occurring now at British Airways.

        • by dbIII ( 701233 )

          But the point is that systems fail in all sorts of fun ways in the real world. You learn, you change, you adapt, as I'm sure BA is doing.

          Yes, and by outsourcing to someone who has not learned your lessons you have to get through all those mistakes a second time.

      • by amorsen ( 7485 )

        [..]sophisticated transfer switch with power monitoring[..]

        Those break. Way more than they should. Often with interesting results that aren't just "power went off".

        And you fundamentally can't make them redundant. You can have two of them on completely separate feeds of course, feeding into different power supplies on the servers. That sometimes helps, except when the overvoltage is sufficiently great to get through the protections of the power supply.

      • by Thelasko ( 1196535 ) on Wednesday May 31, 2017 @01:47PM (#54519841) Journal

        I am not the person that deals with the power, but as I understand it, the generator and transfer switch monitors the grid for some time before switching back to grid, and there are power conditioners in between.

        I used to design the diesel engines used in some of those systems, and have seen them in use. Although your system may monitor the grid to ensure reliability, it's most likely making sure it's not switching between two power sources that are out of phase.

        When we would connect one of our gensets to the power grid, we had to match the phase before we could close the switches. To do this, the engine speed was modified to run the generator at slightly above or below the frequency of the grid. If the phase wasn't matched, the power grid would try to force the generator into phase suddenly. It's assumed the power available from the grid is infinite in these types of systems. Therefore an incredible amount of current would flow through the generator and also provide a mechanical jerk [wikipedia.org] to the engine if the switches were closed out of phase. Something will break in a spectacular fashion if this isn't done carefully.

        Honestly, this could be what happened to BA.
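
        As a rough illustration of that synchronisation check, here is a minimal Python sketch of the "is it safe to close the breaker" test. The voltage, slip and phase-angle limits below are assumed values for illustration, not figures from any standard or vendor.

            # Illustrative sync-check limits (assumptions, not from any standard).
            MAX_VOLTAGE_DIFF_PCT = 5.0   # allowed voltage mismatch, percent of nominal
            MAX_SLIP_HZ = 0.1            # allowed frequency difference, genset vs grid
            MAX_PHASE_ANGLE_DEG = 10.0   # allowed phase angle across the open breaker

            def ok_to_close_breaker(gen_v, grid_v, gen_hz, grid_hz, angle_deg, nominal_v=400.0):
                """True only when generator and grid are close enough in voltage,
                frequency and phase that closing the breaker won't slam the genset
                into phase (the 'mechanical jerk' described above)."""
                v_diff_pct = abs(gen_v - grid_v) / nominal_v * 100.0
                slip = abs(gen_hz - grid_hz)
                # Fold the angle into [-180, 180] so 350 degrees counts as -10 degrees.
                angle = (angle_deg + 180.0) % 360.0 - 180.0
                return (v_diff_pct <= MAX_VOLTAGE_DIFF_PCT
                        and slip <= MAX_SLIP_HZ
                        and abs(angle) <= MAX_PHASE_ANGLE_DEG)

            # Run the genset slightly fast so the phase angle sweeps slowly, then
            # wait for a pass through this permissive window before closing.
            print(ok_to_close_breaker(398.0, 400.0, 50.05, 50.00, 6.0))   # True
            print(ok_to_close_breaker(398.0, 400.0, 50.05, 50.00, 95.0))  # False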

        • by kevmeister ( 979231 ) on Wednesday May 31, 2017 @06:27PM (#54521869) Homepage

          And sometimes **it happens.

          I worked as a Senior Network Engineer for a large national backbone provider to the US DOE. At the facilities we owned WE were in charge of oversight of the power system and regular testing. We had one experienced power engineer on staff to oversee everything, though the facility's plant engineering people did all of the actual heavy work.

          Back in 2009 we had just completed our annual full transfer test where we switched over to UPS, let the generator fire up, transferred to generator power, and then reversed the process. Everything worked perfectly. The following week we lost power. UPS kicked in, but the generator refused to start. One week earlier everything worked perfectly in the test case where we could have backed out before UPS died. No such luck that day. Our staff lost the ability to monitor the network and the laboratory where we were located lost Internet connectivity as did several other smaller facilities in the area. Took us about an hour to get a trailered generator in place and get things back on-line.

          No matter how carefully you plan and test, sometimes you still lose.

      • by gweihir ( 88907 )

        Yep.

        We have a Caterpillar generator the size of a schoolbus (and given its coloring I've had to restrain myself from sticking a stop-sign on the side as a prank) and a sophisticated transfer switch with power monitoring. When we lose power the batteries hold the DC over until the generator kicks in, and then when power is restored we do not switch back to grid immediately. I am not the person that deals with the power, but as I understand it, the generator and transfer switch monitors the grid for some time before switching back to grid, and there are power conditioners in between. On top of that, the system monitors grid power continuously and will intentionally island the system if there's a significant enough fault.

        This is not for something as critical as an airline's control system either. I do not find any reasonable excuse to blame power; you're supposed to assume that power is dirty and unreliable and to work around it.

        That is how it is done. It is well-known that power often comes back up "unclean" after a failure.

    • A proper IT infrastructure can deal with a direct lightning strike as well.

      • by nwf ( 25607 )

        A proper IT infrastructure can deal with a direct lightning strike as well.

        At what cost? I doubt it's worth it for most businesses. There are too many disasters to plan for: lightning, flood, earthquake, tornado, high winds, several combined. It's probably impossible to protect against everything unless you have Federal Government money.

        I've yet to see a surge suppression system that's affordable to a mid-scale business that can take a direct hit, anyway. Plus you get EM induced voltage that fries networking and other stuff, including the power system control circuitry. I've seen

      • by gweihir ( 88907 )

        It is in fact a standard scenario.

  • Not an IT Issue (Score:2, Insightful)

    by Anonymous Coward

    It absolutely is an IT issue if you cannot automatically recover from power events in a single data center...

  • Direct cause (Score:5, Insightful)

    by Anonymous Coward on Wednesday May 31, 2017 @11:07AM (#54518423)

    The power surge was the direct cause. The fundamental cause was the failure of management to ensure they had an appropriate disaster recovery plan.

  • So instead of being incompetent at software, they are claiming to be incompetent at hardware.
    And the difference is...?

    Anyone whose Server Farm can be brought down by a power outage does NOT know what they are doing, or care enough about it to bother.

    How would this 'admission' make anyone more comfortable about this business?

    • by Tailhook ( 98486 ) on Wednesday May 31, 2017 @11:39AM (#54518761)

      How would this 'admission' make anyone more comfortable about this business?

      The business doesn't have to worry about that. It's safe regardless; too-big-to-fail public+private yada yada. This is BA we're talking about.

      These "stories" are just the public narrative writing process, guided to affix/deflect blame to/from the appropriate parties as the scapegoats are singled out. The BA execs know they have maybe 72 hours or so before this story falls out of the news cycle so they're using that window to make the headlines they need to muddy the waters. Until now the only narrative that has had any play is the "outsourcing did it" one, and that hits too close to management, so they're making this stuff up and putting it out through their MSM channels.

  • by matthiasvegh ( 1800634 ) on Wednesday May 31, 2017 @11:16AM (#54518505)
    If the power hadn't come back at the datacenter, would that still be a power issue? If an earthquake destroys the datacenter, is that an earthquake issue? If your system collapses when a datacenter goes offline (for whatever reason), you're at fault, not the datacenter. This seems like a classic case of having a single point of failure.
    • by Anonymous Coward on Wednesday May 31, 2017 @11:26AM (#54518611)

      BA has a DR site independent of the primary that suffered the power issue. But volume groups were not being mirrored correctly to the DR site. When they brought the DR site online, they were getting 3 or more destinations when scanning boarding passes. And since the integrity of the DR site was an issue, it could not be used.

      Then the only option is to fix the primary DC, which would have involved installing new servers / routers / switches / etc, configuring them, restoring the data to the last known good state and then bringing it back online. Good luck to anyone trying to deploy new/replacement equipment en masse during the chaos of a disaster. And then restoring data!

      Takes days, not hours... unlike whatever RTO/RPO (recovery time / recovery point objectives) they claimed to be able to meet.

  • by __aaclcg7560 ( 824291 ) on Wednesday May 31, 2017 @11:23AM (#54518567)
    "Those union electricians told us we could run all these servers without upgrading the circuit breakers. It's not an IT problem, it's a union problem!"
    • by IMightB ( 533307 )

      Don't really think you can blame this on unions... what's your agenda? Most of the time issues like this are caused by management thinking "we already spent X million dollars for server clusters on one site. But it costs X much more for each server to have dual power supplies, then X much more for each DataCenter power bus and redundant backups, and now you're telling me we have to spend X times 2 for an additional DataCenter?!?!?! And testing, etc etc. I thought this is what a HA cluster is for!"

  • by U8MyData ( 1281010 ) on Wednesday May 31, 2017 @11:26AM (#54518597)
    Really? So, they are completely illustrating that their IT efforts are a "cost center" and that IT is a "necessary evil" that they provide minimal effort to. Everyone knows that a serious "Data Center" has multiple protective measures in place, so who is this service provider? I wonder how they treat their aircraft? This is so blatantly obvious it hurts those who know IT. Forget about the outsourcing questions.
    • Re:ID10Ts (Score:5, Insightful)

      by Fire_Wraith ( 1460385 ) on Wednesday May 31, 2017 @11:34AM (#54518703)
      Outsourcing is part of the problem, but you're right, it derives from the mentality that IT is a cost center that must be minimized at every possible turn. It's outdated thinking, going back to the days when, if your office network went down, there'd be a bit of inconvenience, but the planes still flew and it wasn't a big deal. Today, IT is a business-critical area, because when your network goes down, the planes stop flying and you stop making money, never mind the lingering effects from the terrible publicity or the angry customers. It's not something you can afford to skimp on, on any level.

      Unfortunately it will probably take several shocks like this, and some high level careers ending as a result, before they start to wise up.
  • was the fact that you apparently have no redundancy on extremely mission-critical servers.

  • Every server wasn't connected to a UPS? And the return of the power overwhelmed the UPSes?

    And just how did management decide to "save money" on the power for the servers?

    • And the return of the power overwhelmed the UPSes?

      No, they appear to be saying that turning the power on somehow damaged the computers:
      "The power then returned in an uncontrolled way causing physical damage to the IT servers"

      I don't believe this. The CEO is just protecting his own ass after outsourcing IT.

      • by sconeu ( 64226 )

        No, they appear to be saying that turning the power on somehow damaged the computers:
        "The power then returned in an uncontrolled way causing physical damage to the IT servers"

        Right, which says that the servers weren't connected to UPSes. Because if they were, then the UPS would have filtered the power surge.

        • If I read this right, they are claiming that putting a huge load on the system (bringing up power to too many servers at once) resulted in excessive voltage on the power rails.

          In my understanding of physics, increasing the current usually results in reduced voltage. So where did the over-voltage come from?

          Or are they saying that their UPS generators were somehow incapable of limiting their output voltage? Pretty strange generators, not suitable for the task?

          None of this sounds right, which is why I reject

  • by pz ( 113803 ) on Wednesday May 31, 2017 @11:36AM (#54518725) Journal

    I worked as a dev for a pretty big social network company. We were a not-quite also-ran, peaking at Alexa 108 globally, and for a while we were beating the pants off of Facebook. This was in the pre-AWS days when startups still ran their own servers. Early on, we had apparent power failures on two successive Saturday nights. Right when our database scrubbing processes started.

    I suggested to our sysadmins that *maybe* it was because all of the disk heads were starting to move at once, and *maybe* it would go away if we staggered the processes across servers.

    Yep, problem solved. Our power feeds were rated for average power draw, not peak power draw on all servers in a rack, and peak power came when all of the disks started seeking simultaneously.

    It seems the same thing happened at BA, except no one thought to stagger-start the servers. For us, this was the first big system we ever built, so, OK, chalk it up to growing pains (and the problem never, ever happened again). But BA? Shame on them.
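
    For what it's worth, the fix is tiny in code. Here is a minimal sketch of the stagger-start idea in Python; the host list, delay and scrub command are illustrative placeholders, not the actual setup:

        import subprocess
        import time

        # Hypothetical hosts and spacing; the heavy hitter here was the nightly
        # database scrubbing jobs that all kicked off at the same minute.
        SCRUB_HOSTS = ["db01", "db02", "db03", "db04"]
        STAGGER_SECONDS = 300  # assumed gap so disk-seek (and power) peaks don't overlap

        def start_scrubs_staggered():
            """Start the scrub job on each host with a fixed delay in between, so
            peak disk activity, and therefore peak power draw, is spread out
            instead of hitting every machine on the circuit at once."""
            for i, host in enumerate(SCRUB_HOSTS):
                if i:
                    time.sleep(STAGGER_SECONDS)
                # Placeholder command; substitute the real scrub invocation.
                subprocess.Popen(["ssh", host, "/usr/local/bin/run_scrub.sh"])

        if __name__ == "__main__":
            start_scrubs_staggered()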

  • Delta tried this [slashdot.org] last year and got called to the carpet on it. BA needs to learn from others' mistakes...
  • by account_deleted ( 4530225 ) on Wednesday May 31, 2017 @12:12PM (#54519047)
    Comment removed based on user account deletion
    • A report I read suggested they had around 500 cabinets of machines (not sure if this was across both sites or just the primary). Estimating 2 kW/cabinet brings you into MW territory for the lot (arithmetic spelled out below), so this is a non-trivial amount of machinery to keep running in a power-failure situation. The failure description suggested that it was a surge issue, so it's not clear if this was just stupidity on their part (not staggering the restart) or something else going wrong within the site (a bad failover to generators etc).

      Either wa
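
      (Spelling that out: 500 cabinets × 2 kW/cabinet = 1,000 kW, i.e. roughly 1 MW of IT load before cooling and distribution overhead, which is well beyond what a handful of rack-level UPS units can carry.)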

      • by Shimbo ( 100005 )

        However, their DR planning did get them running again within a few days which is more than most companies can manage.

        Most companies wouldn't manage to recover in a few days from an actual disaster. However, all that seems to have happened is that they fried a few servers. It doesn't take a lot of planning to get some spares in and recover some toasted machines. I'm not knocking the guys on the ground, who probably had to work quite hard to do it, but trying to fix up the primary site because the failover was dysfunctional is no evidence at all of a good DR plan.

        Also, we don't know where the surge came from, or how it was able to

    • "I was able to protect my puddly shit at my workplace with equipment I bought at Frys, so BA should have been able to protect its 12,000 servers just like I did."

      Scaling up is hard. Just because you were able to do it with your install doesn't mean it would be just as easy for a larger install.

      That said, they should have done a better job at BA. Even though testing power isn't part of a smaller DC's MO, it should be for a company the size of BA...at least in their dev environment.

  • Seriously, I have not seen so many issues with airline computers except for the last 2 years. What is different? Why, outsourcing to India.
  • I don't think he's digging his hole fast enough. Feel free to borrow my shovel.

    Or, perhaps a better solution would be for someone else at BA to clonk him over the head from behind with a little statuette or something so he just stops talking.

  • So what would have actually happened?

    First, there is a cut in mains power to the data centre. No biggie, the batteries take the load. The backup generators then spin up and supply power, and the datacentre keeps running.
    No lights go off, no computers crash, business keeps running.

    But then, mains power is available again. How do you transition from your own generated power back to grid power? You can't just flick a switch. For a start you should ensure that the phase of the two power sou

    • Because your batteries gave their last gasp to get onto generator, or because the impact event was not the initial loss of power but the thermal damage from restoration of power.

      After people hitting the big red button (or the fire alarm doing it automatically), this is the most common failure mode for a data center.
  • by Anonymous Coward

    The utility providers for all of BA's major operations centers in England are all on record [ft.com] as saying there were no power surges, anomalies, etc. This wasn't "we're unaware of..."; they all went back over their logs and categorically denied it (seems like they weren't happy about BA trying to pin any bit of this sh*t show on them). As many have pointed out above and elsewhere, none of this passes the sniff test. BA's taking a beating for this, not just over stranding passengers but how they handled the s

  • As opposed to another type of servers? Do they have building & grounds servers? Operations servers? Receptionist servers?

    Just curious...

  • If a power surge caused the issue, then surely BA will sue the power company. If the power company can demonstrate there was no surge, surely they will sue BA for defamation.
  • And there was no power surge (not outside the DCs anyway).

    A large number of ex-BA IT staff have commented in fora about the historic robustness of the system; however, over the last 5 years BA has systematically gutted its IT staff and outsourced just about everything to India.

    The CIO of BA (and IAG) is a manager whose last claim to fame was being the person responsible for ramming through the highly contentious (as in strike-causing) cabin crew contracts which stripped out many rights in 2011.

    He has ZERO IT