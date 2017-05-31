

 


British Airways Says IT Collapse Came After Servers Damaged By Power Problem (reuters.com) 39

Posted by msmash from the no-backup dept.
A huge IT failure that stranded 75,000 British Airways passengers followed damage to servers that were overwhelmed when the power returned after an outage, the airline said on Wednesday. From a report: BA is seeking to limit the damage to its reputation and has apologised to customers after hundreds of flights were canceled over a long holiday weekend. The airline provided a few more details of the incident in its latest statement on Wednesday. While there was a power failure at a data center near London's Heathrow airport, the damage was caused by an overwhelming surge once the electricity was restored, it said. "There was a total loss of power at the data center. The power then returned in an uncontrolled way causing physical damage to the IT servers," BA said in a statement. "It was not an IT issue, it was a power issue."


  • Not IT... Riiiight... (Score:1)

    by Anonymous Coward

    Pretty sure UPSes and backup power supplies kinda do fall under that...

    • Even if UPS and surge protection do not count, having a redundant system in a different data centre ready to take over regardless of the cause of the outage definitely does fall under IT. It is insane that a major company like BA did not have any such redundancy for such an important, mission-critical application. It would have cost far less than the £100 million estimated cost of this incident, not to mention avoiding the appalling publicity.

    • Re: (Score:2)

      by Tailhook ( 98486 )

      Not to mention fail over to alternative sites.

      These are transparent lies. The real issue is well known now, but it's uncomfortable for all involved so they're making stuff up.

    • Re: (Score:2)

      by sycodon ( 149926 )

      Well, India has a notoriously unreliable electrical grid.

  • We all know that this outage was caused by bad faith outsourcing to unqualified persons. Who are they kidding?

    https://www.theguardian.com/bu... [theguardian.com]

    Oh yeah, power surges are to blame! haha no.

    • Pound?

    • A proper IT staff would have built in safeguards against power outages and power surges.
      For a company the size of British Airways I would expect that they would have a hot failover in a different country. Or at least a different geographic location.

      In short they cheaped out on IT and now they are paying for it.

    • An ill-considered plan to save a few dimes has cost them several dollars.

      The CEO should have foreseen this and should be let go. As should other executives who approved the offshoring plan.

      Offshoring can work, but excessive staffing cuts to save a few extra dollars are begging for something like this to happen.

      Infrastructure people should be located on site with the hardware and there should be multiple hardware systems *with* fail over testing on a monthly basis. (not quarterly. that fails. only monthl

  • How big a current spike was this? Don't UPSes act as surge protectors and filters too?

    • Re: (Score:2)

      by Pascoea ( 968200 )

      How big a current spike was this?

      1.21 Jiggawatts, and it sent them back to 1985.

    • They should do, but it depends a lot on the precise design of the UPS, and the nature of the power transient.

      While many industrial UPS systems are dual conversion systems (essentially, the critical load is powered from the battery bus/inverter, and fails over to mains in the event of an inverter/battery malfunction), they are sometimes operated in standby mode (the critical load is powered from mains, and fails over to the battery bus/inverter in the event of a mains failure) as this saves energy due to

  • Direct cause (Score:1)

    by Anonymous Coward

    The power surge was the direct cause. The fundamental cause was the failure of management to ensure they had an appropriate disaster recovery plan.

  • So instead of being incompetent at software, they are claiming to be incompetent at hardware.
    And the difference is...?

    Anyone whose Server Farm can be brought down from a power outage does NOT know what they are doing, or care enough about it to bother.

    How would this 'admission' make anyone more comfortable about this business?

  • Power issues of this kind are IT issues.

    When designing a server location you must take power into consideration; i.e., do I have enough battery to keep all critical servers and supporting hardware up until the generator has kicked in, plus extra just in case the generator has a glitch or two of its own. Is the battery rated at the correct surge protection to keep systems from glitching when the power does return? Is the generator more than enough to power everything between re-fuelings? Is the generator r

  • If the power hadn't come back at the datacenter, would that still be a power issue? If an earthquake destroys the datacenter is that an earthquake issue? If your system collapses when a datacenter goes offline (for whatever reason), you're at fault, not the datacenter. This seems like a classic case of having a single point of failure.

  • It's amazing how many 'power' issues there are with remote Indian support centers. If it truly is the power issues, why aren't there rigorous disaster plans, given that these power issues are so common? If they are in place, and they still aren't helping, then why are they building these data centers / support centers there anyway? If the country has an unstable power grid or is prone to natural disasters that cause issues, once again, why are there data centers there in the first place?

    It sounds like someone is

  • "Those union electricians told us we could run all these servers without upgrading the circuit breakers. It's not an IT problem, it's a union problem!"

  • Really? So, they are completely illustrating that their IT efforts are a "cost center" and that IT is a "necessary evil" that they provide minimal effort to. Everyone knows that a serious "Data Center" has multiple protective measures in place, so who is this service provider? I wonder how they treat their aircraft? This is so blatantly obvious it hurts those who know IT. Forget about the outsourcing questions.

  • was the fact that you apparently have no redundancy on extremely mission-critical servers.

  • Every server wasn't connected to a UPS? And the return of the power overwhelmed the UPSes?

    And just how did management decide to "save money" on the power for the servers?

