Date: 2017-05-31
British Airways Says IT Collapse Came After Servers Damaged By Power Problem (reuters.com)
A huge IT failure that stranded 75,000 British Airways passengers followed damage to servers that were overwhelmed when the power returned after an outage, the airline said on Wednesday. From a report: BA is seeking to limit the damage to its reputation and has apologised to customers after hundreds of flights were canceled over a long holiday weekend. The airline provided a few more details of the incident in its latest statement on Wednesday. While there was a power failure at a data center near London's Heathrow airport, the damage was caused by an overwhelming surge once the electricity was restored, it said. "There was a total loss of power at the data center. The power then returned in an uncontrolled way causing physical damage to the IT servers," BA said in a statement. "It was not an IT issue, it was a power issue."
Pretty sure UPSes and backup power supplies kinda do fall under that...
Not to mention failover to alternative sites.
These are transparent lies. The real issue is well known now, but it's uncomfortable for all involved so they're making stuff up.
Well, India has a notoriously unreliable electrical grid.
We all know that this outage was caused by bad faith outsourcing to unqualified persons. Who are they kidding?
https://www.theguardian.com/bu... [theguardian.com]
Oh yeah, power surges are to blame! haha no.
A proper IT staff would have built in safeguards against power outages and power surges.
For a company the size of British Airways I would expect that they would have a hot failover in a different country. Or at least a different geographic location.
In short, they cheaped out on IT and now they are paying for it.
This is what happens when you treat your IT staff like your janitorial staff.
An ill-considered plan to save a few dimes has cost them several dollars.
The CEO should have foreseen this and should be let go. As should other executives who approved the offshoring plan.
Offshoring can work, but excessive staffing cuts to save a few extra dollars are begging for something like this to happen.
Infrastructure people should be located on site with the hardware and there should be multiple hardware systems *with* failover testing on a monthly basis. (not quarterly. that fails. only monthl
Yep.
We have a Caterpillar generator the size of a schoolbus (and given its coloring I've had to restrain myself from sticking a stop-sign on the side as a prank) and a sophisticated transfer switch with power monitoring. When we lose power the batteries hold the DC over until the generator kicks in, and then when power is restored we do not switch back to grid immediately. I am not the person that deals with the power, but as I understand it, the generator and transfer switch monitor the grid for some ti
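A rough sketch of the "do not switch back to grid immediately" behavior described above, assuming the switch just waits until the grid has been continuously healthy for a hold-off period before retransferring. The class name, the 600-second default, and the two-source model are made up for illustration; the battery bridge is not modeled, and a real transfer switch does a lot more than this.

```python
class TransferSwitch:
    """Toy automatic transfer switch with a retransfer hold-off (hypothetical)."""

    def __init__(self, stable_seconds_required=600.0):
        # How long the grid must stay healthy before we trust it again.
        # 600 s is an arbitrary illustrative value, not a recommendation.
        self.stable_seconds_required = stable_seconds_required
        self.source = "grid"
        self.grid_stable_for = 0.0

    def step(self, grid_ok, dt):
        """Advance the model by dt seconds and return the active source."""
        if not grid_ok:
            # Any grid fault resets the stability timer and puts
            # (or keeps) the load on the generator.
            self.grid_stable_for = 0.0
            self.source = "generator"
        else:
            self.grid_stable_for += dt
            # Only go back to the grid after a full uninterrupted
            # hold-off period of healthy grid power.
            if (self.source == "generator"
                    and self.grid_stable_for >= self.stable_seconds_required):
                self.source = "grid"
        return self.source
```

The point of the timer reset is that a grid which flickers back on and off never earns the load back, so the servers never see the "power returned in an uncontrolled way" moment directly.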
A proper IT infrastructure can deal with a direct lightning strike as well.
How big a current spike was this?
1.21 Jiggawatts, and it sent them back to 1985.
Great Scott!!
While many industrial UPS systems are dual conversion systems (essentially, the critical load is powered from the battery bus/inverter, and fails over to mains in the event of an inverter/battery malfunction), they are sometimes operated in standby mode (the critical load is powered from mains, and fails over to the battery bus/inverter in the event of a mains failure) as this saves energy due to
The power surge was the direct cause. The fundamental cause was the failure of management to ensure they had an appropriate disaster recovery plan.
And the difference is...?
Anyone whose server farm can be brought down by a power outage does NOT know what they are doing, or care enough about it to bother.
How would this 'admission' make anyone more comfortable about this business?
Power issues of this kind are IT issues.
When designing a server location you must take power into consideration; i.e., do I have enough battery to keep all critical servers and supporting hardware up until the generator has kicked in, plus extra just in case the generator has a glitch or two of its own. Is the battery rated at the correct surge protection to keep systems from glitching when the power does return? Is the generator more than enough to power everything between re-fuelings? Is the generator r
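For the "enough battery until the generator has kicked in, plus extra" question, a back-of-the-envelope check could look like the sketch below. It assumes a constant load and a simple usable-energy model; the function names, the efficiency and usable-fraction defaults, and the 2x margin are all hypothetical, and real battery sizing depends on discharge curves, battery age, and temperature.

```python
def battery_runtime_minutes(capacity_wh, load_w,
                            efficiency=0.9, usable_fraction=0.8):
    """Estimate minutes of runtime for a constant load (simplified model)."""
    if load_w <= 0:
        raise ValueError("load_w must be positive")
    # Energy actually deliverable to the load, after inverter losses
    # and after holding back part of the battery capacity.
    usable_wh = capacity_wh * efficiency * usable_fraction
    return usable_wh / load_w * 60.0


def covers_generator_start(runtime_min, generator_start_min,
                           margin_factor=2.0):
    """True if runtime covers generator start time times a safety margin."""
    return runtime_min >= generator_start_min * margin_factor


# Example with invented numbers: a 10 kWh battery string, a 6 kW critical
# load, and a generator assumed to need 5 minutes to start and take load.
runtime = battery_runtime_minutes(10_000, 6_000)
enough = covers_generator_start(runtime, generator_start_min=5)
```

The margin factor is where the "glitch or two" allowance lives: it is the room for a failed first start attempt before the batteries run flat.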
Okay, they weren't flaming incompetents that didn't have a failover site. They were flaming incompetents that had a failover site that didn't work, because apparently they never tested it. Glad we cleared that up.
It's amazing how many 'power' issues there are with remote Indian support centers. If it truly is the power issues, why aren't there rigorous disaster plans, given that these power issues are so common? If they are in place, and they still aren't helping, then why are we building these data centers / support centers there anyway? If the country has an unstable power grid or is prone to natural disasters that cause issues, once again, why are there data centers there in the first place?
It sounds like someone is
was the fact that you apparently have no redundancy on extremely mission-critical servers.
Every server wasn't connected to a UPS? And the return of the power overwhelmed the UPSes?
And just how did management decide to "save money" on the power for the servers?