Can Maintenance Make Data Centers Less Reliable?
miller60 writes "Is preventive maintenance on data center equipment not really that preventive after all? With human error cited as a leading cause of downtime, a vigorous maintenance schedule can actually make a data center less reliable, according to some industry experts. 'The most common threat to reliability is excessive maintenance,' said Steve Fairfax of 'science risk' consultant MTechnology. 'We get the perception that lots of testing improves component reliability. It does not.' In some cases, poorly documented maintenance can lead to conflicts with automated systems, he warned. Other speakers at the recent 7x24 Exchange conference urged data center operators to focus on understanding their own facilities, and then evaluating which maintenance programs are essential, including offerings from equipment vendors."
Security updates (Score:5, Informative)
I can think of many occasions when a security update has broken a server/router/etc. Obviously the lack of a security update can lead to a bigger headache in the future. But the typical user doesn't understand and has the attitude "IT broke the server again".
If a virus or hacker causes an issue, the attitude is "I hope they fix that soon. I hate viruses/hackers" (obviously this is a huge generalization).
Maintenance took down Chernobyl (Score:3, Informative)
Re:Maintenance took down Chernobyl (Score:5, Informative)
That being said, it was because their procedures were shit, not because they were doing maintenance.
Actually, no, the Chernobyl disaster was sparked by a 'live' test of a new, untested mechanism for powering reactor cooling systems in the event of a disaster that brought down the power grid. http://en.wikipedia.org/wiki/Chernobyl_disaster#The_attempted_experiment [wikipedia.org] (And even that test was delayed several hours, into a shift of workers who weren't properly prepared to conduct the test.)
Re:In between maybe? (Score:4, Informative)
===
Back in the early 90s, I inherited from a friend a fear of rebooting, turning off, or performing maintenance on a computer. Half the time he opened the case, the computer would become unbootable or never turn back on.
===
Neither you nor your friend is alone in thinking that:
AD-A066579, RELIABILITY-CENTERED MAINTENANCE, Nowlan & Heap, (DEC 1978) [this used to be available for download from the US Dept of Commerce web site; now appears to be behind a US government paywall (!)]
A more recent summary:
http://reliabilityweb.com/index.php/articles/maintenance_management_a_new_paradigm/ [reliabilityweb.com]
sPh
Re:In between maybe? (Score:5, Informative)