Originally posted by NickFitz
I was present at a meeting when the entire SAP system was bobbed. The failover of the db server to another server failed, because the failover monitor could see the db server at one level, but was waiting for a response at another, which it was never getting. The operations manager told the data centre, in India, to switch off the db server. Turn it off. That way the failover monitor would notice the db server was not there, and switch to the shadow db server.
Apparently "Switch the database server off" was too hard a concept for our offshore colleagues to understand, since they instead shut everything down, including all the application servers. Rather than users experiencing a hanging system that then started working again, they lost connection and their work in progress.
On restart, the shadow server wouldn't come up. A network card had failed, which would require four hours to be replaced, despite spares being on hand, since no-one in the data centre had any technical knowledge or ability whatsoever. Eventually, a techy guy in the UK managed to talk to the shadow db server over one of its other network cards, persuaded it to ignore the failed card, and so the system was restored.
The announced cause of the outage: hardware failure.