Previously on "NatWest Borked"

  • NotAllThere
    replied
    People are bloody* wetware, surely.


    * After the kicking.

  • NickFitz
    replied
    Originally posted by doodab View Post
    In my experience it usually means there was.
    If we use the definition of hardware as "the stuff you can kick", then people are hardware.

  • doodab
    replied
    Originally posted by NotAllThere View Post
    The point is that "hardware failure" doesn't mean there wasn't some human f.sk up somewhere.
    In my experience it usually means there was.

  • vetran
    replied
    Originally posted by Sysman View Post
    I read some of the comments about multiple redundancy with a chuckle.

    Once you have seen a single UPS, which was just one of many, take out a whole building, you see things with a bit more scepticism.
    Now that shouldn't happen; sack the architect.

  • Sysman
    replied
    Originally posted by darmstadt View Post
    This goes to show the incompetence of NatWest's IT department. I set up, support, design, etc. these types of systems (Parallel Sysplex, Geographically Dispersed Parallel Sysplex, PPRC, etc.) and this has never happened in any of the systems I've worked on. I actually built the systems that IBM use to design and test this stuff, and tested such errors and worse; a 24/7 operation does work. As people have said, they probably made the mainframers redundant and invested in 'newer' technologies. Most IT departments are run by ******* idiots in the UK (and USA) anyway...
    WHS.

  • NotAllThere
    replied
    Originally posted by doodab View Post
    Are you the NAT in NatWest?
    This was not NatWest, and it happened a few years ago. The point is that "hardware failure" doesn't mean there wasn't some human f.sk up somewhere.

  • doodab
    replied
    Originally posted by NotAllThere View Post
    I was present at a meeting when the entire SAP system was bobbed. The failover of the db server to another server failed, because the failover monitor could see the db server at one level, but was waiting for a response at another, which it was never getting. The operations manager told the data centre, in India, to switch off the db server. Turn it off. That way the failover monitor would notice the db server was not there, and switch to the shadow db server.

    Apparently "Switch the database server off" was too hard a concept for our offshore colleagues to understand, since they instead shut everything down; all the application servers. Rather than users experiencing a hanging system that then started working again, they lost connection and their work in progress.

    On restart, the shadow server wouldn't come up. A network card had failed, which would require four hours to be replaced, despite spares being on hand, since no-one in the data centre had any technical knowledge or ability whatsoever. Eventually, a techy guy in the UK managed to talk to the shadow db server over one of its other network cards, persuaded it to ignore the failed card, and so the system was restored.

    The announced cause of the outage: hardware failure.
    Are you the NAT in NatWest?

  • SandyD
    replied
    Originally posted by NotAllThere View Post
    I was present at a meeting when the entire SAP system was bobbed. The failover of the db server to another server failed, because the failover monitor could see the db server at one level, but was waiting for a response at another, which it was never getting. The operations manager told the data centre, in India, to switch off the db server. Turn it off. That way the failover monitor would notice the db server was not there, and switch to the shadow db server.

    Apparently "Switch the database server off" was too hard a concept for our offshore colleagues to understand, since they instead shut everything down; all the application servers. Rather than users experiencing a hanging system that then started working again, they lost connection and their work in progress.

    On restart, the shadow server wouldn't come up. A network card had failed, which would require four hours to be replaced, despite spares being on hand, since no-one in the data centre had any technical knowledge or ability whatsoever. Eventually, a techy guy in the UK managed to talk to the shadow db server over one of its other network cards, persuaded it to ignore the failed card, and so the system was restored.

    The announced cause of the outage: hardware failure.

    Thanks for the clarifications; hope you didn't post the above from your workstation at work, mate!

  • darmstadt
    replied
    Originally posted by ctdctd View Post
    Don't be silly, Nick; it's a mainframe system with multiple redundancy according to El Reg.

    I think there were two cleaners with vacuum cleaners.
    This goes to show the incompetence of NatWest's IT department. I set up, support, design, etc. these types of systems (Parallel Sysplex, Geographically Dispersed Parallel Sysplex, PPRC, etc.) and this has never happened in any of the systems I've worked on. I actually built the systems that IBM use to design and test this stuff, and tested such errors and worse; a 24/7 operation does work. As people have said, they probably made the mainframers redundant and invested in 'newer' technologies. Most IT departments are run by ******* idiots in the UK (and USA) anyway...

  • suityou01
    replied
    Originally posted by Platypus View Post
    Since he tailored his CV

  • NotAllThere
    replied
    Originally posted by NickFitz View Post
    It was "hardware failure" apparently: RBS Says Computer Failure Is Unrelated to Last Year

    They really should tell the cleaner where it's safe to plug that vacuum cleaner in.
    Edit: this is not the explanation for the NatWest failure. This was a different company a few years ago.

    I was present at a meeting when the entire SAP system was bobbed. The failover of the db server to another server failed, because the failover monitor could see the db server at one level, but was waiting for a response at another, which it was never getting. The operations manager told the data centre, in India, to switch off the db server. Turn it off. That way the failover monitor would notice the db server was not there, and switch to the shadow db server.

    Apparently "Switch the database server off" was too hard a concept for our offshore colleagues to understand, since they instead shut everything down; all the application servers. Rather than users experiencing a hanging system that then started working again, they lost connection and their work in progress.

    On restart, the shadow server wouldn't come up. A network card had failed, which would require four hours to be replaced, despite spares being on hand, since no-one in the data centre had any technical knowledge or ability whatsoever. Eventually, a techy guy in the UK managed to talk to the shadow db server over one of its other network cards, persuaded it to ignore the failed card, and so the system was restored.

    The announced cause of the outage: hardware failure.
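
    For what it's worth, here's a minimal sketch of the distinction that story turns on (Python; the two-level split, the names and the timeouts are purely illustrative assumptions, not the actual monitor's design). "The host accepts a TCP connection" and "the database actually answers" are different checks, and gating failover on the first while waiting forever on the second is exactly the hang described above.

    import socket
    from typing import Callable

    def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
        """Low-level check: the host still accepts a TCP connection."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def should_fail_over(host: str, port: int, db_responds: Callable[[], bool]) -> bool:
        """Decide failover on the application-level check, not on mere reachability.
        A primary that accepts connections but never answers a query is as dead as
        one that has been switched off."""
        if not tcp_reachable(host, port):
            return True               # host gone: clear-cut failure
        return not db_responds()      # reachable but hung: fail over anyway

    Here db_responds stands in for something like a SELECT 1 run through the real database driver with a hard timeout. The operations manager's "switch it off" instruction was effectively a way of forcing the first branch to fire, because the decision had been left to reachability alone.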

  • doodab
    replied
    Originally posted by ctdctd View Post
    Don't be silly, Nick; it's a mainframe system with multiple redundancy according to El Reg.
    I think they made the people who know how it works redundant.

  • Sysman
    replied
    Originally posted by ctdctd View Post
    Don't be silly, Nick; it's a mainframe system with multiple redundancy according to El Reg.

    I think there were two cleaners with vacuum cleaners.
    I read some of the comments about multiple redundancy with a chuckle.

    Once you have seen a single UPS, which was just one of many, take out a whole building, you see things with a bit more scepticism.

  • Mich the Tester
    replied
    It's a bank; of course it's borked.

  • ctdctd
    replied
    Originally posted by NickFitz View Post
    It was "hardware failure" apparently: RBS Says Computer Failure Is Unrelated to Last Year

    They really should tell the cleaner where it's safe to plug that vacuum cleaner in.
    Don't be silly, Nick; it's a mainframe system with multiple redundancy according to El Reg.

    I think there were two cleaners with vacuum cleaners.
