
NatWest Borked


    #41
    Originally posted by NickFitz View Post
    It was "hardware failure" apparently: RBS Says Computer Failure Is Unrelated to Last Year

    They really should tell the cleaner where it's safe to plug that vacuum cleaner in.
    Edit: this is not the explanation for the NatWest failure; it happened at a different company a few years ago.

    I was present at a meeting when the entire SAP system was bobbed. The failover of the db server to another server failed, because the failover monitor could see the db server at one level but was waiting for a response at another level, which never came. The operations manager told the data centre, in India, to switch off the db server. Turn it off. That way the failover monitor would notice the db server was gone and switch to the shadow db server.
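
    A minimal sketch of that kind of two-level check, in Python, with every hostname, port, and timeout invented for illustration. The flaw being described: failover only keys off the low-level probe, so a box that is pingable but wedged at the application level never trips it.

    import socket

    DB_HOST = "db-primary.example.com"  # hypothetical primary db server

    def host_reachable(host, port=22, timeout=2.0):
        # Low-level probe: can we open a TCP connection to the box at all?
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def db_responding(host, port=5432, timeout=5.0):
        # Application-level probe: does the database actually answer?
        # (A real monitor would run something like "SELECT 1" here;
        # waiting on a bare recv() is just a stand-in.)
        try:
            with socket.create_connection((host, port), timeout=timeout) as s:
                s.settimeout(timeout)
                s.recv(1)  # times out if the db is wedged
                return True
        except OSError:
            return False

    def should_fail_over():
        # The flaw: only the low-level probe triggers failover, so
        # "host up, database hung" leaves the monitor waiting forever.
        return not host_reachable(DB_HOST)

    if host_reachable(DB_HOST) and not db_responding(DB_HOST):
        print("db wedged but host alive: no failover until someone pulls the plug")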

    Apparently "Switch the database server off" was too hard a concept for our offshore colleagues to understand, since they instead shut everything down; all the application servers. Rather than users experiencing a hanging system that then started working again, they lost connection and their work in progress.

    On restart, the shadow server wouldn't come up. A network card had failed, and replacing it would take four hours, despite spares being on hand, since no-one in the data centre had any technical knowledge or ability whatsoever. Eventually, a techy guy in the UK managed to talk to the shadow db server over one of its other network cards, persuaded it to ignore the failed card, and so the system was restored.
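
    The multi-NIC rescue amounts to something like this (addresses made up): walk the server's interface addresses in turn and skip the dead card.

    import socket

    # Hypothetical addresses, one per network card on the shadow db server.
    SHADOW_DB_ADDRS = ["10.0.1.15", "10.0.2.15", "10.0.3.15"]
    DB_PORT = 5432

    def connect_via_any_nic(addrs, port, timeout=3.0):
        # Try each card in order; a dead card just times out or refuses,
        # and we move on to the next - what the techy guy did by hand.
        for addr in addrs:
            try:
                return socket.create_connection((addr, port), timeout=timeout)
            except OSError:
                continue
        raise ConnectionError("no network card on the shadow server is answering")

    # Usage: conn = connect_via_any_nic(SHADOW_DB_ADDRS, DB_PORT)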

    The announced cause of the outage: hardware failure.
    Down with racism. Long live miscegenation!

    Comment


      #42
      Originally posted by Platypus View Post
      Since he tailored his CV
      Knock first as I might be balancing my chakras.

      Comment


        #43
        Originally posted by ctdctd View Post
        Don't be silly, Nick; it's a mainframe system with multiple redundancy, according to El Reg.

        I think there were two cleaners with vacuum cleaners.
        This goes to show the incompetence of the NatWest IT department. I set up, support, and design these types of systems (Parallel Sysplex, Geographically Dispersed Parallel Sysplex, PPRC, etc.), and this has never happened on any of the systems I've worked on. I actually built the systems that IBM use to design and test this stuff, tested such errors and worse, and a 24/7 operation works. As people have said, they probably made the mainframers redundant and invested in 'newer' technologies. Most IT departments in the UK (and USA) are run by ******* idiots anyway...
        Brexit is having a wee in the middle of the room at a house party because nobody is talking to you, and then complaining about the smell.

        Comment


          #44
          Originally posted by NotAllThere View Post
          I was present at a meeting when the entire SAP system was bobbed. The failover of the db server to another server failed, because the failover monitor could see the db server at one level but was waiting for a response at another level, which never came. The operations manager told the data centre, in India, to switch off the db server. Turn it off. That way the failover monitor would notice the db server was gone and switch to the shadow db server.

          Apparently "Switch the database server off" was too hard a concept for our offshore colleagues to understand: instead they shut everything down, including all the application servers. So rather than experiencing a hanging system that then started working again, users lost their connections and their work in progress.

          On restart, the shadow server wouldn't come up. A network card had failed, and replacing it would take four hours, despite spares being on hand, since no-one in the data centre had any technical knowledge or ability whatsoever. Eventually, a techy guy in the UK managed to talk to the shadow db server over one of its other network cards, persuaded it to ignore the failed card, and so the system was restored.

          The announced cause of the outage: hardware failure.

          Thanks for the clarifications; hope you didn't post the above from your workstation at work, mate!

          Comment


            #45
            Originally posted by NotAllThere View Post
            I was present at a meeting when the entire SAP system was bobbed. The failover of the db server to another server failed, because the failover monitor could see the db server at one level but was waiting for a response at another level, which never came. The operations manager told the data centre, in India, to switch off the db server. Turn it off. That way the failover monitor would notice the db server was gone and switch to the shadow db server.

            Apparently "Switch the database server off" was too hard a concept for our offshore colleagues to understand: instead they shut everything down, including all the application servers. So rather than experiencing a hanging system that then started working again, users lost their connections and their work in progress.

            On restart, the shadow server wouldn't come up. A network card had failed, and replacing it would take four hours, despite spares being on hand, since no-one in the data centre had any technical knowledge or ability whatsoever. Eventually, a techy guy in the UK managed to talk to the shadow db server over one of its other network cards, persuaded it to ignore the failed card, and so the system was restored.

            The announced cause of the outage: hardware failure.
            Are you the NAT in NatWest?
            While you're waiting, read the free novel we sent you. It's a Spanish story about a guy named 'Manual.'

            Comment


              #46
              Originally posted by doodab View Post
              are you the NAT in Natwest?
              This was not NatWest, and it happened a few years ago. The point is that "hardware failure" doesn't mean there wasn't some human f.sk up somewhere.
              Down with racism. Long live miscegenation!

              Comment


                #47
                Originally posted by darmstadt View Post
                This goes to show the incompetence of the NatWest IT department. I set up, support, and design these types of systems (Parallel Sysplex, Geographically Dispersed Parallel Sysplex, PPRC, etc.), and this has never happened on any of the systems I've worked on. I actually built the systems that IBM use to design and test this stuff, tested such errors and worse, and a 24/7 operation works. As people have said, they probably made the mainframers redundant and invested in 'newer' technologies. Most IT departments in the UK (and USA) are run by ******* idiots anyway...
                WHS.
                Behold the warranty -- the bold print giveth and the fine print taketh away.

                Comment


                  #48
                  Originally posted by Sysman View Post
                  I read some of the comments about multiple redundancy with a chuckle.

                  Once you have seen a single UPS, just one of many, take out a whole building, you view things with a bit more scepticism.
                  Now that shouldn't happen; sack the architect.
                  Always forgive your enemies; nothing annoys them so much.

                  Comment


                    #49
                    Originally posted by NotAllThere View Post
                    The point is that "hardware failure" doesn't mean there wasn't some human f.sk up somewhere.
                    In my experience it usually means there was.
                    While you're waiting, read the free novel we sent you. It's a Spanish story about a guy named 'Manual.'

                    Comment


                      #50
                      Originally posted by doodab View Post
                      In my experience it usually means there was.
                      If we define hardware as "the stuff you can kick", then people are hardware.

                      Comment
