Reply to: Clusterf***
Previously on "Clusterf***"
-
Originally posted by NotAllThere:
I recall when our DB server stopped responding. But the failover wouldn't trigger, because the controller was trying to do a clean shutdown, and couldn't because a process was hanging. So we told our good friend Bob to force a shutdown of the DB server, which Bob was reluctant to do, as it wasn't in the SOP. Eventually, the op manager persuaded him - "Pull the bloody plug out if you have to - JFDI".
Only Bob didn't. He shut down ALL the servers.
Oh, and naturally, when we tried to bring it all back, the failover db server wouldn't come up. Uhuru just couldn't work it out. Fortunately, Scotty worked out that one of the network cards had failed, knew how to get in via one of the others, and got that back online somewhat faster than the four hours the datacentre were quoting.
13000 users, around the world, unable to log on for an hour. How we laughed.
-
I recall when our DB server stopped responding. But the failover wouldn't trigger, because the controller was trying to do a clean shutdown, and couldn't because a process was hanging. So we told our good friend Bob to force a shutdown of the DB server, which Bob was reluctant to do, as it wasn't in the SOP. Eventually, the op manager persuaded him - "Pull the bloody plug out if you have to - JFDI".
Only Bob didn't. He shut down ALL the servers.
Oh, and naturally, when we tried to bring it all back, the failover db server wouldn't come up. Bob just couldn't work it out. Fortunately, Scotty worked out that one of the network cards had failed, knew how to get in via one of the others, and got that back online somewhat faster than the four hours the datacentre were quoting.
13000 users, around the world, unable to log on for an hour. How we laughed.
-
Originally posted by suityou01:
You patronising b******
HTH BIDI
I'm giving a lesson on how to suck eggs later next week. If you have a granny you'd like to enroll?
-
Originally posted by MarillionFan:
With databases SY it is normally necessary to update the Live server.
But if you do, the process is to back up right before, notify and disconnect all users, take down application services, update, then bring it all back up after you've run through the process on test.
Anyway that's what I made one client attempt to do this week. Their test box is still down because they can't work out how to restart the service. I'm bloody glad I didn't go gung ho and try my changes on live first.
You patronising b******
HTH BIDI
Last edited by suityou01; 13 November 2010, 12:42.
-
With databases SY it is normally necessary to update the Live server.
But if you do, the process is to back up right before, notify and disconnect all users, take down application services, update, then bring it all back up after you've run through the process on test.
Anyway that's what I made one client attempt to do this week. Their test box is still down because they can't work out how to restart the service. I'm bloody glad I didn't go gung ho and try my changes on live first.
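For anyone who wants that sequence written down, here is a rough Python sketch of it as an ordered runbook. It is only an illustration: the `Step` class, `run_maintenance`, the `rehearsed_on_test` flag and the placeholder `ok` action are all made up for this example, and a real runbook would call whatever backup and service tooling the site actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One runbook step (hypothetical helper, not from any real tool)."""
    name: str
    action: Callable[[], bool]  # returns True when the step succeeded


def run_maintenance(steps: List[Step], rehearsed_on_test: bool) -> List[str]:
    """Run the steps in order and stop at the first failure.

    Returns the names of the steps that completed, so it is obvious
    how far the change got before it stopped.
    """
    if not rehearsed_on_test:
        raise RuntimeError("rehearse the whole procedure on test before touching live")
    completed: List[str] = []
    for step in steps:
        if not step.action():
            break  # do not carry on to later steps after a failure
        completed.append(step.name)
    return completed


def ok() -> bool:
    """Placeholder action; a real one would run the backup, stop services, etc."""
    return True


# The order described in the post: back up, clear users off, stop services,
# update, then bring everything back.
LIVE_UPDATE = [
    Step("back up database", ok),
    Step("notify and disconnect users", ok),
    Step("stop application services", ok),
    Step("apply update", ok),
    Step("start application services", ok),
]

if __name__ == "__main__":
    for name in run_maintenance(LIVE_UPDATE, rehearsed_on_test=True):
        print("done:", name)
```

The point of returning the completed steps is that a failed backup stops everything before anything destructive happens, which is the whole reason for taking the backup first.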
-
Originally posted by NotAllThere:
Oo do tell. You can change the names to protect the guilty.
-
Originally posted by NotAllThere:
Oo do tell. You can change the names to protect the guilty.
1) I took a live backup which failed.
2) I noticed a lock on a table which I cleared.
3) I took the live backup which worked.
Transaction log says
1) I dropped the whole database and tried recreating it, badly. (While the system was live)
2) I failed to repopulate the table in question. (While the system was live)
3) The system limped along like this for 25 hours, and this caused further problems.
4) I then tried to rebuild the table in question again and it worked this time. (While the system was live)
So when we started investigating, the data all looked hunky dory, well, for the most part anyway.
-
Originally posted by suityou01:
Good work and thanks for your swift reply. If the MySQL binary transaction log (exhibit a) records session information then he won't be able to wriggle out of anything. I have not taken "sabotage" - deliberate or mistaken - out of the equation. It is on my list of possibilities. Transaction log analysis is high up my list of next tasks.
Taxi for the dba
-
Originally posted by suityou01:
If the MySQL binary transaction log (exhibit a) records session information then he won't be able to wriggle out of anything. I have not taken "sabotage" - deliberate or mistaken - out of the equation. It is on my list of possibilities. Transaction log analysis is high up my list of next tasks.
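On the session question: in the text that `mysqlbinlog` prints, each Query event header carries a `thread_id`, which is the connection id. That ties statements to a connection, not to a named person, so it would still need matching against other evidence. A rough Python sketch of pulling the destructive statements out of such a dump follows. The `find_destructive` function, the `Hit` tuple and the sample dump are invented for illustration, and the header layout is assumed to be the usual `#YYMMDD hh:mm:ss server id N  end_log_pos N ... Query thread_id=N` form; check it against your own server's output before relying on it.

```python
import re
from typing import Iterator, NamedTuple

# Event header line as printed by mysqlbinlog (assumed layout, see lead-in).
HEADER = re.compile(
    r"^#(\d{6})\s+(\d{1,2}:\d{2}:\d{2})\s+server id\s+\d+\s+end_log_pos\s+\d+"
    r".*?\bQuery\s+thread_id=(\d+)"
)
# Statements worth a closer look in this kind of post-mortem.
DESTRUCTIVE = re.compile(r"^\s*(DROP|TRUNCATE)\b", re.IGNORECASE)


class Hit(NamedTuple):
    date: str       # YYMMDD as printed in the dump
    time: str
    thread_id: int  # connection id, not a user name
    statement: str


def find_destructive(dump_text: str) -> Iterator[Hit]:
    """Yield DROP/TRUNCATE statements with the header details of their event."""
    current = None
    for line in dump_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = (header.group(1), header.group(2), int(header.group(3)))
            continue
        if current and DESTRUCTIVE.match(line):
            yield Hit(current[0], current[1], current[2], line.strip())


# Entirely made-up sample in the assumed mysqlbinlog text format.
SAMPLE = (
    "# at 4567\n"
    "#101112  9:14:03 server id 1  end_log_pos 4660 \tQuery\tthread_id=87\texec_time=0\terror_code=0\n"
    "SET TIMESTAMP=1289553243/*!*/;\n"
    "DROP DATABASE appdb\n"
    "/*!*/;\n"
    "# at 4660\n"
    "#101112  9:15:10 server id 1  end_log_pos 4790 \tQuery\tthread_id=91\texec_time=0\terror_code=0\n"
    "SET TIMESTAMP=1289553310/*!*/;\n"
    "INSERT INTO orders VALUES (1)\n"
    "/*!*/;\n"
)

if __name__ == "__main__":
    for hit in find_destructive(SAMPLE):
        print(hit.date, hit.time, "thread", hit.thread_id, hit.statement)
```

This only looks at the first line of each statement, which is enough to flag a DROP or TRUNCATE for manual review but is no substitute for reading the surrounding events.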
-
Good feedback.
To clarify a couple of points:
The area of the system in question has been running for 10 months without problems.
The area of the system in question has had no changes in 10 months.
The guy with amnesia is not a developer. He is support. They (developers) are not allowed near the live system.
This is not a witch hunt. Admittedly the guy with amnesia has pissed me off, but I have brushed that to one side as the only witch I am hunting is the technical root cause.
The system will now be replaced, I suspect, as the politics dictate, but this is not my concern. My concern is to troubleshoot, fix, resurrect and give guarantees. Then they can replace at leisure.
I think, given that the people I interviewed have changed their story, the only choice I have is to coordinate load tests on the UAT environment and meanwhile do a forensic level check on the transaction logs.
The truth will out.
Stay tuned folks.
-
Firstly: check what was delivered and what is at fault
Secondly: check against the business spec
Thirdly: check if system testing and UAT were actually carried out correctly
Fourthly: check if said developer who is "fixing on the spot" delivered to said spec
You will often find that a developer has decided that what was asked and what he delivered were two different things. You may also find that whoever sorted out the system testing and UAT was a tad lax in their testing.
If all else fails, duck and cover or sit back and laugh.
-
Originally posted by xoggoth:
In the good old days we just blamed the hardware guys.