
Previously on "Technical challenge"

  • AtW
    replied
    Keep faith old chap, fortune favours prepared minds.

  • MarillionFan
    replied
    Very good, AtW. It's all very good, but so far your search engine returns even fewer matches than www.onlyscanned5sites.com

  • AtW
    replied
    Ok, a bit disappointing not to see any estimates, but I had pretty low expectations in the first place.

    Anyway, here are the results of benchmarking (on AMD Athlon x2 3800 - single core used):

    0) no caching tricks - search pattern is expected to be evenly distributed
    1) 100 mln unique strings of 20 bytes each, data size: 2 GB
    2) Indexing takes 30 mins
    3) Generated index size is ~810 MB.
    4) Running searches for 10,000 randomly selected strings, with 100 runs (total 10 mln searches), results in a sustained performance of ~232,000 searches per second.

    The system supports multiple indices, so it's not like everything is done for 100 mln strings: scalability is perfect, because adding a 2nd CPU will double the speed; the same goes for extra machines, which will offer not only extra speed but also load balancing and redundancy.

    Could you do better than that? I doubt it. Not because I think I am so ****ing amazing, but because the low-level algorithms used in the system are mathematically pretty much perfect: you can't cheat them with probabilistic modelling, because searches will be evenly distributed, so caching tricks are off.
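
    A minimal sketch of the same kind of exact-match benchmark, at a much smaller scale (hypothetical illustration, not AtW's actual code): build an in-memory index from fixed-width byte strings to numeric row IDs, then time evenly distributed lookups.

```python
import os
import time

N = 100_000  # scaled down from 100 mln for a quick illustrative run

# Build the index: each unique 20-byte string maps to its numeric row ID.
strings = [os.urandom(20) for _ in range(N)]
index = {s: i for i, s in enumerate(strings)}

# Probe with randomly selected existing strings, approximating the
# evenly distributed search pattern described above.
probes = strings[::10]  # every 10th string -> 10,000 probes
start = time.perf_counter()
hits = sum(1 for p in probes if p in index)
elapsed = time.perf_counter() - start

print(f"{hits} lookups in {elapsed:.4f}s (~{hits / elapsed:,.0f} searches/sec)")
```

    Even a naive in-memory hash table clears this kind of lookup rate easily; the hard part of the original exercise is holding 2 GB of keys and keeping the on-disk index compact (~810 MB here).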

  • AtW
    replied
    Originally posted by xoggoth
    Provided it's on a PDP11/44.
    Go on then - so far I have not heard better offers, and I am about to benchmark my stuff on a 100 mln unique strings index; we shall see now whether the theoretical calculations were correct...

  • xoggoth
    replied
    I used to be a whiz at estimating hardware requirements in my sales support days, so I can help. Provided it's on a PDP11/44.

  • TheOmegaMan
    replied
    Originally posted by AtW
    Benchmarking code is done, generating a 100 mln unique strings index - actually slightly higher, as the app needs to remove duplicates that may or may not be present, but in the original task I thought I'd go easy on you

    Hey AtW, I can do it 10% quicker than your application. How do I know? Easy, I am a fking genius, and anything I do would be much, much better.

  • AtW
    replied
    Benchmarking code is done, generating a 100 mln unique strings index - actually slightly higher, as the app needs to remove duplicates that may or may not be present, but in the original task I thought I'd go easy on you

  • darmstadt
    replied
    Have a look at the operating system TPF for this, or IMS running under z/OS on a z9 EC 54-way. These systems do this type of thing 24/7.

  • sasguru
    replied
    Originally posted by PRC1964
    At this point I gave up. There is no point in me living any more. Some fat moustachioed Russian immigrant in a bedsit in Birmingham has the answer to all the world's problems, and it is impossible for anyone to be better than him.

    If only I could have such a great life.


    Next ....

  • AtW
    replied
    I will have my benchmarks in just a few minutes; hurry up with your own estimates, because I will actually get it all working

  • PRC1964
    replied
    don't worry about getting a non-optimal answer, as it is pretty much impossible to do better than my approach.

    At this point I gave up. There is no point in me living any more. Some fat moustachioed Russian immigrant in a bedsit in Birmingham has the answer to all the world's problems, and it is impossible for anyone to be better than him.

    If only I could have such a great life.

  • threaded
    replied
    All very well, AtW, wondering about the system, but you ought to know: most of us here could've got the client to pay us more to do the same thing.

    threaded in "and here endeth the first lesson" mode

  • AtW
    replied
    There is no current system - the system developed by me is brand new, which is in any case irrelevant. What is relevant is that you know what the system should do, and I want your own estimate on the basis of your own experience.

    How would you even approach this problem - use a database?

    Anything goes - don't worry about getting a non-optimal answer, as it is pretty much impossible to do better than my approach.

  • DaveB
    replied
    Originally posted by AtW
    ok, here is what you have: 100 mln unique strings - 20 bytes each

    Your job is to create a system that can either confirm that a given string does not exist in the list of those unique strings, or, if it does exist, return a unique numeric ID that you can associate with each of those unique strings - the database equivalent of a RowID.

    In terms of performance, the system should allow for at least 50 searches per second; in other words, one search should not take more than 0.020 sec.

    How fast my system performs (writing benchmark code now) is irrelevant, since I want to know what kind of system YOU would need to build in order to achieve that performance. It does not have to be exact - say, 10 servers with X GB RAM and Y CPUs each, using Oracle etc: estimated cost £Z.

    MarillionFan: if you can't give me an estimate of the work then you ain't getting the contract in the first place, because clearly you would not have a clue how to execute it in a way that puts my interests first, ie a very cost-effective solution.

    Current performance is very relevant, since we have no idea what the software is capable of on its current platform, so we have no indication of the loads it is likely to put on the hardware.

    If you are getting X% of your desired performance on a particular hardware platform then we can start to predict performance on other configurations taking into account OS overheads, disk speeds etc.

    Is the idea of this exercise to show how much cheaper your solution is than an Oracle-based one?

    Anyway, I'm off to do something more interesting with my free time than taking on unpaid systems design work.
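
    For reference, the quoted task is comfortably within reach of a single machine: 100 mln × 20 bytes is 2 GB of key data, and a binary search over a sorted array needs only ~27 comparisons per lookup, well under the 0.020 sec budget. A back-of-envelope sketch, scaled down and purely illustrative (not anyone's actual design):

```python
import bisect
import os

N = 100_000  # scaled down from 100 mln
# Sorted, deduplicated 20-byte keys; a key's row ID is its position.
data = sorted({os.urandom(20) for _ in range(N)})

def lookup(key: bytes):
    """Return the row ID (position in the sorted array), or None if absent."""
    i = bisect.bisect_left(data, key)
    return i if i < len(data) and data[i] == key else None

# Present keys come back with their position; absent keys return None.
mid = len(data) // 2
assert lookup(data[mid]) == mid
```

    At full scale the 2 GB sorted array fits in RAM on one box, which is why a cost estimate built around a multi-server Oracle cluster looks so expensive by comparison.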

  • sasguru
    replied
    Originally posted by AtW
    ok, here is what you have: 100 mln unique strings - 20 bytes each

    Your job is to create a system that can either confirm that a given string does not exist in the list of those unique strings, or, if it does exist, return a unique numeric ID that you can associate with each of those unique strings - the database equivalent of a RowID.

    In terms of performance, the system should allow for at least 50 searches per second; in other words, one search should not take more than 0.020 sec.

    How fast my system performs (writing benchmark code now) is irrelevant, since I want to know what kind of system YOU would need to build in order to achieve that performance. It does not have to be exact - say, 10 servers with X GB RAM and Y CPUs each, using Oracle etc: estimated cost £Z.

    MarillionFan: if you can't give me an estimate of the work then you ain't getting the contract in the first place, because clearly you would not have a clue how to execute it in a way that puts my interests first, ie a very cost-effective solution.

    Remember what I said earlier about Asperger's Syndrome? This is a classic symptom: not having the basic empathy to understand that your post is quite probably the most boring ever to be posted on CUK ...

    If Carlsberg did boring threads, this would be it ...
