At this point I gave up. There is no point in me living any more. Some fat moustachioed Russian immigrant in a bedsit in Birmingham has the answer to all the world's problems and it is impossible for anyone to be better than him.
Benchmarking code is done, generating a 100 mln unique string index - actually slightly more than that, as the app needs to remove duplicates that may or may not be present, but in the original task I thought I'd go easy on you.
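For illustration only, here is a rough sketch of how such a deduplicated dataset could be generated. This is not AtW's actual code; the function name, alphabet, and fixed 20-character string length are assumptions:

```python
import random
import string

def generate_unique_strings(target=100_000_000, length=20, seed=42):
    """Generate `target` unique fixed-length strings.

    Duplicates produced by the random generator are removed by the set,
    so slightly more than `target` raw strings end up being generated.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + string.digits
    seen = set()
    while len(seen) < target:
        seen.add(''.join(rng.choices(alphabet, k=length)))
    return seen

# Small-scale demo; 100 mln strings (~2 GB) would need a disk-backed approach.
sample = generate_unique_strings(target=1_000)
print(len(sample), next(iter(sample)))
```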
Hey AtW, I can do it 10% quicker than your application. How do I know? Easy, I am a fking genius and anything I do would be much, much better.
Go on then - so far I have not heard any better offers, and I am about to benchmark my stuff at a 100 mln unique string index; we shall see now if the theoretical calculations were correct...
Ok, a bit disappointing not to see any estimates, but I had pretty low expectations in the first place.
Anyway, here are the results of benchmarking (on an AMD Athlon X2 3800+ - single core used):
0) No caching tricks used - the search pattern is expected to be evenly distributed
1) 100 mln unique strings of 20 bytes each, data size: 2 GB
2) Indexing takes 30 mins
3) Generated index size is ~810 MB.
4) Running searches for 10,000 randomly selected strings, repeated over 100 runs (1 mln searches in total), gives a sustained rate of ~232,000 searches per second.
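As a rough illustration of the methodology above (a fixed sample of random queries repeated over many runs), here is a minimal Python sketch. The in-memory set is only a stand-in for the real on-disk index, and all names and parameters are illustrative:

```python
import random
import time

def benchmark(index, dataset, queries=10_000, runs=100, seed=1):
    """Time repeated membership lookups against the index.

    `index` stands in for the real index structure; a Python set is
    used here purely to illustrate the measurement methodology.
    """
    rng = random.Random(seed)
    sample = rng.sample(dataset, queries)   # 10,000 randomly selected strings
    start = time.perf_counter()
    for _ in range(runs):                   # 100 runs = 1 mln searches here
        for q in sample:
            _ = q in index
    elapsed = time.perf_counter() - start
    total = queries * runs
    print(f"{total} searches in {elapsed:.2f}s "
          f"= {total / elapsed:,.0f} searches/sec")

# Demo at a small scale (20-byte strings, as in the benchmark above):
data = [f"string{i:014d}" for i in range(100_000)]
benchmark(set(data), data)
```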
The system supports multiple indices, so it's not like everything is crammed into one 100 mln string index: scalability is perfect, because adding a 2nd CPU doubles the speed, and the same goes for extra machines, which offer not only extra speed but also load balancing and redundancy.
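A minimal sketch of what that scaling claim could look like in practice, assuming the index is read-only so the query load can simply be split across workers. Each process here holds its own copy of a stand-in index; the worker setup and names are illustrative, not the system's actual architecture:

```python
from concurrent.futures import ProcessPoolExecutor

# Each worker process holds (or would memory-map) its own copy of the
# read-only index, so doubling the CPUs roughly doubles throughput.
INDEX = None

def init_worker(data):
    global INDEX
    INDEX = set(data)       # stand-in for loading the real index

def search_batch(queries):
    return [q in INDEX for q in queries]

if __name__ == "__main__":
    data = [f"string{i:014d}" for i in range(100_000)]
    queries = data[:10_000]
    # Split the query load across 2 workers (2nd CPU => ~double speed).
    halves = [queries[:5_000], queries[5_000:]]
    with ProcessPoolExecutor(max_workers=2,
                             initializer=init_worker,
                             initargs=(data,)) as pool:
        results = list(pool.map(search_batch, halves))
    print(sum(map(sum, results)), "hits")
```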
Could you do better than that? I doubt it. Not because I think I am so ****ing amazing, but because the low-level algorithms used in the system are mathematically pretty much perfect: you can't cheat them with probabilistic modelling, because the searches will be evenly distributed, so caching tricks are off.
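The reasoning behind "caching tricks are off" can be shown with a quick simulation: under uniformly distributed lookups over N keys, even an ideal cache of C entries hits only about C/N of the time, so it cannot shortcut the index. A hedged sketch with illustrative parameters:

```python
import random

def cache_hit_rate(n_keys=100_000, cache_size=10_000, queries=1_000_000):
    """Simulate an idealised cache under uniformly distributed lookups.

    Even a perfect cache of `cache_size` keys hits only ~cache_size/n_keys
    of the time when queries are uniform over the whole key space.
    """
    rng = random.Random(0)
    cached = set(range(cache_size))   # pretend these keys are cached
    hits = sum(rng.randrange(n_keys) in cached for _ in range(queries))
    return hits / queries

print(f"hit rate: {cache_hit_rate():.1%}  (cache covers 10% of keys)")
```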