At this point I gave up. There is no point in me living any more. Some fat moustachioed Russian immigrant in a bedsit in Birmingham has the answer to all the world's problems and it is impossible for anyone to be better than him.
Benchmarking code is done, generating a 100 mln unique string index - actually slightly more than that, as the app needs to remove duplicates that may or may not be present, but in the original task I thought I'd go easy on you.
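For illustration only, here is a rough sketch of how such a deduplicated dataset could be generated. This is not AtW's actual code; the function name, alphabet, and fixed 20-character string length are assumptions:

```python
import random
import string

def generate_unique_strings(target=100_000_000, length=20, seed=42):
    """Generate `target` unique fixed-length strings.

    Duplicates produced by the random generator are removed by the set,
    so slightly more than `target` raw strings end up being generated.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + string.digits
    seen = set()
    while len(seen) < target:
        seen.add(''.join(rng.choices(alphabet, k=length)))
    return seen

# Small-scale demo; 100 mln strings (~2 GB) would need a disk-backed approach.
sample = generate_unique_strings(target=1_000)
print(len(sample), next(iter(sample)))
```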
Hey AtW, I can do it 10% quicker than your application. How do I know? Easy, I am a fking genius and anything I do would be much, much better.
Go on then - so far I have not heard any better offers, and I am about to benchmark my stuff at a 100 mln unique string index; we shall see now if the theoretical calculations were correct...
Ok, a bit disappointing not to see any estimates, but I had pretty low expectations in the first place.
Anyway, here are the results of benchmarking (on an AMD Athlon X2 3800+ - single core used):
0) No caching tricks used - the search pattern is expected to be evenly distributed
1) 100 mln unique strings of 20 bytes each, data size: 2 GB
2) Indexing takes 30 mins
3) Generated index size is ~810 MB.
4) Running searches for 10,000 randomly selected strings, repeated over 100 runs (1 mln searches in total), gives a sustained rate of ~232,000 searches per second.
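As a rough illustration of the methodology above (a fixed sample of random queries repeated over many runs), here is a minimal Python sketch. The in-memory set is only a stand-in for the real on-disk index, and all names and parameters are illustrative:

```python
import random
import time

def benchmark(index, dataset, queries=10_000, runs=100, seed=1):
    """Time repeated membership lookups against the index.

    `index` stands in for the real index structure; a Python set is
    used here purely to illustrate the measurement methodology.
    """
    rng = random.Random(seed)
    sample = rng.sample(dataset, queries)   # 10,000 randomly selected strings
    start = time.perf_counter()
    for _ in range(runs):                   # 100 runs = 1 mln searches here
        for q in sample:
            _ = q in index
    elapsed = time.perf_counter() - start
    total = queries * runs
    print(f"{total} searches in {elapsed:.2f}s "
          f"= {total / elapsed:,.0f} searches/sec")

# Demo at a small scale (20-byte strings, as in the benchmark above):
data = [f"string{i:014d}" for i in range(100_000)]
benchmark(set(data), data)
```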
The system supports multiple indices, so it's not like everything is crammed into one 100 mln string index: scalability is perfect, because adding a 2nd CPU doubles the speed, and the same goes for extra machines, which offer not only extra speed but also load balancing and redundancy.
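A minimal sketch of what that scaling claim could look like in practice, assuming the index is read-only so the query load can simply be split across workers. Each process here holds its own copy of a stand-in index; the worker setup and names are illustrative, not the system's actual architecture:

```python
from concurrent.futures import ProcessPoolExecutor

# Each worker process holds (or would memory-map) its own copy of the
# read-only index, so doubling the CPUs roughly doubles throughput.
INDEX = None

def init_worker(data):
    global INDEX
    INDEX = set(data)       # stand-in for loading the real index

def search_batch(queries):
    return [q in INDEX for q in queries]

if __name__ == "__main__":
    data = [f"string{i:014d}" for i in range(100_000)]
    queries = data[:10_000]
    # Split the query load across 2 workers (2nd CPU => ~double speed).
    halves = [queries[:5_000], queries[5_000:]]
    with ProcessPoolExecutor(max_workers=2,
                             initializer=init_worker,
                             initargs=(data,)) as pool:
        results = list(pool.map(search_batch, halves))
    print(sum(map(sum, results)), "hits")
```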
Could you do better than that? I doubt it. Not because I think I am so ****ing amazing, but because the low-level algorithms used in the system are mathematically pretty much perfect: you can't cheat them with probabilistic modelling, because the searches will be evenly distributed, so caching tricks are off.
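The reasoning behind "caching tricks are off" can be shown with a quick simulation: under uniformly distributed lookups over N keys, even an ideal cache of C entries hits only about C/N of the time, so it cannot shortcut the index. A hedged sketch with illustrative parameters:

```python
import random

def cache_hit_rate(n_keys=100_000, cache_size=10_000, queries=1_000_000):
    """Simulate an idealised cache under uniformly distributed lookups.

    Even a perfect cache of `cache_size` keys hits only ~cache_size/n_keys
    of the time when queries are uniform over the whole key space.
    """
    rng = random.Random(0)
    cached = set(range(cache_size))   # pretend these keys are cached
    hits = sum(rng.randrange(n_keys) in cached for _ in range(queries))
    return hits / queries

print(f"hit rate: {cache_hit_rate():.1%}  (cache covers 10% of keys)")
```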