
Reply to: AtW - Info needed


Previously on "AtW - Info needed"

  • Guest replied
    Re: re

    Anyway, don't blame me. That was your code - entered before I entered anything.

  • Guest replied
    Re: re

    It means what it says: I rarely had to join a table to itself, i.e.:

    select * from
    Table T1, Table T2
    where T1.ID=T2.ID
    and T1.Class=1
    and T2.Class=2

    I find that having to do that kind of query indicates poor architecture.

  • Guest replied
    Re: re

    "I never bothered to reference table itself in the same join before I saw it."
    What does that mean?

  • Guest replied
    Re: re

    > select M1.DocID
    > from MainIndex M1,MainIndex M2
    > where M1.KeyWordID=1
    > and M2.KeyWordID=2
    > and M1.DocID=M2.DocID

    Listen, I am probably in a weird mood to want to suppress guilt, but this code belongs to PerlOfWisdom; I never bothered to reference table itself in the same join before I saw it.

    phew, that was good - not feeling guilty anymore
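
For anyone who wants to try the self-join quoted at the top of this post, here is a minimal sketch using Python's built-in sqlite3 module. Only the MainIndex table and its DocID and KeyWordID columns come from the thread; the column types and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database; the schema is a guess at what MainIndex might look like.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MainIndex (DocID INTEGER, KeyWordID INTEGER)")

# Hypothetical sample data: (DocID, KeyWordID) pairs.
conn.executemany(
    "INSERT INTO MainIndex VALUES (?, ?)",
    [(10, 1), (10, 2), (11, 1), (12, 2), (13, 1), (13, 2), (13, 3)],
)

# The table is joined to itself so that one alias matches keyword 1
# and the other alias matches keyword 2 for the same document.
rows = conn.execute(
    """
    SELECT M1.DocID
    FROM MainIndex M1, MainIndex M2
    WHERE M1.KeyWordID = 1
      AND M2.KeyWordID = 2
      AND M1.DocID = M2.DocID
    ORDER BY M1.DocID
    """
).fetchall()

both = [row[0] for row in rows]
print(both)  # documents that contain both keywords
```

With this sample data only documents 10 and 13 carry both keyword IDs, so those are the two DocIDs returned.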

  • Guest replied
    Re: re

    MySQL Soundex

  • Guest replied
    Re: re

    reynolds, just ignore compression for now, OK? At the end of the day you may find that transparent, behind-the-scenes disk compression is the best way forward. If you want to learn more about hand-optimising your data structures then read how they did it at Google.

  • Guest replied
    re

    "In real world cases you will find that any mechanism has an overhead and you need to see the redundancy of the data being greater than the overhead. Remembering of course that not all applications require 100% preservation of information.

    If you think about it - there needs to be a way to encode information within the data to specify how the size has been reduced, that is information so requires space.

    Have you read any of Shannon ?"

    No, but I've noticed he's big in compression.

    I did a test compression on a string and noticed that the string needs to be a certain size to realise any benefits of the compression (like you said the encoding information is bundled in).
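
The effect described above is easy to reproduce. This is a minimal sketch using Python's standard zlib module; the sample strings are invented for illustration, and the exact byte counts will depend on the compressor and its settings.

```python
import zlib

# A short string: the container overhead (header, checksum, block
# framing) outweighs anything the compressor can save.
short = b"search engine"
short_packed = zlib.compress(short)

# A longer, highly redundant string: the overhead is paid once and
# the repetition compresses well.
long_text = b"search engine index " * 200
long_packed = zlib.compress(long_text)

print(len(short), "->", len(short_packed))
print(len(long_text), "->", len(long_packed))

# Lossless: decompressing gives back the original bytes.
restored = zlib.decompress(long_packed)
```

The short string comes out larger than it went in, while the long repetitive one shrinks considerably, which matches the point that redundancy has to exceed the encoding overhead before compression pays off.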

  • Guest replied
    Re: re

    > But if I turn words into wordID's I will still need to refer back
    > to that hashtable just to turn them back into words again

    Yes, that's what Table 1 is for: you will have a unique index on WordID, and you only join to it after you have selected the X results you need, so you won't be joining _ALL_ the WordIDs found, just those you will show on screen. A temporary table is handy for that sort of thing.
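
A rough sketch of that two-step approach, using Python's sqlite3 module so it runs anywhere. The table names Words, Hits and Page, their columns, the sample rows and the "top 2" limit are all hypothetical; the thread only mentions a lookup table with a unique index on WordID and a temporary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Lookup table: INTEGER PRIMARY KEY gives a unique index on WordID.
    CREATE TABLE Words (WordID INTEGER PRIMARY KEY, Word TEXT);
    -- Index table holds numbers only.
    CREATE TABLE Hits (DocID INTEGER, WordID INTEGER);
    """
)
conn.executemany(
    "INSERT INTO Words VALUES (?, ?)",
    [(1, "search"), (2, "engine"), (3, "index"), (4, "soundex")],
)
conn.executemany(
    "INSERT INTO Hits VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 1), (10, 2), (11, 2), (10, 3), (13, 4)],
)

# Step 1: work on IDs only and keep just the X rows that will be shown.
conn.execute(
    """
    CREATE TEMP TABLE Page AS
    SELECT WordID, COUNT(*) AS Docs
    FROM Hits
    GROUP BY WordID
    ORDER BY Docs DESC, WordID
    LIMIT 2
    """
)

# Step 2: join the small temporary table to the lookup table,
# so only the displayed rows are turned back into words.
page = conn.execute(
    """
    SELECT W.Word, P.Docs
    FROM Page P JOIN Words W ON W.WordID = P.WordID
    ORDER BY P.Docs DESC, W.WordID
    """
).fetchall()
print(page)
```

The lookup join touches two rows here rather than every WordID in the index, which is the saving being described.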


    Edit: sorry Spod I was a wee bit inconsiderate today, hope I did not hurt your feelings

  • Guest replied
    re:re

    In real world cases you will find that any mechanism has an overhead and you need to see the redundancy of the data being greater than the overhead. Remembering of course that not all applications require 100% preservation of information.

    If you think about it - there needs to be a way to encode information within the data to specify how the size has been reduced, that is information so requires space.

    Have you read any of Shannon ?

  • Guest replied
    Re: re

    2 bits, where only 2 of the possible 4 values are ever used.

  • Guest replied
    re

    Apologies if this is a sh.it question. What is the smallest size chunk of data that can be compressed?

  • Guest replied
    re

    "no need to compress data - just make sure you eliminate redundant words (50% of data easy) and turn them into numbers - this will make tables and indices very compact. "

    But if I turn words into wordIDs I will still need to refer back to that hashtable just to turn them back into words again. The space I save by converting words to numbers is going to be used up again by the number-to-word lookup table.
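
One way to check this worry is to count bytes both ways. The sketch below is only an illustration: the sample words, the repetition count and the 4-byte ID width are assumptions, and it ignores per-row and index overhead that a real database would add.

```python
# Hypothetical token stream: a few distinct words, each occurring many times.
tokens = [
    "compression", "redundancy", "information",
    "overhead", "compression", "information",
] * 1000

# Assign each distinct word a small integer ID on first sight.
word_ids = {}
for word in tokens:
    word_ids.setdefault(word, len(word_ids))
encoded = [word_ids[word] for word in tokens]

ID_BYTES = 4  # assumed fixed-width integer ID

# Cost of storing every occurrence as text.
raw_bytes = sum(len(word) for word in tokens)

# Cost of storing IDs, plus the lookup table that stores each word once.
lookup_bytes = sum(len(word) + ID_BYTES for word in word_ids)
encoded_bytes = ID_BYTES * len(encoded) + lookup_bytes

print("as text:", raw_bytes)
print("as IDs plus lookup table:", encoded_bytes)
```

The lookup table is paid for once per distinct word, while the saving is made on every occurrence, so the table only cancels the gain when words are short or hardly ever repeat.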

  • Guest replied
    ffs

    no need to compress data - just make sure you eliminate redundant words (50% of data easy) and turn them into numbers - this will make tables and indices very compact. Intersect is not necessary; use that other query, which will work just fine. I suggest you download SQL Server and play around with Query Analyser: it displays very nice query plans which show how efficient your query is, and the whole game is about doing enough offline to be able to run very tight queries in real time.

    The key to achieving exceptional real time performance is to do as much as possible offline.

    And buy that book. Even though I learnt most of this stuff on my own, I still found that book very useful; it should be even more useful to you since you have not gone through it all the hard way (trial and error).

    Hash tables are not the best - you need a clustered index on it.

  • Guest replied
    re search engine

    Cheers Atw, SupremeSpod, PerlOfWisdom.

    I'll look into those techniques. I want to fit the entire index into memory (i.e. wordID, docID, locID) so response times are very quick. Unfortunately (fortunately?) I'm building my own database (a combination of a hashtable and random-access files), so I can't use special features such as intersect - I will just use a null pointer as a test to see if a term exists. Also, I am building this into a web server so the whole package is just one application. AtW, did you compress your data in the database? I've found that this can cause a few problems.
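
A minimal sketch of the in-memory idea, assuming the index is a plain dictionary from wordID to a set of docIDs (locIDs are left out for brevity). The function name docs_with_all and the sample data are hypothetical; a missing key returning None plays the role of the null-pointer test.

```python
# Hypothetical in-memory index: wordID -> set of docIDs containing it.
index = {
    1: {10, 11, 13},
    2: {10, 12, 13},
    3: {13},
}


def docs_with_all(index, word_ids):
    """Return the docIDs that contain every wordID in word_ids."""
    result = None
    for word_id in word_ids:
        postings = index.get(word_id)
        if postings is None:
            # Term is not in the index at all, so nothing can match.
            return set()
        # Intersect by hand instead of relying on a database INTERSECT.
        result = set(postings) if result is None else result & postings
    return result or set()


print(docs_with_all(index, [1, 2]))
print(docs_with_all(index, [1, 99]))
```

Intersecting the smallest posting set first would usually be cheaper, but that ordering is left out here to keep the sketch short.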

  • Guest replied
    Re: Just a suggestion...

    "I just thought that the "SoundsLike" facility would be of use."

    I've never used Soundex myself, but have seen it specified for various applications, normally lookups on people or place names.

    To see what sort of results it gives try putting your own name into

    resources.rootsweb.com/cg...xconverter

    "smith" gives this little lot, by way of example:

    SAINT | SAND | SANDY | SANTEE | SANTI | SCHMID | SCHMIDT | SCHMIT |
    SCHMITT | SHAND | SHUMATE | SINNOTT | SMITH | SMITHEY | SMOOT |
    SMOOTHY | SMYTH | SMYTHE | SNAITH | SNEAD | SNEATH | SNEED |
    SNODDY | SOUNDY | SUNDAY |

    Sunday sharing the same Soundex code as Smith ??
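
The surprise goes away once the coding rules are spelled out. The sketch below follows the commonly described American Soundex rules (keep the first letter, code the remaining consonants as digits, drop vowels, collapse adjacent equal codes, pad to four characters). It is an illustrative implementation, not the code behind the converter linked above, and the name soundex is just a label chosen here.

```python
# Letter groups that share a Soundex digit.
CODES = {}
for digit, letters in enumerate(
    ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1
):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(name):
    """Return the four-character Soundex code for name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


for name in ("Smith", "Sunday", "Schmidt", "Saint"):
    print(name, soundex(name))
```

Smith and Sunday both reduce to S, then an M/N code, then a D/T code, so they land on the same value. Soundex only groups names by rough consonant pattern, which is why the list of matches is so broad.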
