
Any _real_ experts on compression here?


    #11
    Originally posted by Francko
    "I am interested in hearing from those who know what "MOFFAT" (in context of compression) stands for."
    Francko - the key word is "know": knowing what it means is not the same as parroting copy from a Google search. That is no more "knowing" than a parrot repeating words it has memorized.



      #12
      Well, the way it works is that the domain is restricted, so if you can restrict the domain further then you can achieve better compression.

      So if you don't index short words ...
      Insanity: repeating the same actions, but expecting different results.
      threadeds website, and here's my blog.



        #13
        "I am interested in hearing from those who know what "MOFFAT" (in context of compression) stands for"

        This literally only means that you want to hear from somebody who knows what MOFFAT "stands for". Now I know. No, I don't know anything about it. I was just trying to be helpful; we can all make silly mistakes, like failing a simple search. Obviously not the Russian God of all time. I'm sorry I tried to help you (and besides, it took me a few minutes to find his personal home page, minutes that I should have spent looking for porn rather than helping ungrateful pompous nerds). Pesciol ti.
        Last edited by Francko; 2 December 2005, 19:43.
        I've seen much of the rest of the world. It is brutal and cruel and dark, Rome is the light.



          #14
          You were not helping, Francko.

          threaded: there is no benefit in not indexing short words: they take as much space in a binary index as long words. The issue is how best to compress the binary index, provided decompression is fast, of course.



            #15
            Noooooo, if you are indexing 65535 words your index will be smaller and quicker to traverse than if you are indexing 66000 words ...
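One possible reading of the numbers above, offered only as an assumption since the post does not spell it out: 65535 is the largest value an unsigned 16-bit integer can hold, so a vocabulary of that size can be numbered with 2-byte term ids, while 66000 terms would need the next fixed width up. The helper name `id_width_bytes` is made up for this sketch.

```python
import struct

def id_width_bytes(vocabulary_size):
    """Smallest standard unsigned width (in bytes) that can number every term from 0.

    Assumes fixed-width term ids; a compressed index would not behave this way.
    """
    largest_id = vocabulary_size - 1
    for fmt in ("<B", "<H", "<I", "<Q"):      # 1, 2, 4 and 8 byte unsigned integers
        width = struct.calcsize(fmt)
        if largest_id < 1 << (8 * width):
            return width
    raise ValueError("vocabulary too large for a 64-bit id")

print(id_width_bytes(65535))  # 2
print(id_width_bytes(66000))  # 4
```

This only matters when ids are stored at a fixed width; once the index is compressed with a variable-length code, the jump at 65536 largely disappears.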


              #16
              Originally posted by threaded
              Noooooo, if you are indexing 65535 words your index will be smaller and quicker to traverse than if you are indexing 66000 words ...
              I can't just throw out short words - only stop words like "of"/"to"/etc. get this type of treatment, and surprisingly they do not account for THAT much space.

              This does not answer the main question - how best to compress the index.



                #17
                I am not sure how your SKA works, but at a guess, have a look at his paper: "An efficient indexing technique for full-text database systems"


                  #18
                  Originally posted by threaded
                  "An efficient indexing technique for full-text database systems"
                  My system has evolved beyond that paper: the compression methods they use are not applicable, but Moffat's recent work on binary aligned integer coding is perfect. I am using it to implement compression, but I wondered whether anything better exists. All those guys tend to ignore the modern search engine requirement of storing additional information about a "hit": not just the document in which it occurred, but also the position and type of the hit. Consequently their compression techniques do not take this into account, even though these hits actually account for the majority of the index size.
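For readers unfamiliar with the idea, here is a minimal sketch of one word-aligned integer code, in the style usually called Simple-9: each 32-bit word carries a 4-bit selector plus 28 data bits holding as many equal-width integers as will fit. It is not necessarily the scheme used in the system described above, and it covers hit positions only (stored as gaps), not hit types. All function names are invented for the illustration.

```python
# (how many integers, bits per integer) for the 28 data bits of a 32-bit word
LAYOUTS = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5), (4, 7), (3, 9), (2, 14), (1, 28)]

def to_gaps(positions):
    """Replace sorted positions by differences, which are smaller numbers."""
    prev, gaps = 0, []
    for p in positions:
        gaps.append(p - prev)
        prev = p
    return gaps

def from_gaps(gaps):
    """Inverse of to_gaps: running sum of the differences."""
    total, out = 0, []
    for g in gaps:
        total += g
        out.append(total)
    return out

def encode(values):
    """Greedily pack non-negative integers below 2**28 into 32-bit words."""
    words, i = [], 0
    while i < len(values):
        for selector, (count, bits) in enumerate(LAYOUTS):
            chunk = values[i:i + count]          # shorter than count only at the end
            if all(0 <= v < (1 << bits) for v in chunk):
                word = selector << 28            # top 4 bits say which layout is used
                for slot, v in enumerate(chunk):
                    word |= v << (slot * bits)
                words.append(word)
                i += len(chunk)
                break
        else:
            raise ValueError("value is negative or needs more than 28 bits")
    return words

def decode(words, n):
    """Unpack words produced by encode; n is the number of integers stored."""
    out = []
    for word in words:
        count, bits = LAYOUTS[word >> 28]
        mask = (1 << bits) - 1
        for slot in range(count):
            out.append((word >> (slot * bits)) & mask)
    return out[:n]                               # drop zero padding in the last word

positions = [3, 7, 8, 20, 21, 500, 100000]
words = encode(to_gaps(positions))
assert from_gaps(decode(words, len(positions))) == positions
print(len(words))  # 3 words instead of 7 uncompressed 32-bit integers
```

Decoding needs only shifts and masks on whole machine words, which is why this family of codes tends to decompress quickly.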



                    #19
                    "binary aligned integer coding" - sounds vaguely interesting. Care to expound?



                      #20
                      AtW, what percentage of revenue do we get for this consultancy?
                      Vieze Oude Man
