• Visitors can check out the Forum FAQ by clicking this link. You have to register before you can post: click the REGISTER link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. View our Forum Privacy Policy.
  • Want to receive the latest contracting news and advice straight to your inbox? Sign up to the ContractorUK newsletter here. Every sign up will also be entered into a draw to WIN £100 Amazon vouchers!

Google Style Optimized Search of a Database

Collapse
X
  •  
  • Filter
  • Time
  • Show
Clear All
new posts

    Google Style Optimized Search of a Database

    This is an optimization style question.

    I have a Dun and Bradstreet database circa 55M records. The request is that users need to look for a Company name. The company name is presently searched for using a wildcard search for example

    Where Company Like '%Smiths%'

    The problem is, this will do a row by row search and takes sometime.

    Do this in Google for example, the return is blurringingly fast.

    How can I optimize/write something to return records from a database using a wildcard, but at the same speed as something like a Google Search.

    Need an optimization guru here? Atw?
    What happens in General, stays in General.
    You know what they say about assumptions!

    #2
    I'm not sure this is a job for a Crystal Reports / MS Access / Shopkeeper type person MF.



    What database is the data currently living on? Is it MS SQL Server 2005 (if not, it should be)?

    If so, start looking here http://msdn.microsoft.com/en-us/library/ms345119.aspx

    Or pay AtW 10 Roubles a day to develop you a SKA.

    Comment


      #3
      You need to set up an indexing table with values like:

      John Smith
      ohn Smith
      hn Smith
      n Smith
      Smith
      Smith
      mith
      ith
      th

      This allows indexed searches as there is no wildcard at the begining of the string.

      Comment


        #4
        Originally posted by PerlOfWisdom View Post
        You need to set up an indexing table with values like:

        John Smith
        ohn Smith
        hn Smith
        n Smith
        Smith
        Smith
        mith
        ith
        th

        This allows indexed searches as there is no wildcard at the begining of the string.

        Wow, that's a great idea! I wonder how big the database will be if you do that for every word in every column across 55,000,000 rows?

        Comment


          #5
          Originally posted by MarillionFan View Post
          This is an optimization style question.

          I have a Dun and Bradstreet database circa 55M records. The request is that users need to look for a Company name. The company name is presently searched for using a wildcard search for example

          Where Company Like '%Smiths%'
          What is the front-end used to run the search? Oracle?

          Comment


            #6
            Shame, DimPrawn is poo pooing that idea above, shows he only has limited experience

            Yes it's an Oracle 10 Database.

            There appears to be some Oracle functionality that does seem to create an index of all combinations(as suggested above) based around 'Oracle Text'. From reading the article, it can also be designed to use a fuzzy logic match.

            The only problem appears to be if the index is greater than the actual original column, but then again an index using an equal would be quicker than a wildcard anyway

            Is this a good method?




            http://209.85.173.104/search?q=cache...lnk&cd=1&gl=uk
            What happens in General, stays in General.
            You know what they say about assumptions!

            Comment


              #7
              Originally posted by MarillionFan View Post
              Shame, DimPrawn is poo pooing that idea above, shows he only has limited experience

              Yes it's an Oracle 10 Database.

              There appears to be some Oracle functionality that does seem to create an index of all combinations(as suggested above) based around 'Oracle Text'. From reading the article, it can also be designed to use a fuzzy logic match.

              The only problem appears to be if the index is greater than the actual original column, but then again an index using an equal would be quicker than a wildcard anyway

              Is this a good method?




              http://209.85.173.104/search?q=cache...lnk&cd=1&gl=uk
              No, you carry on mate. Be interesting to see you create your own "google" style index on 55M rows rather than use a highly optimised and sophisticated tool designed for the job such as Full-Text Index on SQL Server.

              Comment


                #8
                Originally posted by MarillionFan View Post
                Shame, DimPrawn is poo pooing that idea above, shows he only has limited experience

                Yes it's an Oracle 10 Database.

                There appears to be some Oracle functionality that does seem to create an index of all combinations(as suggested above) based around 'Oracle Text'. From reading the article, it can also be designed to use a fuzzy logic match.

                The only problem appears to be if the index is greater than the actual original column, but then again an index using an equal would be quicker than a wildcard anyway

                Is this a good method?




                http://209.85.173.104/search?q=cache...lnk&cd=1&gl=uk
                I have no idea as I have next to zero technical capability. Reason I asked the question regarding the front end is I know there is standard functionality in oracle apps that does the exact searches you mentioned i.e. retrieves customers via wildcard, indexed and fuzzy searches. As there are also API's, pre-built interface to D&B I figured you may be able to look at how oracle have already done it as a starter for 10.

                Comment


                  #9
                  Originally posted by MarillionFan View Post
                  This is an optimization style question.

                  I have a Dun and Bradstreet database circa 55M records. The request is that users need to look for a Company name. The company name is presently searched for using a wildcard search for example

                  Where Company Like '%Smiths%'

                  The problem is, this will do a row by row search and takes sometime.

                  Do this in Google for example, the return is blurringingly fast.

                  How can I optimize/write something to return records from a database using a wildcard, but at the same speed as something like a Google Search.

                  Need an optimization guru here? Atw?
                  Optimisation - studied and wrote about this during my studies in operational research. Studied the method employed by google, page ranking, could provide paper to you if interested.
                  "Never argue with stupid people, they will drag you down to their level and beat you with experience". Mark Twain

                  Comment


                    #10
                    Originally posted by DimPrawn View Post
                    Wow, that's a great idea! I wonder how big the database will be if you do that for every word in every column across 55,000,000 rows?

                    Probably bigger than any database you've seen.

                    Comment

                    Working...
                    X