Previously on "Google Style Optimized Search of a Database"

  • xchaotic
    replied
    In this particular example, I dare say that the db is not flat.
    Which means you could arrange fields to search for a location first and have an index of those.

    Also, within those 55M records there will be repetitions (companies with the same or a similar name), so you only need to hint at viable possibilities.
    Notice how Google only does that after you provide the first word, as that limits the number of possibilities.
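
A minimal sketch of the "hint at viable possibilities" idea, assuming the distinct company names fit in a sorted in-memory list. The function names and sample data are hypothetical, not taken from the thread. Because the list is sorted, every name sharing a prefix sits in one contiguous run, so a binary search finds the start of that run without scanning every row.

```python
import bisect

def build_sorted_names(names):
    """Deduplicate and sort lower-cased names once, up front."""
    return sorted({n.lower() for n in names})

def suggest(sorted_names, prefix, limit=5):
    """Return up to `limit` names that start with `prefix`."""
    prefix = prefix.lower()
    # Binary search for the first name >= prefix; matches follow contiguously.
    start = bisect.bisect_left(sorted_names, prefix)
    out = []
    for name in sorted_names[start:start + limit]:
        if not name.startswith(prefix):
            break
        out.append(name)
    return out

# Hypothetical sample data, including one repeated name.
names = ["Smiths Group", "Smiths News", "Smith & Nephew",
         "Acme Holdings", "Smiths Group"]
sorted_names = build_sorted_names(names)
print(suggest(sorted_names, "smiths"))
```

This only handles prefixes of the whole name; matching a word in the middle of a name needs a word-level index instead.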

  • fatcat51
    replied
    +1 for Lucene, just integrated it into a .NET project on SQL Server 2005. It's blazingly fast and really flexible.

    I was shocked how quick and easy it was to get search up and running, plus it gives you loads of extra features like highlighting etc.

  • NickFitz
    replied
    Originally posted by DimPrawn View Post
    Is that what they use for the search on microsoft.com? Because that sucks...

  • plinthadenoid
    replied
    Will SSE 2008 crawl an Oracle database?

  • DimPrawn
    replied
    http://www.microsoft.com/enterprises...s/default.aspx

    Free as in free beer.

  • plinthadenoid
    replied
    I've been working with someone who took over a project that has so far taken two years of a data analyst's time working up a taxonomy of the data to be searched, to allow quick and easy searching online.

    Chap I work with has just recommended the client stops trying to organise the data, which they have little control over, and instead plumps for one of these: http://www.google.com/enterprise/gsa/

    Costs don't seem to be too bad, small box is £2k, enterprise approx £18k.

    Would it be relevant in your case?

    Plinth

  • Bob Dalek
    replied
    Try Lucene. It's now Oracle-savvy, so give it a go.

  • DimPrawn
    replied
    Originally posted by MarillionFan View Post
    If I want to know about Access or Excel, I'll definitely ask you Dim otherwise I'll wait for an answer from a proper contractor.
    Atw is on his way over to kick your butt.

  • scooterscot
    replied
    PS: Information systems is what you're wanting to look into, optimisation concerns minimising resource to generate a desired outcome.

  • MarillionFan
    replied
    If I want to know about Access or Excel, I'll definitely ask you Dim otherwise I'll wait for an answer from a proper contractor.

  • PerlOfWisdom
    replied
    Originally posted by DimPrawn View Post
    Wow, that's a great idea! I wonder how big the database will be if you do that for every word in every column across 55,000,000 rows?

    Probably bigger than any database you've seen.
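
For reference, the "index every word" idea being argued about is an inverted index. Below is a minimal sketch, assuming rows are `(id, company_name)` pairs that fit in memory; the function names and sample rows are hypothetical. Each word maps to the set of row ids containing it, so a search becomes a dictionary lookup plus a set intersection rather than a scan of every row.

```python
import re
from collections import defaultdict

def tokenize(name):
    """Lower-case a company name and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", name.lower())

def build_index(rows):
    """Map each word to the set of row ids whose name contains it."""
    index = defaultdict(set)
    for row_id, name in rows:
        for word in tokenize(name):
            index[word].add(row_id)
    return index

def search(index, query):
    """Return the ids of rows containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample rows standing in for the 55M-record table.
rows = [
    (1, "Smiths Group plc"),
    (2, "WH Smith"),
    (3, "Smiths News"),
    (4, "Acme Holdings"),
]
index = build_index(rows)
print(sorted(search(index, "smiths")))
```

Note that this matches whole words only, so it is not an exact substitute for `Like '%Smiths%'`, which also matches substrings inside longer words. The index stores one entry per distinct word with a list of row ids, not one entry per word per row, which is what keeps its size manageable in practice.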

  • scooterscot
    replied
    Originally posted by MarillionFan View Post
    This is an optimization style question.

    I have a Dun and Bradstreet database circa 55M records. The request is that users need to look for a Company name. The company name is presently searched for using a wildcard search for example

    Where Company Like '%Smiths%'

    The problem is that this does a row-by-row search and takes some time.

    Do this in Google, for example, and the return is blisteringly fast.

    How can I optimize/write something to return records from a database using a wildcard, but at the same speed as something like a Google search?

    Need an optimization guru here? Atw?
    Optimisation - studied and wrote about this during my studies in operational research. Studied the method employed by google, page ranking, could provide paper to you if interested.

  • oracleslave
    replied
    Originally posted by MarillionFan View Post
    Shame, DimPrawn is poo pooing that idea above, shows he only has limited experience

    Yes it's an Oracle 10 Database.

    There appears to be some Oracle functionality, based around 'Oracle Text', that does seem to create an index of all combinations (as suggested above). From reading the article, it can also be designed to use a fuzzy logic match.

    The only problem appears to be if the index is greater than the actual original column, but then again an index lookup using an equals would be quicker than a wildcard anyway.

    Is this a good method?




    http://209.85.173.104/search?q=cache...lnk&cd=1&gl=uk
    I have no idea, as I have next to zero technical capability. The reason I asked the question regarding the front end is that I know there is standard functionality in Oracle Apps that does the exact searches you mentioned, i.e. retrieves customers via wildcard, indexed and fuzzy searches. As there are also APIs and a pre-built interface to D&B, I figured you may be able to look at how Oracle have already done it as a starter for 10.

  • DimPrawn
    replied
    Originally posted by MarillionFan View Post
    Shame, DimPrawn is poo pooing that idea above, shows he only has limited experience

    Yes it's an Oracle 10 Database.

    There appears to be some Oracle functionality, based around 'Oracle Text', that does seem to create an index of all combinations (as suggested above). From reading the article, it can also be designed to use a fuzzy logic match.

    The only problem appears to be if the index is greater than the actual original column, but then again an index lookup using an equals would be quicker than a wildcard anyway.

    Is this a good method?




    http://209.85.173.104/search?q=cache...lnk&cd=1&gl=uk
    No, you carry on, mate. It'll be interesting to see you create your own "Google" style index on 55M rows rather than use a highly optimised and sophisticated tool designed for the job, such as Full-Text Index on SQL Server.

  • MarillionFan
    replied
    Shame, DimPrawn is poo pooing that idea above, shows he only has limited experience

    Yes it's an Oracle 10 Database.

    There appears to be some Oracle functionality, based around 'Oracle Text', that does seem to create an index of all combinations (as suggested above). From reading the article, it can also be designed to use a fuzzy logic match.

    The only problem appears to be if the index is greater than the actual original column, but then again an index lookup using an equals would be quicker than a wildcard anyway.

    Is this a good method?




    http://209.85.173.104/search?q=cache...lnk&cd=1&gl=uk
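
To illustrate what a fuzzy match means in general, here is a small sketch using Python's standard `difflib` module. It is not how Oracle Text implements fuzzy matching; the `fuzzy_lookup` helper and the sample names are hypothetical, and the approach compares the query against every candidate, so it only suits a short list that an index has already narrowed down.

```python
import difflib

def fuzzy_lookup(query, names, limit=3, cutoff=0.6):
    """Return up to `limit` names most similar to `query`, best match first."""
    # Compare in lower case, but hand back the original spelling.
    lowered = {n.lower(): n for n in names}
    hits = difflib.get_close_matches(
        query.lower(), list(lowered), n=limit, cutoff=cutoff
    )
    return [lowered[h] for h in hits]

# Hypothetical candidate names.
names = ["Smiths Group", "Smiths News", "Smith & Nephew", "Acme Holdings"]
print(fuzzy_lookup("Smithz News", names))
```

The `cutoff` value is a similarity threshold between 0 and 1; raising it returns fewer, closer matches.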
