Contractor UK Bulletin Board  PayStream

Go Back   Contractor UK Bulletin Board > Contractor UK Forums > Technical
Register FAQ Members List Calendar Search Today's Posts Mark Forums Read

Reply
 
Thread Tools Display Modes
Old 22nd July 2008, 12:15   #1
MarillionFan
Super poster
 
MarillionFan's Avatar
 
Join Date: Jul 2005
Posts: 2,649
Default Google Style Optimized Search of a Database

This is an optimization style question.

I have a Dun and Bradstreet database circa 55M records. The request is that users need to look for a Company name. The company name is presently searched for using a wildcard search for example

Where Company Like '%Smiths%'

The problem is, this will do a row by row search and takes sometime.

Do this in Google for example, the return is blurringingly fast.

How can I optimize/write something to return records from a database using a wildcard, but at the same speed as something like a Google Search.

Need an optimization guru here? Atw?
MarillionFan is offline   Reply With Quote
Old 22nd July 2008, 12:30   #2
DimPrawn
Godlike
 
DimPrawn's Avatar
 
Join Date: Jul 2005
Posts: 10,425
Default

I'm not sure this is a job for a Crystal Reports / MS Access / Shopkeeper type person MF.



What database is the data currently living on? Is it MS SQL Server 2005 (if not, it should be)?

If so, start looking here http://msdn.microsoft.com/en-us/library/ms345119.aspx

Or pay AtW 10 Roubles a day to develop you a SKA.
__________________
Last week I cashed a cheque and the bank bounced.
DimPrawn is offline   Reply With Quote
Old 22nd July 2008, 12:34   #3
PerlOfWisdom
More time posting than coding
 
Join Date: Jul 2005
Posts: 489
Default

You need to set up an indexing table with values like:

John Smith
ohn Smith
hn Smith
n Smith
Smith
Smith
mith
ith
th

This allows indexed searches as there is no wildcard at the begining of the string.
PerlOfWisdom is offline   Reply With Quote
Old 22nd July 2008, 12:38   #4
DimPrawn
Godlike
 
DimPrawn's Avatar
 
Join Date: Jul 2005
Posts: 10,425
Default

Quote:
Originally Posted by PerlOfWisdom View Post
You need to set up an indexing table with values like:

John Smith
ohn Smith
hn Smith
n Smith
Smith
Smith
mith
ith
th

This allows indexed searches as there is no wildcard at the begining of the string.

Wow, that's a great idea! I wonder how big the database will be if you do that for every word in every column across 55,000,000 rows?

__________________
Last week I cashed a cheque and the bank bounced.
DimPrawn is offline   Reply With Quote
Old 22nd July 2008, 12:44   #5
oracleslave
Godlike
 
oracleslave's Avatar
 
Join Date: Aug 2007
Location: London
Posts: 5,480
Default

Quote:
Originally Posted by MarillionFan View Post
This is an optimization style question.

I have a Dun and Bradstreet database circa 55M records. The request is that users need to look for a Company name. The company name is presently searched for using a wildcard search for example

Where Company Like '%Smiths%'
What is the front-end used to run the search? Oracle?
oracleslave is offline   Reply With Quote
Old 22nd July 2008, 12:48   #6
MarillionFan
Super poster
 
MarillionFan's Avatar
 
Join Date: Jul 2005
Posts: 2,649
Default

Shame, DimPrawn is poo pooing that idea above, shows he only has limited experience

Yes it's an Oracle 10 Database.

There appears to be some Oracle functionality that does seem to create an index of all combinations(as suggested above) based around 'Oracle Text'. From reading the article, it can also be designed to use a fuzzy logic match.

The only problem appears to be if the index is greater than the actual original column, but then again an index using an equal would be quicker than a wildcard anyway

Is this a good method?




http://209.85.173.104/search?q=cache...lnk&cd=1&gl=uk
MarillionFan is offline   Reply With Quote
Old 22nd July 2008, 12:53   #7
DimPrawn
Godlike
 
DimPrawn's Avatar
 
Join Date: Jul 2005
Posts: 10,425
Default

Quote:
Originally Posted by MarillionFan View Post
Shame, DimPrawn is poo pooing that idea above, shows he only has limited experience

Yes it's an Oracle 10 Database.

There appears to be some Oracle functionality that does seem to create an index of all combinations(as suggested above) based around 'Oracle Text'. From reading the article, it can also be designed to use a fuzzy logic match.

The only problem appears to be if the index is greater than the actual original column, but then again an index using an equal would be quicker than a wildcard anyway

Is this a good method?




http://209.85.173.104/search?q=cache...lnk&cd=1&gl=uk
No, you carry on mate. Be interesting to see you create your own "google" style index on 55M rows rather than use a highly optimised and sophisticated tool designed for the job such as Full-Text Index on SQL Server.
__________________
Last week I cashed a cheque and the bank bounced.
DimPrawn is offline   Reply With Quote
Old 22nd July 2008, 12:56   #8
oracleslave
Godlike
 
oracleslave's Avatar
 
Join Date: Aug 2007
Location: London
Posts: 5,480
Default

Quote:
Originally Posted by MarillionFan View Post
Shame, DimPrawn is poo pooing that idea above, shows he only has limited experience

Yes it's an Oracle 10 Database.

There appears to be some Oracle functionality that does seem to create an index of all combinations(as suggested above) based around 'Oracle Text'. From reading the article, it can also be designed to use a fuzzy logic match.

The only problem appears to be if the index is greater than the actual original column, but then again an index using an equal would be quicker than a wildcard anyway

Is this a good method?




http://209.85.173.104/search?q=cache...lnk&cd=1&gl=uk
I have no idea as I have next to zero technical capability. Reason I asked the question regarding the front end is I know there is standard functionality in oracle apps that does the exact searches you mentioned i.e. retrieves customers via wildcard, indexed and fuzzy searches. As there are also API's, pre-built interface to D&B I figured you may be able to look at how oracle have already done it as a starter for 10.
oracleslave is offline   Reply With Quote
Old 22nd July 2008, 13:09   #9
scooterscot
Contractor Among Contractors
 
scooterscot's Avatar
 
Join Date: Nov 2005
Location: Edinbra
Posts: 1,852
Default

Quote:
Originally Posted by MarillionFan View Post
This is an optimization style question.

I have a Dun and Bradstreet database circa 55M records. The request is that users need to look for a Company name. The company name is presently searched for using a wildcard search for example

Where Company Like '%Smiths%'

The problem is, this will do a row by row search and takes sometime.

Do this in Google for example, the return is blurringingly fast.

How can I optimize/write something to return records from a database using a wildcard, but at the same speed as something like a Google Search.

Need an optimization guru here? Atw?
Optimisation - studied and wrote about this during my studies in operational research. Studied the method employed by google, page ranking, could provide paper to you if interested.
scooterscot is offline   Reply With Quote
Old 22nd July 2008, 13:09   #10
PerlOfWisdom
More time posting than coding
 
Join Date: Jul 2005
Posts: 489
Default

Quote:
Originally Posted by DimPrawn View Post
Wow, that's a great idea! I wonder how big the database will be if you do that for every word in every column across 55,000,000 rows?

Probably bigger than any database you've seen.
PerlOfWisdom is offline   Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Forum Jump


All times are GMT. The time now is 22:28.


Advertisers
PayStream

CUK Navigation

Contractor Alliance
Formed a new Ltd Co?

20% off business insurance
£10 off Bauer & Cottrell contract reviews
Find co-workers & client introductions

Increase your value to clients here

Fast Company Formation
Same day online company formation £75 + VAT

Form your Ltd Co Here

Contractor Services


 
Content Relevant URLs by vBSEO 2.4.0 © 2005, Crawlability, Inc.