In this particular example, I dare say that the db is not flat, which means you could arrange the fields to search by location first and keep an index of those.
Also, within those 55M records there will be repetitions (companies with the same or a similar name), so you only need to hint at viable possibilities.
Notice how Google only does that after you provide the first word, as that limits the number of possibilities.
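A rough sketch of that idea in Python, just to make it concrete. The record layout (location, company name), the function names and the sample data are all made up for illustration; a real D&B extract will look different. It groups names by location, drops repeats, and only offers hints once a prefix has been typed.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(records):
    """Group distinct, lower-cased company names by location, sorted for prefix lookup."""
    by_location = defaultdict(set)
    for location, name in records:
        by_location[location.lower()].add(name.lower())  # the set drops repeated names
    return {loc: sorted(names) for loc, names in by_location.items()}


def suggest(index, location, prefix, limit=5):
    """Return up to `limit` names in `location` that start with `prefix`."""
    prefix = prefix.lower()
    if not prefix:
        return []  # no hints until the user has typed something
    names = index.get(location.lower(), [])
    i = bisect_left(names, prefix)  # binary search to the first possible match
    hints = []
    while i < len(names) and names[i].startswith(prefix) and len(hints) < limit:
        hints.append(names[i])
        i += 1
    return hints


# Hypothetical sample data
RECORDS = [
    ("London", "Smiths Ltd"),
    ("London", "Smiths Ltd"),
    ("London", "Smithson & Co"),
    ("Leeds", "Smiths Group"),
    ("London", "Jones Bakery"),
]
INDEX = build_index(RECORDS)
print(suggest(INDEX, "London", "smith"))
```

Because the names are sorted, each lookup is a binary search plus a short walk, rather than a pass over every row.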
Previously on "Google Style Optimized Search of a Database"
-
+1 for Lucene, just integrated it into a .NET project on SQL Server 2005. It's blazingly fast and really flexible.
I was shocked how quick and easy it was to get search up and running, plus it gives you loads of extra features like highlighting etc.
-
Is that what they use for the search on microsoft.com? Because that sucks...
(In reply to a post by DimPrawn.)
-
I've been working with someone who took over a project that has so far taken two years of a data analyst's time, working up a taxonomy of the data to be searched to allow quick and easy searching on-line.
The chap I work with has just recommended that the client stops trying to organise the data, which they have little control over, and instead plumps for one of these: http://www.google.com/enterprise/gsa/
Costs don't seem to be too bad, small box is £2k, enterprise approx £18k.
Would it be relevant in your case?
Plinth
-
PS: Information systems is what you're wanting to look into, optimisation concerns minimising resource to generate a desired outcome.
-
If I want to know about Access or Excel, I'll definitely ask you, Dim; otherwise I'll wait for an answer from a proper contractor.
-
Probably bigger than any database you've seen.
Originally posted by DimPrawn:
Wow, that's a great idea! I wonder how big the database will be if you do that for every word in every column across 55,000,000 rows?
-
Optimisation - studied and wrote about this during my studies in operational research. Studied the method employed by Google, page ranking, and could provide a paper to you if interested.
Originally posted by MarillionFan:
This is an optimization style question.
I have a Dun and Bradstreet database of circa 55M records. The request is that users need to look for a company name. The company name is presently searched for using a wildcard search, for example:
Where Company Like '%Smiths%'
The problem is, this will do a row-by-row search and takes some time.
Do this in Google, for example, and the return is blisteringly fast.
How can I optimize/write something to return records from a database using a wildcard, but at the same speed as something like a Google search?
Need an optimization guru here? Atw?
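To illustrate the problem in the question: a leading wildcard means an ordinary index on the column cannot be used, so every row gets inspected, whereas a word-level index goes straight to the matching rows. The following is only a toy Python sketch of that difference, not how Google or any database engine actually implements it; the row ids, company names and function names are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-cased alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def scan_like(rows, term):
    """Rough equivalent of LIKE '%term%': checks every single row."""
    term = term.lower()
    return [row_id for row_id, name in rows.items() if term in name.lower()]


def build_word_index(rows):
    """Map each word to the set of row ids whose name contains that word."""
    index = defaultdict(set)
    for row_id, name in rows.items():
        for word in tokenize(name):
            index[word].add(row_id)
    return index


def search_words(index, query):
    """Return row ids containing every word in the query (whole words only)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample rows: id -> company name
ROWS = {
    1: "Smiths Ltd",
    2: "Jones Bakery",
    3: "Smiths Group Plc",
    4: "Blacksmiths Supply",
}
WORD_INDEX = build_word_index(ROWS)
print(scan_like(ROWS, "Smiths"))
print(sorted(search_words(WORD_INDEX, "Smiths")))
```

Note the trade-off: the word index only matches whole words, so "Blacksmiths Supply" is found by the scan but not by the index lookup. The index also takes extra space, which is the size concern raised elsewhere in the thread.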
-
I have no idea, as I have next to zero technical capability. The reason I asked the question regarding the front end is that I know there is standard functionality in Oracle apps that does the exact searches you mentioned, i.e. retrieves customers via wildcard, indexed and fuzzy searches. As there are also APIs and a pre-built interface to D&B, I figured you may be able to look at how Oracle have already done it as a starter for 10.
Originally posted by MarillionFan:
Shame, DimPrawn is poo pooing that idea above, shows he only has limited experience
Yes it's an Oracle 10 Database.
There appears to be some Oracle functionality that does seem to create an index of all combinations(as suggested above) based around 'Oracle Text'. From reading the article, it can also be designed to use a fuzzy logic match.
The only problem appears to be if the index is greater than the actual original column, but then again an index using an equal would be quicker than a wildcard anyway
Is this a good method?
http://209.85.173.104/search?q=cache...lnk&cd=1&gl=uk
-
No, you carry on mate. Be interesting to see you create your own "google" style index on 55M rows rather than use a highly optimised and sophisticated tool designed for the job such as Full-Text Index on SQL Server.
Originally posted by MarillionFan:
Shame, DimPrawn is poo pooing that idea above, shows he only has limited experience
-
Shame, DimPrawn is poo pooing that idea above, shows he only has limited experience
Yes it's an Oracle 10 Database.
There appears to be some Oracle functionality that does seem to create an index of all combinations(as suggested above) based around 'Oracle Text'. From reading the article, it can also be designed to use a fuzzy logic match.
The only problem appears to be if the index is greater than the actual original column, but then again an index using an equal would be quicker than a wildcard anyway
Is this a good method?
http://209.85.173.104/search?q=cache...lnk&cd=1&gl=uk
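For anyone wondering what a fuzzy match over company names looks like in principle, here is a minimal Python sketch using the standard library's difflib as a stand-in. This is not Oracle Text and says nothing about how Oracle implements its fuzzy matching; the function name, the 0.8 cutoff and the sample names are assumptions picked purely for illustration.

```python
from difflib import get_close_matches


def fuzzy_company_names(names, query, limit=3, cutoff=0.8):
    """Return original names whose lower-cased form is similar to the query."""
    # Map lower-cased names back to their original spelling
    # (if two names differ only by case, the last one wins).
    lookup = {name.lower(): name for name in names}
    matches = get_close_matches(query.lower(), list(lookup), n=limit, cutoff=cutoff)
    return [lookup[m] for m in matches]  # best match first


# Hypothetical sample names
NAMES = ["Smiths Ltd", "Smyths Ltd", "Jones Bakery"]
print(fuzzy_company_names(NAMES, "smiths ltd"))
```

This compares the query against every distinct name, so on 55M rows it would only be practical over a much smaller candidate list, for example names already narrowed down by an index lookup.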