Journey to SQL Authority
-
You had a lookup table that is 30 million rows? Sounds like a design issue. Couldn't you use a join instead?

"A life, Jimmy, you know what that is? It’s the s*** that happens while you’re waiting for moments that never come." -- Lester Freamon
Originally posted by Freamon:
You had a lookup table that is 30 million rows? Sounds like a design issue. Couldn't you use a join instead?

Sounds like a UK address table.
-
Originally posted by Freamon:
You had a lookup table that is 30 million rows? Sounds like a design issue. Couldn't you use a join instead?

it's not and no... although I might give the join another go tonight, just thought of another way of doing it.

Thanks
Originally posted by k2p2:
Sounds like a UK address table.

bingo

problem with address data (at least in this project) is that the incoming format is crap and doesn't join nicely to the PAF data

Coffee's for closers
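Much of the pain in a join like that is usually formatting rather than the data itself. Here is a minimal sketch of the kind of normalisation pass you would run on both the incoming feed and the PAF side before attempting the join; the abbreviation list and the handling are illustrative, not taken from this project:

```python
import re

# Hypothetical expansions; a real list would be much longer and would need
# care with ambiguous tokens ("ST" can mean STREET or SAINT).
ABBREVS = {"RD": "ROAD", "AVE": "AVENUE", "LN": "LANE", "CL": "CLOSE"}

def normalise_address(raw: str) -> str:
    """Crude normaliser: uppercase, strip punctuation, collapse
    whitespace and expand a few common abbreviations."""
    s = re.sub(r"[^A-Z0-9 ]", " ", raw.upper())
    tokens = [ABBREVS.get(t, t) for t in s.split()]
    return " ".join(tokens)

def normalise_postcode(raw: str) -> str:
    """Uppercase, remove all spaces, then re-insert the single space
    before the final three characters (the inward code)."""
    s = re.sub(r"\s+", "", raw.upper())
    return f"{s[:-3]} {s[-3:]}" if len(s) > 3 else s
```

With both sides normalised the same way, an equi-join on the cleaned postcode plus the cleaned first address line catches far more rows than joining on the raw strings.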
-
Originally posted by Spacecadet:
problem with address data (at least in this project) is that the incoming format is crap and doesn't join nicely to the PAF data

I had this in a previous life. I built a neat pattern-matching tool to find likely dupes.

ah, halcyon days

(\__/)
(>'.'<)
("")("") Born to Drink. Forced to Work
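A toy version of that kind of pattern-matching dupe finder can be built with the standard library's difflib. This is an all-pairs O(n²) sketch for illustration only; the threshold is arbitrary, and at anything like 30 million rows you would block on postcode first so you only compare within small groups:

```python
from difflib import SequenceMatcher
from itertools import combinations

def likely_dupes(addresses, threshold=0.85):
    """Return pairs of address strings whose similarity ratio
    meets or exceeds the threshold (case-insensitive)."""
    pairs = []
    for a, b in combinations(addresses, 2):
        # ratio() is 2*M/T: matched chars over total chars in both strings
        if SequenceMatcher(None, a.upper(), b.upper()).ratio() >= threshold:
            pairs.append((a, b))
    return pairs
```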
-
Originally posted by Spacecadet:
bingo

problem with address data (at least in this project) is that the incoming format is crap and doesn't join nicely to the PAF data

You'll have fun with that. Last time I did that I ended up with a multi-stage process based on the accuracy of the individual records. 10% of it turned out to be total and utter crap, which was surprising as they were all customers who had purchased goods via mail order.

merely at clientco for the entertainment
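That multi-stage idea can be sketched as a grading pass that routes each record to a tier by how well it matches the reference data. The tier names, rules and field names here are hypothetical, just to show the shape of the pipeline:

```python
def grade_record(rec, paf_postcodes, paf_keys):
    """Assign a record to a processing tier (illustrative rules):
       'exact'   - the address key joins straight onto the PAF
       'partial' - postcode exists on the PAF, address needs fuzzy matching
       'reject'  - postcode unknown, send for manual review"""
    if rec["key"] in paf_keys:
        return "exact"
    if rec["postcode"] in paf_postcodes:
        return "partial"
    return "reject"

def stage(records, paf_postcodes, paf_keys):
    """Split the incoming feed into tiers for separate downstream handling."""
    tiers = {"exact": [], "partial": [], "reject": []}
    for rec in records:
        tiers[grade_record(rec, paf_postcodes, paf_keys)].append(rec)
    return tiers
```

Each tier then gets its own treatment: the exact matches flow straight through, the partial ones go to the fuzzy matcher, and the rejects get eyeballed or binned.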
-
Originally posted by eek:
You'll have fun with that. Last time I did that I ended with a multi-stage process based on the accuracy of the individual records. 10% of it turned out to be total and utter crap which was surprising as they were all customers who had purchased goods via mail order.

Fuzzy logic. That's the answer: fuzzy logic.

(\__/)
(>'.'<)
("")("") Born to Drink. Forced to Work
-
Originally posted by EternalOptimist:
fuzzy logic. thats the answer, fuzzy logic

Fuzzy logic doesn't cover the case where the postcode given is the one for Main Street but the address is given as High Row (which has a different postcode).

merely at clientco for the entertainment
-
Originally posted by eek:
Fuzzy Logic doesn't cover the example of postcode given for Main Street, address given as High Row (which has a different postcode).

Yes it does.

(\__/)
(>'.'<)
("")("") Born to Drink. Forced to Work
-
Originally posted by eek:
You'll have fun with that. Last time I did that I ended with a multi-stage process based on the accuracy of the individual records. 10% of it turned out to be total and utter crap which was surprising as they were all customers who had purchased goods via mail order.

That's pretty much the conclusion I'd got to as well: filter out the good, the not so good and the passable, then ignore the rest!

Coffee's for closers
-
Originally posted by Spacecadet:
it's not and no... although i might give the join another go tonight, just thought of another way of doing it.

Thanks

A lookup on a 30 million row table is definitely not a good idea. Lookup is designed to read the whole table into memory when the job starts up, and then access the data in memory for each row that passes through. If you read 30 million rows into memory, your job will swap to disk a lot. I'd suggest using a join is definitely the way to go. In a real ETL tool you could also partition your data by the first half of the postcode to improve the join performance.
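The partitioning point can be sketched in plain Python: a hash join keyed on the outward code (the first half of the postcode), so only one partition's worth of PAF rows is held as a lookup at any time instead of all 30 million. Field names are illustrative:

```python
from collections import defaultdict

def outward(postcode):
    """First half of the postcode, e.g. 'SW1A' from 'SW1A 1AA'."""
    return postcode.split()[0]

def partitioned_join(incoming, paf):
    """Hash-join incoming records to PAF rows, partitioned by outward
    postcode so each per-partition lookup dict stays small."""
    by_part_in = defaultdict(list)
    for rec in incoming:
        by_part_in[outward(rec["postcode"])].append(rec)
    by_part_paf = defaultdict(list)
    for row in paf:
        by_part_paf[outward(row["postcode"])].append(row)
    matches = []
    for part, recs in by_part_in.items():
        # only this partition's PAF rows are materialised as a lookup
        lookup = {row["postcode"]: row for row in by_part_paf[part]}
        for rec in recs:
            if rec["postcode"] in lookup:
                matches.append((rec, lookup[rec["postcode"]]))
    return matches
```

A proper ETL tool or database does the same thing for you (and spills partitions to disk gracefully), which is exactly why the join beats a 30-million-row in-memory lookup.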

Originally posted by Spacecadet:
bingo

problem with address data (at least in this project) is that the incoming format is crap and doesn't join nicely to the PAF data

Yep, I think most people have been through this pain at some point, especially when it's addresses that have been manually keyed in with no sense-checking at the point of entry. There are specialist data-cleansing tools for address data which may save you a lot of pain. Dataflux does a good job of it (amongst other things).

"A life, Jimmy, you know what that is? It’s the s*** that happens while you’re waiting for moments that never come." -- Lester Freamon