
Deleting records SQL



    Can anyone tell me how to do a delete the same as the one below?

    delete from tblThis where tblThis.clientID not in (select tblThat.clientID from tblThat)

    perhaps using a left join or something? The thing is, the above SQL takes 5 minutes to run on 20,000 records in tblThis and tblThat...
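    For reference, this is the sort of left-join version I had in mind, though I have not tested whether it is actually any faster (multi-table delete syntax as in MySQL/SQL Server, not standard SQL):

    -- untested sketch: tblThis rows with no matching tblThat row come back from
    -- the left join with a NULL tblThat.clientID, and those are the ones deleted
    delete tblThis
    from tblThis
    left join tblThat on tblThat.clientID = tblThis.clientID
    where tblThat.clientID is null;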

    ???

    #2
    Me no know proper answer but wouldn't "not in (select unique(tblThat.clientID) from tblThat)" help?



      #3
      Put an index on tblThat.clientID,


      or use a temporary table:

      create table tblTemp as select distinct clientID from tblThat;

      delete from tblThis where tblThis.clientID not in (select tblTemp.clientID from tblTemp);
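      For the index, something along these lines should do it - the index name here is just an example and the exact syntax varies a little between DBMSs:

      -- hypothetical index name; a plain index on the join/filter column
      create index ix_tblThat_clientID on tblThat (clientID);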



        #4
        Doh ... distinct.



          #5
          cheers Perl, that'll do



            #6
            big "not in" subqueries can be notoriously slow on some well known databases - I had this problem 4 years ago on Sybase. The answers above do not solve principal problem - eliminate need for NOT IN subquery - indexing helps only so much. A much faster code would be something like this (using Transact SQL here):

            -- build a work table of the ClientIDs we might delete, all flagged
            select ClientID,DeleteFlag=1 into #temp from tblThis

            -- un-flag any ClientID that does exist in tblThat
            update #temp set DeleteFlag=0
            from #temp T,tblThat T2
            where T2.ClientID=T.ClientID

            -- delete the tblThis rows whose ClientID is still flagged
            delete tblThis from
            #temp T,tblThis T2
            where T2.ClientID=T.ClientID
            and DeleteFlag=1

            drop table #temp

            Now, this way we have avoided the "not in" logic completely and only use normal, simple joins, which every good DB does well.

            Notes:

            0. There should be good indices on tblThis.ClientID and tblThat.ClientID
            1. I did not use distinct on purpose; this gives a big performance benefit if the overall "uniqueness" of the ClientIDs is quite high (it does not have to be 100%). You can even be smart and check the built-in DB stats to decide whether that particular query would run better with distinct. Otherwise the first query would select distinct ClientIDs into #temp

            I must add that DB2 does appear to do the same thing behind the scenes - something every database should do in similar situations.


            PerlOfWisdom: stick to yer hashes in Perl -- you will laugh, but I used this approach to get some slow SQL queries to run fast until I mastered the art of SQL opts



              #7
              AtW:
              You don't know how much speeding up is required. Your code doesn't work on some DBMSs (e.g. Oracle), since "update-from" and "delete-from" are not standard SQL.

              If speeding up from 5 mins to 60 secs is sufficient, then it's better to have easily readable SQL than spend hours optimising it into some unreadable, non-standard mess.

              I think that the index on tblThat.clientID would probably do the job, and would remove the need for untidy temporary tables.


              Stick to card index systems until you've really mastered SQL opts.


              MySQLOfWisdom



                #8
                OK, two points here:

                1. Speed: the improvement is huge. I had "not in" code 4 years ago that was running for about 2 hours; with my optimisation it was cut down to 5 mins - a big difference if you ask me.

                2. SQL compatibility: indeed, Oracle does not have "update from" (Sybase and MS SQL do), however it can do a correlated sub-query with exactly the same results - I know because I tried both "not in" and my method on Sybase, Oracle and DB2, and the latter was actually quite good at dealing with pure "not in" queries. I could not be arsed to write code for 2 different systems, of which I prefer Transact-SQL, but a rough sketch of the correlated version is below.
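                Roughly, the correlated version I have in mind would be something like this - I am assuming a NOT EXISTS form and have not re-run it on Oracle here, so treat it as a sketch rather than tested code:

                -- correlated sub-query sketch: delete each tblThis row for which
                -- no matching tblThat row exists
                delete from tblThis
                where not exists (
                    select 1 from tblThat
                    where tblThat.clientID = tblThis.clientID
                );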

                > I think that the index on tblThat.clientID would probably do
                > the job, and would remove the need for untidy temporary
                > tables.

                Well, you THINK, but I actually did have exactly the same problem with exactly the same SQL on a real production system with a fair few records (measured only in 100Ks rather than millions, yet that was enough to make the query run too long) a few years ago when I was an SQL noob - but not anymore. An index is good, but the main problem is the NOT IN, something you do not understand. Also, your "distinct" clause would fk up performance quite nicely - sorry, but your code just can't pretend to be optimised SQL.

                > Stick to card index systems until you've really mastered
                > SQL opts.

                Oh, but I think I did get better-performing SQL here; it was you who suggested that there is no need for optimisation for the sake of readability - and this comes from someone who programs in Perl :lol FYI, I am a big Perl lover.

                I tell you what - let's make it fair: let's pick a good database, analyse both of our SQL statements running against different numbers of rows to observe performance, and then decide objectively whose SQL is better?


                AtW - certified Sybase ASE Perf & Opts


                Edit: btw, the last time I was playing with punch cards was in the late 70s, when my parents (programmers at the time) brought home some of their programs on punch cards - bet you have not even seen those :lol



                  #9
                  come on PerlOfWisdom - accept my challenge 8)



                    #10
                    I never said that my SQL would be faster than your non-standard, messy code. I said that it would probably be fast enough. It's only 20,000 records in each file, and often it's not worth the effort of super-optimisation.

                    I recently reduced the running time of a 4-hour query to 20 minutes. The query is run daily, and the improvement saves 1 hour of on-call time per day (it shifts the start time of the whole process into the working day).

                    I could have spent weeks improving it to 5 minutes, but it wouldn't give any further benefit to an 8-hour process.

