
Deleting records SQL



    Can anyone tell me how to do a delete the same as the one below?

    delete from tblThis where tblThis.clientID not in (select tblThat.clientID from tblThat)

    perhaps using a left join or something? The thing is, the above SQL takes 5 minutes to run on 20,000 records in tblThis and tblThat...
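    For reference, this is the sort of left-join version I had in mind, though I have not tested whether it is actually any faster (multi-table delete syntax as in MySQL/SQL Server, not standard SQL):

    -- untested sketch: tblThis rows with no matching tblThat row come back from
    -- the left join with a NULL tblThat.clientID, and those are the ones deleted
    delete tblThis
    from tblThis
    left join tblThat on tblThat.clientID = tblThis.clientID
    where tblThat.clientID is null;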

    ???

    #2
    Me no know proper answer but wouldn't "not in (select unique(tblThat.clientID) from tblThat)" help?



      #3
      Put an index on tblThat.clientID,


      or use a temporary table:

      create table tblTemp as select distinct clientID from tblThat;

      delete from tblThis where tblThis.clientID not in (select tblTemp.clientID from tblTemp);
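      For the index, something along these lines should do it - the index name here is just an example and the exact syntax varies a little between DBMSs:

      -- hypothetical index name; a plain index on the join/filter column
      create index ix_tblThat_clientID on tblThat (clientID);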



        #4
        Doh ... distinct.



          #5
          cheers Perl, that'll do



            #6
            big "not in" subqueries can be notoriously slow on some well known databases - I had this problem 4 years ago on Sybase. The answers above do not solve principal problem - eliminate need for NOT IN subquery - indexing helps only so much. A much faster code would be something like this (using Transact SQL here):

            -- build a work table of the ClientIDs we might delete, all flagged
            select ClientID,DeleteFlag=1 into #temp from tblThis

            -- un-flag any ClientID that does exist in tblThat
            update #temp set DeleteFlag=0
            from #temp T,tblThat T2
            where T2.ClientID=T.ClientID

            -- delete the tblThis rows whose ClientID is still flagged
            delete tblThis from
            #temp T,tblThis T2
            where T2.ClientID=T.ClientID
            and DeleteFlag=1

            drop table #temp

            Now, this way we have avoided the "not in" logic completely and only use normal, simple joins, which every good DB does well.

            Notes:

            0. There should be good indices on tblThis.ClientID and tblThat.ClientID
            1. I did not use distinct on purpose; this gives a big performance benefit if the overall "uniqueness" of the ClientIDs is quite high (it does not have to be 100%). You can even be smart and check the built-in DB stats to decide whether that particular query would run better with distinct. Otherwise the first query would select distinct ClientIDs into #temp

            I must add that DB2 does appear to do the same thing behind the scenes - something every database should do in similar situations.


            PerlOfWisdom: stick to yer hashes in Perl -- you will laugh, but I used this approach to get some slow SQL queries to run fast until I mastered the art of SQL opts



              #7
              AtW:
              You don't know how much speeding up is required. Your code doesn't work on some DBMSs (e.g. Oracle), since "update-from" and "delete-from" are not standard SQL.

              If speeding up from 5 mins to 60 secs is sufficient, then it's better to have easily readable SQL than spend hours optimising it into some unreadable, non-standard mess.

              I think that the index on tblThat.clientID would probably do the job, and would remove the need for untidy temporary tables.


              Stick to card index systems until you've really mastered SQL opts.


              MySQLOfWisdom



                #8
                OK, two points here:

                1. Speed: the improvement is huge. I had "not in" code 4 years ago that was running for about 2 hours; with my optimisation it was cut down to 5 mins - a big difference if you ask me.

                2. SQL compatibility: indeed, Oracle does not have "update from" (Sybase and MS SQL do), however it can do a correlated sub-query with exactly the same results - I know because I tried both "not in" and my method on Sybase, Oracle and DB2, and the latter was actually quite good at dealing with pure "not in" queries. I could not be arsed to write code for 2 different systems, of which I prefer Transact-SQL, but a rough sketch of the correlated version is below.
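                Roughly, the correlated version I have in mind would be something like this - I am assuming a NOT EXISTS form and have not re-run it on Oracle here, so treat it as a sketch rather than tested code:

                -- correlated sub-query sketch: delete each tblThis row for which
                -- no matching tblThat row exists
                delete from tblThis
                where not exists (
                    select 1 from tblThat
                    where tblThat.clientID = tblThis.clientID
                );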

                > I think that the index on tblThat.clientID would probably do
                > the job, and would remove the need for untidy temporary
                > tables.

                Well, you THINK, but I actually did have exactly the same problem with exactly the same SQL on a real production system with a fair few records (measured only in 100Ks rather than millions, yet that was enough to make the query run too long) a few years ago when I was an SQL noob - but not anymore. An index is good, but the main problem is the NOT IN, something you do not understand. Also, your "distinct" clause would fk up performance quite nicely - sorry, but your code just can't pretend to be optimised SQL.

                > Stick to card index systems until you've really mastered
                > SQL opts.

                Oh, but I think I did get better-performing SQL here; it was you who suggested that there is no need for optimisation for the sake of readability - and this comes from someone who programs in Perl :lol FYI, I am a big Perl lover.

                I tell you what - let's make it fair: let's pick a good database, analyse both of our SQL statements running against different numbers of rows to observe performance, and then decide objectively whose SQL is better?


                AtW - certified Sybase ASE Perf & Opts


                Edit: btw, the last time I was playing with punch cards was in the late 70s, when my parents (programmers at the time) brought home some of their programs on punch cards - bet you have not even seen those :lol



                  #9
                  come on PerlOfWisdom - accept my challenge 8)



                    #10
                    I never said that my SQL would be faster than your non-standard, messy code. I said that it would probably be fast enough. It's only 20,000 records in each file, and often it's not worth the effort of super-optimisation.

                    I recently reduced the running time of a 4-hour query to 20 minutes. The query is run daily, and the improvement saves 1 hour of on-call time per day (it shifts the start time of the whole process into the working day).

                    I could have spent weeks improving it to 5 minutes, but it wouldn't give any further benefit to an 8-hour process.

