
Deleting records SQL


    #11
    > I never said that my SQL would be faster than your non-
    > standard messy code.

    :rollin

    3 lines of Transact-SQL is messy and non-standard? It's not like I am suggesting 30-50 lines - there is no reason not to do it. Anyhow, maybe if you did enough optimisation you'd learn that optimised code is non-standard by definition - it's _optimised_.

    > I recently reduced the running time of a 4 hour query to 20
    > minutes.

    Great, and I've recently improved a query from 3 sec to 0.1 sec - tables with 5 million rows involved. Care to offer your solution for effective substring searches? i.e. optimise the following:

    create table Products (SKU int, Keywords varchar(255));

    select * from Products where Keywords like '%keyword1%' and Keywords like '%keyword2%';

    Table scans are pretty expensive for a table with 100k's of products.

    Going to present this to the local Perl Mongers group, btw.

    > I could have spent weeks improving it to 5 minutes, but it
    > wouldn't give any further benefit to an 8 hour process.

    Agree in principle - but in this case the quick optimisation was not big work at all. The main difference is in understanding what exactly the database is doing in order to execute a "not in" query; it's that knowledge that allows you to optimise code, rather than just slapping an index on in the hope it will improve performance - it will, but only up to a point.
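
    For what it's worth, the sort of rewrite I mean looks roughly like this - a sketch only, with made-up table and column names:

    -- "not in": simple, but the subquery may be fully materialised, and it
    -- silently matches nothing if Orders.clientid ever contains a NULL
    delete from Clients
    where clientid not in (select clientid from Orders);

    -- "not exists": usually planned as an anti-join, and NULL-safe
    delete from Clients
    where not exists (select 1 from Orders
                      where Orders.clientid = Clients.clientid);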

    Comment


      #12
      This should speed it up a bit.

      create table WordLookup (Keyword varchar(31), SKU int);
      /*
      ** Populate it with every word in Products and its SKU, using some code, depending on the DBMS.
      ** This can be done using triggers and stored procedures to keep it up-to-date.
      */


      /* if you need both SKU and Keywords */

      select t1.SKU, Keywords
      from WordLookup t1, WordLookup t2, Products
      where t1.SKU = t2.SKU
      and Products.SKU = t1.SKU
      and t1.Keyword = 'keyword1'
      and t2.Keyword = 'keyword2';


      /* if you just need SKU */

      select t1.SKU
      from WordLookup t1, WordLookup t2
      where t1.SKU = t2.SKU
      and t1.Keyword = 'keyword1'
      and t2.Keyword = 'keyword2';
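
      For this to actually beat a table scan, WordLookup would need indexing on the columns being matched - something along these lines (exact syntax depends on the DBMS):

      /*
      ** Assumed, not part of the post above: without these indexes the
      ** self-join on WordLookup would still end up scanning it.
      */
      create index ix_WordLookup on WordLookup (Keyword, SKU);
      create index ix_Products_SKU on Products (SKU);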



      Is there a better way, or do I win the £5 record token?

      Comment


        #13
        nice try

        have you read my text?

        > care to offer your solution to effective substring
        > searches?


        Substring means that when I search for "Look" I want to find not just "Look" but also "Lookup" and "WordLookup". Your code does not cater for that at all - you restrict the engine to exact words, which is easy to code but does not solve the problem.

        another try maybe?

        I do not delude myself about the level of my English, so if you misunderstood or still do not understand the task then please feel free to ask questions.

        You also seem to want as many WordLookup tables as you have keywords. For product searches it is not unusual to have 5-7-10 keywords - do you intend to have 10 tables with the same data in them?

        In my previous job, where I designed a search engine, we had 150k products with 15-20 keywords each. If we split them into individual keywords then we'd have a WordLookup table with 3 million rows. Now, if you use the "like '%keyword%'" approach then you force the database to table scan, which kills performance. This whole task is about eliminating the table scan. Before you rush to suggest "full text index", please note that some big databases would still not use it if you put a % sign before the keyword - using it after, i.e. keyword%, is easy since existing indexes can work.
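
        To illustrate the difference (just a sketch, assuming an index exists on Products.Keywords):

        /* leading % - the optimiser cannot seek into the index, so it
           typically scans the whole table */
        select SKU from Products where Keywords like '%keyword%';

        /* trailing % only - this can be answered with an index range seek */
        select SKU from Products where Keywords like 'keyword%';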

        My approach was not the most simplistic - it's not just one query, and it requires pre-processing of the data, which I did using Perl, but it gives really good results - after its implementation, database load dropped fourfold on our website, which was pretty well visited.

        > Is there a better way, or do I win the £5 record token?

        You will have to try harder - you did not even solve the problem. My way was pretty unique (I have not seen it anywhere else), and best of all it was much more scalable than anything else. I am putting an article together on how to do it; I think this time I might implement it using .NET + SQL.

        Comment


          #14
          Why would it need more than one lookup table? Have you ever heard of self-joins?

          I know you said substring searches, but in practice, you would probably be looking for whole words - you wouldn't search for 'omputer' for example.

          God knows what kind of solution you came up with, but I bet it was non-standard and only worked on your specific system. I remember one scheme you came up with where you encoded postcodes into binary to save 4 bytes or something.

          Anyway, keep up the good work - you'll soon have a job for life as no-one else will be able to understand any of your systems.

          Comment


            #15
            > Why would it need more than one lookup table? Have you
            > ever heard of self-joins.

            Oops, your aliases threw me off a bit - apologies for that. Now I see you used the same table - no questions about that.

            > I know you said substring searches, but in practice, you
            > would probably be looking for whole words - you wouldn't
            > search for 'omputer' for example.

            The task was very specifically defined as providing substring search. You can't do that effectively, so you try to modify the original task to get away with a simple query that any student can come up with.

            In practical terms, substring searches are necessary to improve the search experience on consumer-facing websites - I know, I analysed data for what used to be a top-10 UK ecommerce website. People do input crap data, and if they fail to find stuff quickly they bog off elsewhere. People are so used to Google these days that they have very high expectations of on-site search engines, and trust me - I looked at customer conversions at a very low level - improvement in search does increase sales considerably. Hence the requirement for substring searches. Add to this that data quality for 100k's of products is often low - even from expensive content providers - and that necessitates substring searching even more.

            > you wouldn't search for 'omputer' for example.

            I would search for CD-RW and expect it to find CDRW, CD RW and CD-RW. I would search for "computer" and expect it to find "computer" as well as "computers" etc.
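
            For the CD-RW/CDRW/CD RW case, one straightforward option (a sketch, not necessarily what we actually did) is to normalise separators out of both the stored keywords and the search term before matching:

            /* sketch only: apply the same normalisation to stored keywords
               and to the user's input, so 'CD-RW', 'CD RW' and 'CDRW' all
               compare as 'CDRW' */
            select replace(replace('CD-RW', '-', ''), ' ', '') as normalised;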

            > God knows what kind of solution you came up with, but I
            > bet it was non-standard and only worked on your specific
            > system.

            It worked on Sybase, SQL Server (MSFT) and DB2 (ported within days). Specifically, our overall database load dropped fourfold - this put us in a position to scale down on hardware resources and cut costs from £500k pa to £50k - a big gain in my book.

            > I remember one scheme you came up with where
            > you encoded postcodes into binary to save 4 bytes or
            > something.

            I have got very good experience with optimisation - I used to program in assembly and I know the value of each byte. My suggestion would have worked perfectly for huge datasets with 100s of millions of rows - these are enterprise-level tasks, not joke Access-like "databases". I had to deal with 1GB of data collected daily, and I did analyses over months of data using very non-trivial queries - a 4-byte saving like the one above cut query times down by many hours.

            > Anyway, keep up the good work - you'll soon have a job for
            > life as no-one else will be able to understand any of your
            > systems.

            Thanks, even though I sense sarcasm there. I can assure you that my work was understood by some bright individuals.


            P.S. Now it would only be fair to offer a solution to the problem; feel free to show me sources where they did it the same way:

            The key idea is to avoid queries with a % sign before the keyword - this forces the db to ignore the index. So, we split all words in the following manner:

            computer
            omputer
            mputer
            puter
            uter
            ter

            These partial words point to the actual SKU you are looking for, so the query is to identify all matching split words and retrieve the unique SKUs - easier said than done, but the key principle that lets you avoid table scans yet still search for substrings is as above. Add to this that the approach enables easy automatic spell-checking a la Google without getting humans involved.
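
            In its most stripped-down form the idea looks something like this (a sketch only, not the production code):

            /* one row per word suffix, pointing at the SKU it came from */
            create table SuffixLookup (Suffix varchar(31), SKU int);
            create index ix_SuffixLookup on SuffixLookup (Suffix, SKU);

            /* for SKU 12345 with keyword 'computer' you insert 'computer',
               'omputer', 'mputer', 'puter', ... all against 12345 */

            /* the substring search then becomes an index-friendly prefix match */
            select distinct SKU
            from SuffixLookup
            where Suffix like 'puter%';  -- finds SKUs whose keywords contain 'puter'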

            Comment


              #16
              I must admit I wouldn't have thought of that scheme.

              Did you

              a) split each value in the Keywords field into (up to) 254 substrings and put in a lookup table,
              or
              b) split the keyword1 and keyword2 into a number of substrings and run several queries?


              On second thoughts, b) wouldn't work as it could only find the first word in the Keywords field without the leading %.

              Comment


                #17
                Yeah, it's pretty unique - I have not seen it elsewhere, and after I looked at the techniques people use to build "full text indexes" I found that they still don't allow true substring search, i.e. they let you append % at the end of a word but not put it in front.

                > a) split each value in the Keywords field into (up to) 254
                > substrings and put in a lookup table,

                Statistics is a powerful thing - have you considered that the total size of a good English dictionary is less than 60k words? Then think of average word length and realise that it is reasonable to expect people not to search for individual letters, or even 2-3 char words unless they are FULL words, so you only split anything bigger than 3 chars. Create a table with the unique split parts and their frequencies - this would be very small; we had less than 250k unique word parts from a big data set. More to the point, words in the English language are pretty short - 4-5 chars on average - so you will only have 3-4-7 parts for each word, and that's not a lot.
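
                The splitting itself is trivial - a sketch in T-SQL (I actually did the pre-processing in Perl), keeping only parts of 4+ characters:

                declare @word varchar(31), @i int
                declare @parts table (Part varchar(31))
                set @word = 'computer'
                set @i = 1

                while @i <= len(@word) - 3      -- keep parts of 4+ chars
                begin
                    insert into @parts values (substring(@word, @i, len(@word)))
                    set @i = @i + 1
                end

                select Part from @parts   -- computer, omputer, mputer, puter, uter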

                > b) split the keyword1 and keyword2 into a number of
                > substrings and run several queries?

                Once you split the whole lot you will have a keywords table that you can query for keys using normal notation, like 'keyword%' - note that you only split words from the FRONT; there is no need to do it from the back, because using % at the end of a word is fine for indexes - it's still fast. So, you query the one keywords table to identify the unique number assigned to each part using normal query methods - very fast - then you just query the big Stock Codes table using the identified numbers as an exact match; use a clustered index and it's a breeze. Remember my 4-byte saving? Well, it helps a lot in this sort of situation, especially when you can fit things into tight 64-bit data types when you run on 64-bit boxes.
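
                Roughly, the two-step shape of it (a sketch with made-up names, not the real schema):

                /* small table of unique split parts, big table of part -> product */
                create table WordParts (PartID int, Part varchar(31));
                create table ProductParts (PartID int, SKU int);
                create index ix_WordParts on WordParts (Part, PartID);
                create clustered index ix_ProductParts on ProductParts (PartID, SKU);

                /* step 1: prefix match on the small parts table (no leading %);
                   step 2: exact integer join into the big table on the clustered index */
                select distinct pp.SKU
                from WordParts wp, ProductParts pp
                where wp.Part like 'puter%'
                  and pp.PartID = wp.PartID;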

                There is sure a lot more to it to make it really good - too long to cover here - but you get the idea. This algorithm, albeit not well known, generates really good savings - as I said, our overall database load dropped from 60% to 15%, and only the SEARCH engine of the website had been optimised.

                Comment


                  #18
                  My apologies for not putting it in plain comprehensible English - I've been working far too much on a big project at work recently.

                  Comment


                    #19
                    surely....

                    delete tblThis
                    from tblThis left outer join tblThat
                    on tblThis.clientid = tblThat.clientid
                    where tblThat.clientid is null

                    or something similar?

                    Comment


                      #20
                      Re: surely....

                      Well, that should delete the rows where tblThat.clientid is null. Too tired to try it out at home, gotta try it at work - not particularly into outer joins, perhaps I am missing something???

                      Comment
