• Visitors can check out the Forum FAQ by clicking this link. You have to register before you can post: click the REGISTER link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. View our Forum Privacy Policy.
  • Want to receive the latest contracting news and advice straight to your inbox? Sign up to the ContractorUK newsletter here. Every sign up will also be entered into a draw to WIN £100 Amazon vouchers!

Clustered vs Non clustered indexes

Collapse
X
  •  
  • Filter
  • Time
  • Show
Clear All
new posts

    Clustered vs Non clustered indexes

    I don't do a huge amount of database work but I'd like to get to grips with all this clustered vs non clustered index business & when I should use one over the other. There seems to be a lot of conflicting information out there; e.g. in here they say "Non-clustered indexes are best for queries that use JOINs although clustered indexes are better if they can be used" WTF??

    Can anyone recommend any resources that explain indexes clearly particularly wrt when to use one over the other?

    #2
    Haven't read the whole thread, but asktom.oracle.com has some posts about them:

    http://asktom.oracle.com/pls/asktom/...D:586423377841
    Best Forum Advisor 2014
    Work in the public sector? You can read my FAQ here
    Click here to get 15% off your first year's IPSE membership

    Comment


      #3
      clustered indexes work by arranging the data storage for the table in the order dictated by the index

      So if you had a customer table with a clustered index on the customerid then the data would be stored on the disk in the order of the customerid

      naturally it follows that you can only have 1 clustered index per table!

      however, you will have other columns that need indexing. The customer table might have a column (addressid) with a key to an addresses table

      when you join the customer table to the address table:
      select customerid, customername, addressline1.... addresspostcode
      from customer c
      inner join address a
      on c.addressid = a.addressid

      the query will work better if the addressid column is indexed in both the customer table and the address table. Since the customer table already has a clustered index on the customerid, the second index on the addressid will have to be a regular index
      Coffee's for closers

      Comment


        #4
        A clustered index reorders the data, and therefore you can only have one that is clustered.

        A non-clustered leaves the data in whatever, usually a kinda sequential, order it was created, and obviously you can have several indexes.

        So if you're doing stuff that mainly runs in a stepwise order along the one true index, or groups a range of it, or wants the same sort order as the index gives you, then clustered is obviously better. So month-end reports, long running stuff, and the like.

        But if you've got a table that requires lot of indexes, or the plans are looking to create indexes created on the fly, then the looseness of non-clustered will most likely be faster. So live users doing a multitude of different queries all expecting their 30s response times.

        But the real answer is I find, suck it and see. What makes it work lovely during the month kills performance with the month-end reports, then you get the month-end reports running sweet and it's like a sick dog for days after for normal users. There are so many wacky things going of in systems nowadays that it appears you can only rationalise a result after the fact...
        Insanity: repeating the same actions, but expecting different results.
        threadeds website, and here's my blog.

        Comment


          #5
          Applies to MS SQL Server but probably a good general set of rules

          http://msdn.microsoft.com/en-us/library/ms190639.aspx

          Clustered indexes sort and store the data rows in the table based on their key values. There can only be one clustered index per table, because the data rows themselves can only be sorted in one order. For more information about clustered index architecture, see Clustered Index Structures

          With few exceptions, every table should have a clustered index defined on the column, or columns, that offer the following:

          Can be used for frequently used queries.

          Provide a high degree of uniqueness.

          Note:

          When you create a PRIMARY KEY constraint, a unique index on the column, or columns, is automatically created. By default, this index is clustered; however, you can specify a nonclustered index when you create the constraint.

          Can be used in range queries.

          If the clustered index is not created with the UNIQUE property, the Database Engine automatically adds a 4-byte uniqueifier column to the table. When it is required, the Database Engine automatically adds a uniqueifier value to a row to make each key unique. This column and its values are used internally and cannot be seen or accessed by users.

          Query Considerations:

          Before you create clustered indexes, understand how your data will be accessed. Consider using a clustered index for queries that do the following:

          Return a range of values by using operators such as BETWEEN, >, >=, <, and <=.

          After the row with the first value is found by using the clustered index, rows with subsequent indexed values are guaranteed to be physically adjacent. For example, if a query retrieves records between a range of sales order numbers, a clustered index on the column SalesOrderNumber can quickly locate the row that contains the starting sales order number, and then retrieve all successive rows in the table until the last sales order number is reached.

          Return large result sets.

          Use JOIN clauses; typically these are foreign key columns.

          Use ORDER BY, or GROUP BY clauses.

          An index on the columns specified in the ORDER BY or GROUP BY clause may remove the need for the Database Engine to sort the data, because the rows are already sorted. This improves query performance.


          Column Considerations:

          Generally, you should define the clustered index key with as few columns as possible. Consider columns that have one or more of the following attributes:

          Are unique or contain many distinct values

          For example, an employee ID uniquely identifies employees. A clustered index or PRIMARY KEY constraint on the EmployeeID column would improve the performance of queries that search for employee information based on the employee ID number. Alternatively, a clustered index could be created on LastName, FirstName, MiddleName because employee records are frequently grouped and queried in this way, and the combination of these columns would still provide a high degree of difference.

          Are accessed sequentially

          For example, a product ID uniquely identifies products in the Production.Product table in the AdventureWorks database. Queries in which a sequential search is specified, such as WHERE ProductID BETWEEN 980 and 999, would benefit from a clustered index on ProductID. This is because the rows would be stored in sorted order on that key column.

          Defined as IDENTITY because the column is guaranteed to be unique within the table.

          Used frequently to sort the data retrieved from a table.

          It can be a good idea to cluster, that is physically sort, the table on that column to save the cost of a sort operation every time the column is queried.



          Clustered indexes are not a good choice for the following attributes:

          Columns that undergo frequent changes:

          This causes in the whole row to move, because the Database Engine must keep the data values of a row in physical order. This is an important consideration in high-volume transaction processing systems in which data is typically volatile.

          Wide keys:

          Wide keys are a composite of several columns or several large-size columns. The key values from the clustered index are used by all nonclustered indexes as lookup keys. Any nonclustered indexes defined on the same table will be significantly larger because the nonclustered index entries contain the clustering key and also the key columns defined for that nonclustered index.

          Comment


            #6
            Originally posted by MSDN

            Provide a high degree of selectivity.
            FTFMS

            Uniqueness is not a matter of degree. Values in a column are either unique, or they are not.

            Note the closest equivalent in Oracle is an index organized table. An Oracle cluster is something else, it stores corresponding rows from different tables in the same blocks, which is useful if you have tables that are frequently joined.
            While you're waiting, read the free novel we sent you. It's a Spanish story about a guy named 'Manual.'

            Comment


              #7
              Thanks. All very helpful.

              Just a couple of thoughts.

              1) Clustered indexes are not a good choice for Columns that undergo frequent changes:

              How does this apply to surrogate key columns in which the data changes frequently but the value is always added to the end?

              2) I have a fairly complex query that has an order by clause. When I look at the execution plan I see that this order by accounts for 42% of the query execution. The column specified in the order by doesn't have an index on it & the table has a clustered index on another completely unrelated column. Is this just bad design?

              Comment


                #8
                Originally posted by wurzel View Post
                Thanks. All very helpful.

                Just a couple of thoughts.

                1) Clustered indexes are not a good choice for Columns that undergo frequent changes:

                How does this apply to surrogate key columns in which the data changes frequently but the value is always added to the end?

                2) I have a fairly complex query that has an order by clause. When I look at the execution plan I see that this order by accounts for 42% of the query execution. The column specified in the order by doesn't have an index on it & the table has a clustered index on another completely unrelated column. Is this just bad design?

                1. If the change would result in the order of the data to change, then it is not a good choice, especially as you say it changes frequently.

                2. Always a good idea to tune the database with a representative workload (SQL Server Profiler) and see what can be done. One query might benefit from a change of index, but others might then perform worse, so it depends on real life workload if you gain overall.

                Comment


                  #9
                  Originally posted by wurzel View Post
                  Thanks. All very helpful.

                  Just a couple of thoughts.

                  1) Clustered indexes are not a good choice for Columns that undergo frequent changes:

                  How does this apply to surrogate key columns in which the data changes frequently but the value is always added to the end?
                  I presume this is a dimension table in a data warehouse... in which case the surrogate key column should be the clustered index

                  Originally posted by wurzel View Post
                  2) I have a fairly complex query that has an order by clause. When I look at the execution plan I see that this order by accounts for 42% of the query execution. The column specified in the order by doesn't have an index on it & the table has a clustered index on another completely unrelated column. Is this just bad design?
                  Stick an index on it and see what happens!
                  without seeing the schema, data and query it's difficult to give specific advice.
                  Coffee's for closers

                  Comment


                    #10
                    Just a thought, I'm using this query to fill a dataset so am wondering if it would be more efficient to do the sort in the dataset and not in the query.

                    Comment

                    Working...
                    X