Clustered vs Non clustered indexes - Contractor UK Bulletin Board

Visitors can check out the Forum FAQ by clicking this link. You have to register before you can post: click the REGISTER link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. View our Forum Privacy Policy.
Want to receive the latest contracting news and advice straight to your inbox? Sign up to the ContractorUK newsletter here. Every sign up will also be entered into a draw to WIN £100 Amazon vouchers!

You are not logged in or you do not have permission to access this page. This could be due to one of several reasons:

You are not logged in. If you are already registered, fill in the form below to log in, or follow the "Sign Up" link to register a new account.
You may not have sufficient privileges to access this page. Are you trying to edit someone else's post, access administrative features or some other privileged system?
If you are trying to post, the administrator may have disabled your account, or it may be awaiting activation.

The Wikir Man replied

18 March 2010, 11:26
Originally posted by DimPrawn View Post

I think what OH is saying, is some inexperience DBA types/developers think, "If an index on a column speeds things up, I'll stick an index on every column of every table and the DB will run like lightning".

WHS - many developers just ask them to be put on too many columns, and not enough DBAs (in my experience) question WHY they want them.
Leave a comment:
Spacecadet replied

18 March 2010, 11:24
Originally posted by OrangeHopper View Post

So the end result was that all insertions, and subsequent deletions, where running slower than necessary because of of the creation and deletion of indexes. Got rid of the indexes and things ran significantly faster.

don't forget the updations
Leave a comment:
DimPrawn replied

18 March 2010, 11:21
I think what OH is saying, is some inexperience DBA types/developers think, "If an index on a column speeds things up, I'll stick an index on every column of every table and the DB will run like lightning".

The database then has to maintain these indexes on every insert, update and delete.

The only true way to tune a database is against a representative set of captured SQL (a workload).
Leave a comment:
jmo21 replied

18 March 2010, 11:16
Originally posted by OrangeHopper View Post

So the end result was that all insertions, and subsequent deletions, where running slower than necessary because of of the creation and deletion of indexes. Got rid of the indexes and things ran significantly faster.

Inserts being a little slower obviously makes sense without an index as something extra has to happen.
Leave a comment:
threaded replied

18 March 2010, 10:29
Good point there OrangeHopper. It's an example of what I was saying: you really have to suck it and see, and rationalise after the fact. What you'd think would improve things often turns out to have quite the opposite effect.
Leave a comment:
OrangeHopper replied

17 March 2010, 20:08
Very slightly off topic but I am very sceptical about indexes. We had a database designed by a DBA with lots of indexes for this that and the other. This was an Oracle database and I read somewhere that only under certain conditions would the index be used. Did an explain plan on all the business related queries and none were utilising an index. So the end result was that all insertions, and subsequent deletions, where running slower than necessary because of of the creation and deletion of indexes. Got rid of the indexes and things ran significantly faster. Oh, and of course, the database storage requirements were also significantly reduced.
Leave a comment:
wurzel replied

17 March 2010, 10:22
Just a thought, I'm using this query to fill a dataset so am wondering if it would be more efficient to do the sort in the dataset and not in the query.
Leave a comment:
Spacecadet replied

17 March 2010, 10:07
Originally posted by wurzel View Post

Thanks. All very helpful.

Just a couple of thoughts.

1) Clustered indexes are not a good choice for Columns that undergo frequent changes:

How does this apply to surrogate key columns in which the data changes frequently but the value is always added to the end?

I presume this is a dimension table in a data warehouse... in which case the surrogate key column should be the clustered index

Originally posted by wurzel View Post

2) I have a fairly complex query that has an order by clause. When I look at the execution plan I see that this order by accounts for 42% of the query execution. The column specified in the order by doesn't have an index on it & the table has a clustered index on another completely unrelated column. Is this just bad design?

Stick an index on it and see what happens!
without seeing the schema, data and query it's difficult to give specific advice.
Leave a comment:
DimPrawn replied

17 March 2010, 10:07
Originally posted by wurzel View Post

Thanks. All very helpful.

Just a couple of thoughts.

1) Clustered indexes are not a good choice for Columns that undergo frequent changes:

How does this apply to surrogate key columns in which the data changes frequently but the value is always added to the end?

2) I have a fairly complex query that has an order by clause. When I look at the execution plan I see that this order by accounts for 42% of the query execution. The column specified in the order by doesn't have an index on it & the table has a clustered index on another completely unrelated column. Is this just bad design?

1. If the change would result in the order of the data to change, then it is not a good choice, especially as you say it changes frequently.

2. Always a good idea to tune the database with a representative workload (SQL Server Profiler) and see what can be done. One query might benefit from a change of index, but others might then perform worse, so it depends on real life workload if you gain overall.
Leave a comment:
wurzel replied

17 March 2010, 09:44
Thanks. All very helpful.

Just a couple of thoughts.

1) Clustered indexes are not a good choice for Columns that undergo frequent changes:

How does this apply to surrogate key columns in which the data changes frequently but the value is always added to the end?

2) I have a fairly complex query that has an order by clause. When I look at the execution plan I see that this order by accounts for 42% of the query execution. The column specified in the order by doesn't have an index on it & the table has a clustered index on another completely unrelated column. Is this just bad design?
Leave a comment:
doodab replied

17 March 2010, 09:32
Originally posted by MSDN

Provide a high degree of selectivity.

FTFMS

Uniqueness is not a matter of degree. Values in a column are either unique, or they are not.

Note the closest equivalent in Oracle is an index organized table. An Oracle cluster is something else, it stores corresponding rows from different tables in the same blocks, which is useful if you have tables that are frequently joined.
Leave a comment:
DimPrawn replied

17 March 2010, 09:15
Applies to MS SQL Server but probably a good general set of rules

http://msdn.microsoft.com/en-us/library/ms190639.aspx

Clustered indexes sort and store the data rows in the table based on their key values. There can only be one clustered index per table, because the data rows themselves can only be sorted in one order. For more information about clustered index architecture, see Clustered Index Structures

With few exceptions, every table should have a clustered index defined on the column, or columns, that offer the following:

Can be used for frequently used queries.

Provide a high degree of uniqueness.

Note:

When you create a PRIMARY KEY constraint, a unique index on the column, or columns, is automatically created. By default, this index is clustered; however, you can specify a nonclustered index when you create the constraint.

Can be used in range queries.

If the clustered index is not created with the UNIQUE property, the Database Engine automatically adds a 4-byte uniqueifier column to the table. When it is required, the Database Engine automatically adds a uniqueifier value to a row to make each key unique. This column and its values are used internally and cannot be seen or accessed by users.

Query Considerations:

Before you create clustered indexes, understand how your data will be accessed. Consider using a clustered index for queries that do the following:

Return a range of values by using operators such as BETWEEN, >, >=, <, and <=.

After the row with the first value is found by using the clustered index, rows with subsequent indexed values are guaranteed to be physically adjacent. For example, if a query retrieves records between a range of sales order numbers, a clustered index on the column SalesOrderNumber can quickly locate the row that contains the starting sales order number, and then retrieve all successive rows in the table until the last sales order number is reached.

Return large result sets.

Use JOIN clauses; typically these are foreign key columns.

Use ORDER BY, or GROUP BY clauses.

An index on the columns specified in the ORDER BY or GROUP BY clause may remove the need for the Database Engine to sort the data, because the rows are already sorted. This improves query performance.

Column Considerations:

Generally, you should define the clustered index key with as few columns as possible. Consider columns that have one or more of the following attributes:

Are unique or contain many distinct values

For example, an employee ID uniquely identifies employees. A clustered index or PRIMARY KEY constraint on the EmployeeID column would improve the performance of queries that search for employee information based on the employee ID number. Alternatively, a clustered index could be created on LastName, FirstName, MiddleName because employee records are frequently grouped and queried in this way, and the combination of these columns would still provide a high degree of difference.

Are accessed sequentially

For example, a product ID uniquely identifies products in the Production.Product table in the AdventureWorks database. Queries in which a sequential search is specified, such as WHERE ProductID BETWEEN 980 and 999, would benefit from a clustered index on ProductID. This is because the rows would be stored in sorted order on that key column.

Defined as IDENTITY because the column is guaranteed to be unique within the table.

Used frequently to sort the data retrieved from a table.

It can be a good idea to cluster, that is physically sort, the table on that column to save the cost of a sort operation every time the column is queried.

Clustered indexes are not a good choice for the following attributes:

Columns that undergo frequent changes:

This causes in the whole row to move, because the Database Engine must keep the data values of a row in physical order. This is an important consideration in high-volume transaction processing systems in which data is typically volatile.

Wide keys:

Wide keys are a composite of several columns or several large-size columns. The key values from the clustered index are used by all nonclustered indexes as lookup keys. Any nonclustered indexes defined on the same table will be significantly larger because the nonclustered index entries contain the clustering key and also the key columns defined for that nonclustered index.
Leave a comment:
threaded replied

16 March 2010, 17:48
A clustered index reorders the data, and therefore you can only have one that is clustered.

A non-clustered leaves the data in whatever, usually a kinda sequential, order it was created, and obviously you can have several indexes.

So if you're doing stuff that mainly runs in a stepwise order along the one true index, or groups a range of it, or wants the same sort order as the index gives you, then clustered is obviously better. So month-end reports, long running stuff, and the like.

But if you've got a table that requires lot of indexes, or the plans are looking to create indexes created on the fly, then the looseness of non-clustered will most likely be faster. So live users doing a multitude of different queries all expecting their 30s response times.

But the real answer is I find, suck it and see. What makes it work lovely during the month kills performance with the month-end reports, then you get the month-end reports running sweet and it's like a sick dog for days after for normal users. There are so many wacky things going of in systems nowadays that it appears you can only rationalise a result after the fact...
Leave a comment:
Spacecadet replied

16 March 2010, 17:45
clustered indexes work by arranging the data storage for the table in the order dictated by the index

So if you had a customer table with a clustered index on the customerid then the data would be stored on the disk in the order of the customerid

naturally it follows that you can only have 1 clustered index per table!

however, you will have other columns that need indexing. The customer table might have a column (addressid) with a key to an addresses table

when you join the customer table to the address table:
select customerid, customername, addressline1.... addresspostcode
from customer c
inner join address a
on c.addressid = a.addressid

the query will work better if the addressid column is indexed in both the customer table and the address table. Since the customer table already has a clustered index on the customerid, the second index on the addressid will have to be a regular index
Leave a comment:
TheFaQQer replied

16 March 2010, 17:25
Haven't read the whole thread, but asktom.oracle.com has some posts about them:

http://asktom.oracle.com/pls/asktom/...D:586423377841
Leave a comment: