Into the Microsoft SQL Server: Index Fundamentals Part 3

Today we have a look at the attributes for clustered indexes.

What is a good choice for a clustered index?

The SQL Server decides nearly always the primary key for a clustered index why does the server do that?

For choosing the perfect column for a clustered index the attribute have to be

Unique - If a column gets duplicated entries the Server has to add a unique identifier to the

duplicated entries this unique identifier costs additional space in the 8 K pages and

expands the whole Index

For example (in the example we think there is no metaspace on each page for simple

calculation)

we got a table with 2048 entrys from datatype int (4 byte per row) they need

1 x 8 K page in the index (4*2048 = 8196 = 8 K).

So our Table needs 1 page to store now we suggest the whole column is not unique there is

in every row the number 5 in the int field. Now the Server has to add to every int a unique

identifier (16 bytes) for storing so now each row needs 20 bytes (4 int + 16 unique

identifier) now we need 5 pages (20 * 2048 / 8196 = 4,99)

So in this example the index growths by 5 because of duplicated entries.

Static - The indexed column has to be static because the index is build up ascending or

descending and a update on the field destroys the perfect sorting.

For example our table looks like:

ID;Text

1;Text1

2;Text2

3;Text3

4;Text4

5;Text5

Now we execute "update table set ID = 888 where ID = 2"

On this update the Server has to restore the old ID 2 row to a new page between 887 and

889 and there is a hole by ID = 2

At this point we got a Fragmentation in the index and in worst case a page split on the

page with 887.

Narrow - The datatype for the key of a clustered index has to be smart because it must be stored as

smart as possible. The Key is stored in every nonclustered index too as a pointer in the

clustered key. So it is much better to have a int (4 bytes) column instead of a unique

identifier (16 bytes).

Increase - If the column for the clustered index is increasing with every insert the server has to add

new entries every time to the last page. This way there is no fragmentation in the index

and it is as smart as possible.

If your column fits all this criteria it is the best choice for the clustered index. Unique Identifiers are not good for a clustered index because they need much storage, are not increasing and not narrow.

A int column that is increasing by a fix incremental is a better choice.

In my next post we have a look at nonclustered indexes.

Stay tuned and have a nice day.

Into the Microsoft SQL Server

Pages

Sunday, 23 February 2014

Index Fundamentals Part 3

No comments:

Post a Comment