Sunday, 23 February 2014

Index Fundamentals Part 3

Today we have a look at the attributes for clustered indexes.
What is a good choice for a clustered index?
The SQL Server decides nearly always the primary key for a clustered index why does the server do that?

For choosing the perfect column for a clustered index the attribute have to be

Unique - If a column gets duplicated entries the Server has to add a unique identifier to the 
                duplicated entries this unique identifier costs additional space in the 8 K pages and
                expands the whole Index
                For example (in the example we think there is no metaspace on each page for simple 
                calculation) 
                we got a table with 2048 entrys from datatype int (4 byte per row) they need 
                1 x 8 K page in the index (4*2048 = 8196 = 8 K).
                So our Table needs 1 page to store now we suggest the whole column is not unique there is
                in every row the number 5 in the int field. Now the Server has to add to every int a unique
                identifier (16 bytes) for storing so now each row needs 20 bytes (4 int + 16 unique 
                identifier) now we need 5 pages (20 * 2048 / 8196 = 4,99)
                So in this example the index growths by 5 because of duplicated entries.
Static -    The indexed column has to be static because the index is build up ascending or 
                descending and a update on the field destroys the perfect sorting.
                For example our table looks like:
                ID;Text
                1;Text1
                2;Text2
                3;Text3
                4;Text4
                5;Text5
                Now we execute "update table set ID = 888 where ID = 2"
                On this update the Server has to restore the old ID 2 row to a new page between 887 and 
                889 and there is a hole by ID = 2
                At this point we got a Fragmentation in the index and in worst case a page split on the 
                page with 887.
Narrow - The datatype for the key of a clustered index has to be smart because it must be stored as 
                 smart as possible. The Key is stored in every nonclustered index too as a pointer in the 
                 clustered key. So it is much better to have a int (4 bytes) column instead of a unique 
                 identifier (16 bytes).
Increase - If the column for the clustered index is increasing with every insert the server has to add 
                 new entries every time to the last page. This way there is no fragmentation in the index 
                 and it is as smart as possible.

If your column fits all this criteria it is the best choice for the clustered index. Unique Identifiers are not good for a clustered index because they need much storage, are not increasing and not narrow.
A int column that is increasing by a fix incremental is a better choice.

In my next post we have a look at nonclustered indexes.

Stay tuned and have a nice day.

No comments:

Post a Comment