Thursday, 27 February 2014

Index Fundamentals Part 5

Today we have a look at index maintenance. Indexes get fragmented over the time by inserting, deleting and updating indexed rows.
To handle this fragmentation indexes have to be reorganized/rebuilded regularly.
What is the difference between rebuild and reorganize?

Rebuild:
- Rebuild creates the index completly new (single threaded in Standard Edition multithreaded in Enterpriese Edition)
- Rebuild can be done online in Enterprise Edition (not on tables with large object types) but needs a Schema lock at the start that can be a problem on high frequented tables
- In Rebuild you can set a new fillfactor
- Rebuild is a transaction if you cancel it, there is a rollback and it's done offline

Reorganize:
- Reorganize is allways online
- Reorganize can be canceled without a rollback
- Reorganize doesn't need a schema lock there is no problem on high frequented tables

How can I check the fragmentation of an Index?
For checking the percentage of fragmentation use the following script from microsoft:
Microsoft Index Checkscript

Rebuild or Reorganize?
Microsoft Best practice says reorganize for fragmentation <= 30% and rebuild for fragmentation > 30% but in 24/7 environment there is the problem that a rebuild on the Standard Edition is offline and the index is not available. There can be great performance impacts for that issue. So if you have a environment with a maintenance window you can surely rebuild your indexes. For big tables and often used indexes you should use reoarganize.

In the next post I will show you a maintenance script for indexes.

Tuesday, 25 February 2014

Adding leading zeros to int column

Hi there,

some of my imports needs by design for the importcounter not integers like 1, 2, 3, 4, 5,...
they expect varchar with leading zeros. Today I will show you how to add leading zeros
to your integers by a simple combination of cast() and right().





















































Let me explain the function:
RIGHT('0000000000000000000000'+cast([Itemno] as varchar (255)), (select cast(LEN(max(itemno)) as int) from [article])) as ImportID

First of all
'0000000000000000000000'+cast([Itemno] as varchar (255)
this adds leading zeros before the int field which we cast to varchar(255)
after that we use the right function to take the first X chars from the right side of string.
RIGHT('0000000000000000000000'+cast([Itemno] as varchar (255)), X)
but my imports need as input the a variable lenght addicted by the lenght of maximum int value,
so I replaced the X with select cast(LEN(max(itemno)) as int) from [article] this query returns for the demodata 5 because the max number used is 50000 and len of 50000 = 5.
This way i got my importers working.

Copyable Version:
-- Table with int values
SELECT    [Itemno] ImportID,
        [Itemname],
        [Price]
  FROM [article]

-- Table with leading zeros as varchar
SELECT    RIGHT('0000000000000000000000'+cast([Itemno] as varchar (255)),
        (select cast(LEN(max(itemno)) as int) from [article])) as ImportID,
        [Itemname],
        [Price]
  FROM [article]

Index Fundamentals Part 4

Today we have the focus on nonclustered indexes.
Nonclustered indexes doesn't modify the logical order of tabledata only clustered index does this.
They are working with pointers to the data in a clustered index or the table (heap).

For example we got a table with:
ID;Text;Timestamp
1;Test;01-01-2000 8:50:10:001
2;Test;02-01-2000 8:50:10:001
3;Test;03-01-2000 8:50:10:001
4;Test;04-01-2000 8:50:10:001
5;Test;05-01-2000 8:50:10:001
6;Test;06-01-2000 8:50:10:001
7;Test;07-01-2000 8:50:10:001
8;Test;08-01-2000 8:50:10:001
9;Test;09-01-2000 8:50:10:001
10;Test;10-01-2000 8:50:10:001
11;Test;11-01-2000 8:50:10:001
12;Test;12-01-2000 8:50:10:001
13;Test;13-01-2000 8:50:10:001
14;Test;14-01-2000 8:50:10:001
15;Test;15-01-2000 8:50:10:001
16;Test;16-01-2000 8:50:10:001
17;Test;17-01-2000 8:50:10:001
18;Test;18-01-2000 8:50:10:001
19;Test;19-01-2000 8:50:10:001
20;Test;20-01-2000 8:50:10:001

No we create a nonclustered index on Timestamp.
The pages look like
ROOT PAGE
Timestamp;Pointer to Page
01-01-2000 8:50:10:001;1
06-01-2000 8:50:10:001;2
11-01-2000 8:50:10:001;3
16-01-2000 8:50:10:001;4

PAGE 1
Timestamp;Pointer
01-01-2000 8:50:10:001;1
02-01-2000 8:50:10:001;2
03-01-2000 8:50:10:001;3
04-01-2000 8:50:10:001;4
05-01-2000 8:50:10:001;5

PAGE 2
Timestamp;Pointer
06-01-2000 8:50:10:001;6
07-01-2000 8:50:10:001;7
08-01-2000 8:50:10:001;8
09-01-2000 8:50:10:001;9
10-01-2000 8:50:10:001;10

PAGE 3
Timestamp;Pointer
11-01-2000 8:50:10:001;11
12-01-2000 8:50:10:001;12
13-01-2000 8:50:10:001;13
14-01-2000 8:50:10:001;14
15-01-2000 8:50:10:001;15

PAGE 4
Timestamp;Pointer
16-01-2000 8:50:10:001;16
17-01-2000 8:50:10:001;17
18-01-2000 8:50:10:001;18
19-01-2000 8:50:10:001;19
20-01-2000 8:50:10:001;20

After that we execute
select Text, Timestamp from table where Timestamp = '17-01-2000 8:50:10:001'
SQL Server uses the clustered index, rootpage points to Intermediate Page 4 and Intermediate Page 4 points to Pointer 16 in clustered index or heap and the result retrieves.

As you can see in my example the ID for the clustered index is int. Let us imagine it was a unique identifier
ID 1 int and a unique identifier ID 1C3C7DEA-D440-45C4-923B-6FF988957A41 you can see the difference the storage needed for a unique identifier is much more as for int.
So if you have a unique identifier in clustered index as index column all your nonclustered indexes on the same table use it as pointer and needs lot more space in storage.
To keep your indexes smart and fast you have to use a smart column for clustered index column.

In the next part we have a look at index maintenance.

Monday, 24 February 2014

Script to check free Space in Files

Hi there,

I want to share a smart little script for checking free space in the files of a database.
See the script below:










Copyable Version:
SELECT
    a.FILEID,
    CONVERT(decimal(12,2),ROUND(a.size/128.000,2)) as [Filesize_MB] ,
    CONVERT(decimal(12,2),ROUND(fileproperty(a.name,'SpaceUsed')/128.000,2)) as [Used_Space_MB],
    CONVERT(decimal(12,2),ROUND((a.size-fileproperty(a.name,'SpaceUsed'))/128.000,2)) as [Free_Space_MB],
    a.name as [Database_Name],
    a.FILENAME as [File_Name]

FROM
    dbo.sysfiles a

Check Time till the End of Restore

Hi there,

today i want to share a script with you. You know the problem there is a big database restore and you want to know the estatimated finishtime.
With this script you can see the estatimated finishtime:

Copyable Version:
SELECT    r.session_id,
        r.command,
        CONVERT(NUMERIC(6,2),r.percent_complete) AS [Percent_from_Restore_Complete],
        CONVERT(VARCHAR(20),DATEADD(ms,r.estimated_completion_time,GetDate()),20) AS [Estatimated_Completion_Time],
        CONVERT(NUMERIC(10,2),r.total_elapsed_time/1000.0/60.0) AS [Elapsed_Min],
        CONVERT(NUMERIC(10,2),r.estimated_completion_time/1000.0/60.0) AS [Estatimated_Min],
        CONVERT(NUMERIC(10,2),r.estimated_completion_time/1000.0/60.0/60.0) AS [Estatimated_Hours],
        CONVERT(VARCHAR(1000),
        (    SELECT SUBSTRING(text,r.statement_start_offset/2, CASE WHEN r.statement_end_offset = -1
            THEN 1000
            ELSE (r.statement_end_offset-r.statement_start_offset)/2 END)
FROM sys.dm_exec_sql_text(sql_handle)))
FROM sys.dm_exec_requests r WHERE command IN ('RESTORE DATABASE','BACKUP DATABASE')

Sunday, 23 February 2014

Index Fundamentals Part 3

Today we have a look at the attributes for clustered indexes.
What is a good choice for a clustered index?
The SQL Server decides nearly always the primary key for a clustered index why does the server do that?

For choosing the perfect column for a clustered index the attribute have to be

Unique - If a column gets duplicated entries the Server has to add a unique identifier to the 
                duplicated entries this unique identifier costs additional space in the 8 K pages and
                expands the whole Index
                For example (in the example we think there is no metaspace on each page for simple 
                calculation) 
                we got a table with 2048 entrys from datatype int (4 byte per row) they need 
                1 x 8 K page in the index (4*2048 = 8196 = 8 K).
                So our Table needs 1 page to store now we suggest the whole column is not unique there is
                in every row the number 5 in the int field. Now the Server has to add to every int a unique
                identifier (16 bytes) for storing so now each row needs 20 bytes (4 int + 16 unique 
                identifier) now we need 5 pages (20 * 2048 / 8196 = 4,99)
                So in this example the index growths by 5 because of duplicated entries.
Static -    The indexed column has to be static because the index is build up ascending or 
                descending and a update on the field destroys the perfect sorting.
                For example our table looks like:
                ID;Text
                1;Text1
                2;Text2
                3;Text3
                4;Text4
                5;Text5
                Now we execute "update table set ID = 888 where ID = 2"
                On this update the Server has to restore the old ID 2 row to a new page between 887 and 
                889 and there is a hole by ID = 2
                At this point we got a Fragmentation in the index and in worst case a page split on the 
                page with 887.
Narrow - The datatype for the key of a clustered index has to be smart because it must be stored as 
                 smart as possible. The Key is stored in every nonclustered index too as a pointer in the 
                 clustered key. So it is much better to have a int (4 bytes) column instead of a unique 
                 identifier (16 bytes).
Increase - If the column for the clustered index is increasing with every insert the server has to add 
                 new entries every time to the last page. This way there is no fragmentation in the index 
                 and it is as smart as possible.

If your column fits all this criteria it is the best choice for the clustered index. Unique Identifiers are not good for a clustered index because they need much storage, are not increasing and not narrow.
A int column that is increasing by a fix incremental is a better choice.

In my next post we have a look at nonclustered indexes.

Stay tuned and have a nice day.

Saturday, 22 February 2014

Index Fundamentals Part 2

Today I want to show you the difference between a clustered and a nonclustered index.
There can only be one clustered index on a table but 999 nonclustered indexes. 
A clustered index reorganizes the whole table according to the index. See in Example:

Table before clustered Index:
ID;Country
1;Brasil
2;Germany
3;America

Now we create a clustered index on Country (asc) the new logical table order is:
ID;Country
3;America
1;Brasil
2;Germany

Lets see the same Index on a nonclustered index:
Table before nonclustered Index:
ID;Country
1;Brasil
2;Germany
3;America

Table after nonclustered Index:
ID;Country
1;Brasil
2;Germany
3;America

As you can see there is no difference and this is the point between nonclustered and clustered indexes, 
the clustered index reorganizes the table in the filesystem. A nonclustered Index doesn’t do that.

In the next Topic we have a look at what attributes are the best for my index.