26
Mar

Super Fast Loading of Data into SQL Server tables

Whenever you INSERT data into SQL tables, as each row is added, it also updates the index data on a row by row basis.  So a table with 15 indexes will actually update 16 rows (1 data and 15 index rows) for each insert.

Where you are inserting thousands of rows, this can make for MASSIVE log files and LONG processing times.

So, to make insertion super fast, always a good idea to DISABLE non-clustered indexes (those indexes which are not setting the primary keys on the table) as this will stop the log file being built with the index update notes, and reduces disk IO.  Once the insert is complete, just re-enable them and SQL will rebuild the indexes in the background.  This is useful in a stored procedure which say runs every night, it makes loads much faster and with less processing on the server.

----Disable Index (example is an index called IX_StoreContact_ContactTypeID on a table called StoreContactin a database called Sales)
ALTER INDEX [IX_StoreContact_ContactTypeID] ON Sales.StoreContact DISABLE
GO
----Enable Index
ALTER INDEX [IX_StoreContact_ContactTypeID] ON Sales.StoreContact REBUILD
GO

For SQL 2008, it is also good to LOCK the table, as this will mean the log file does not need to be updated…

If using SQL Server 2008, there is another useful trick which bypasses the need to turn off the indexes.
Where an INSERT statement that takes its rows from a SELECT operation (from another table, database, etc) and inserts them into a heap is being used, we can tell SQL Server to remove all indexing and logging until AFTER the insert is complete.   We can do this via a WITH (TABLOCK) hint is used on the destination table
The following syntax should be used to achieve minimal logging into heaps.

INSERT INTO <DestinationTable> WITH (TABLOCK) (<Columns>)
SELECT <Columns> FROM <SomeStatement>

However, this will lock the table – nobody else will be able to read or write to the table whilst the insert is running.  But, for large data inserts, this processing is AT LEAST 70% faster than the same insert within the (TABLOCK) hint – as no index or transaction logging is performed (therefore a massive reduction in IO) until the insert is complete.
It saved my bacon today  :-)

1 Trackback or Pingback for this entry