Please stop asking me to shrink the data files

As a database administrator, I've heard the request 'We're out of space. Please shrink the data files' about a thousand times, from developer colleagues and suppliers alike. It sounds like the ultimate quick fix for storage issues: shrinking a database file releases its free space so that it becomes available to every database on the server. But however ideal it sounds, shrinking the data files tends to cause more problems down the line than it solves.

Myths about shrinking data files

  1. Shrinking data files helps solve performance issues
  2. Shrinking data files will mean shorter backup times
  3. Shrinking data files is a long-term solution
  4. If Microsoft didn’t want us to use it, why would they have included the feature?
  5. Shrinking and TRUNCATEONLY are the same thing


So, what actually happens when you shrink a data file?

When a shrink is executed, SQL Server takes pages from the end of the data file and moves them, one at a time, into the first available free slots earlier in the file, until all of the empty space sits at the end of the file where it can be released. Because pages end up out of logical order, this causes index fragmentation.
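For context, a shrink is typically issued with DBCC SHRINKFILE and a target size. A minimal sketch, assuming a hypothetical database SalesDB whose data file is named SalesDB_Data:

USE SalesDB;
GO
-- Shrink the data file towards a 1024 MB target; SQL Server moves pages
-- from the end of the file into free slots nearer the front to get there
DBCC SHRINKFILE (N'SalesDB_Data', 1024);
GO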

To illustrate, let's take an example. The diagram below represents a data file split into slots; some slots hold pages and some are still free. For clarity, the pages are labelled with capital letters and free slots are shown as empty brackets.

Data file (before shrink):

[A][B][ ][C][D][E][F][G][ ][ ]

After the shrink, the data file appears as follows:

Data file (after shrink):

[A][B][G][C][D][E][F]

Page G has moved from the end of the file into the free slot in the third position, and the now-empty tail of the file has been released. Because the pages are out of logical order, this movement causes heavy index fragmentation and, with it, a significant performance overhead.

How to check for index fragmentation

To check the average fragmentation (as a percentage), query the sys.dm_db_index_physical_stats dynamic management function, which returns fragmentation information for the tables in a database.

The following query returns the average fragmentation percentage for each index on each table in the current database:

SELECT DB_NAME() AS DatabaseName,
       OBJECT_NAME(a.object_id) AS TableName,
       a.index_id,
       a.name AS IndexName,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
INNER JOIN sys.indexes AS a
    ON ips.object_id = a.object_id
    AND ips.index_id = a.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;

(sys.dm_db_index_physical_stats Query, 2011)
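Once you know the fragmentation levels, the commonly cited guidance is to reorganize an index when fragmentation is roughly between 5% and 30%, and to rebuild it above 30%. A hedged sketch, assuming a hypothetical index IX_Orders_Date on a dbo.Orders table:

-- Fragmentation roughly between 5% and 30%: reorganize (lightweight, always online)
ALTER INDEX IX_Orders_Date ON dbo.Orders REORGANIZE;

-- Fragmentation above roughly 30%: rebuild the index from scratch
ALTER INDEX IX_Orders_Date ON dbo.Orders REBUILD;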


Debunking myths

As mentioned above, a common myth is that shrinking data files helps performance. The opposite is true: as explained above, it creates a large performance overhead. The shrink operation itself also consumes a lot of I/O, CPU and buffer pool resources. Shrinking the log file is cheaper in itself, but it should be avoided as well: when the log inevitably grows back, the new space has to be zero-initialized, which is slow.

Brent Ozar, a seasoned SQL Server expert who holds the Microsoft Certified Master of SQL Server certification, has tested the shrink functionality to demonstrate that it really does hinder performance. In his article he documents every step he took so that the test can be validated. (Ozar, 2017) His findings make it clear that shrinking the data files triggers high fragmentation, which he then attempted to fix by rebuilding the indexes. Instead of fixing the problem, rebuilding the indexes forced the data files to grow again, leaving him no better off than when he started.
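The exact methodology is in his article, but the shape of the experiment can be sketched roughly like this (the database and table names here are illustrative, not his):

USE TestDB;
GO
DBCC SHRINKDATABASE (TestDB);             -- the shrink: pages move, indexes fragment
GO
-- measure fragmentation here with the sys.dm_db_index_physical_stats query above
ALTER INDEX ALL ON dbo.BigTable REBUILD;  -- rebuilding needs free space...
GO
-- ...so the data file simply grows back; compare file sizes before and after
SELECT name, size * 8 / 1024 AS SizeMB
FROM sys.database_files;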

Paul Randal, CEO of SQLskills, was the owner of the shrink code while at Microsoft (he makes it clear that he didn't write the code, he merely owned it). He explains in detail why shrinking the data files is not the solution, neither short term nor long term, and that the only reason the functionality has not been removed from the product is backward compatibility. (Randal, Why you should not shrink your data files, 2009)

There is also a common misconception that shrinking the data files will reduce backup times. Since shrinking does nothing but move pages around within the data file, the backup, which only reads allocated pages, takes just as long. If the goal is simply to release unused space at the end of the file, the TRUNCATEONLY option described below is the better idea. (Randal, Conference Questions Pot-Pourri #10: Shrinking the database before taking a backup, 2008)


What can you do instead?

Solution 1:

After writing a whole post against shrinking the data files, I can now say that it is okay to shrink them, but only with the truncate-only option. TRUNCATEONLY truncates the data file (the empty slots at the end of the file are released back to the operating system) without moving any pages into other slots, thus avoiding index fragmentation.

USE <<database_name>>;
GO
-- TRUNCATEONLY releases the free space at the end of the file without
-- moving any pages; a target size is ignored when TRUNCATEONLY is specified
DBCC SHRINKFILE (N'<<database_filename>>', TRUNCATEONLY);
GO


Using the example above, a shrink with TRUNCATEONLY simply cuts off the free slots at the tail of the data file; the pages themselves stay exactly where they are:

Data file (after TRUNCATEONLY):

[A][B][ ][C][D][E][F][G]

There are some rare scenarios in which shrinking the data files might seem like a good idea. One of them is when one of the drives on the server is full, large amounts of data have been purged from one of the databases, AND the database will never grow back to its full size again. This is a rare case, and when in doubt, avoid shrinking at all costs.

Even in this case, however, Paul Randal has an alternative to shrinking: when large amounts of data have been purged and the database will never grow back to that size, he suggests creating a new filegroup, moving the tables and indexes to the new filegroup, and then dropping the old one. This removes the need to shrink the data files and with it all of the disadvantages described above. (Randal, Why you should not shrink your data files, 2009)
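A minimal sketch of that approach, assuming a hypothetical SalesDB with a dbo.Orders table whose clustered index CIX_Orders is keyed on OrderID (all file names and paths are illustrative):

-- 1. Create a new filegroup with its own data file
ALTER DATABASE SalesDB ADD FILEGROUP FG_New;
ALTER DATABASE SalesDB ADD FILE
    (NAME = N'SalesDB_New', FILENAME = N'D:\Data\SalesDB_New.ndf', SIZE = 1024MB)
    TO FILEGROUP FG_New;
GO
-- 2. Move each table by recreating its clustered index on the new filegroup
CREATE CLUSTERED INDEX CIX_Orders ON dbo.Orders (OrderID)
    WITH (DROP_EXISTING = ON) ON FG_New;
GO
-- 3. Once everything has been moved, drop the old file and filegroup, e.g.:
-- ALTER DATABASE SalesDB REMOVE FILE SalesDB_Old;
-- ALTER DATABASE SalesDB REMOVE FILEGROUP FG_Old;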


Solution 2:

Backing up the transaction log regularly lets SQL Server reuse the space inside the log file, which keeps the log from growing. This does not free up space inside the data files, but if the data file and log file sit on the same drive, a smaller transaction log leaves more disk space available for the data files. Make sure the transaction log is backed up on a regular schedule so that it never balloons in size. (Tripp, 2005)
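A minimal example, assuming a hypothetical SalesDB and a backup location of E:\Backups (in practice this would be scheduled through SQL Server Agent or your maintenance tooling):

-- Routine transaction log backup; lets SQL Server reuse the space inside the log
BACKUP LOG SalesDB
    TO DISK = N'E:\Backups\SalesDB_log.trn';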


Solution 3:

Do not let the auto-growth option grow your files by a percentage. Always set auto-growth to a fixed size instead, as in the sketch below.

With auto-growth set to 10%, a 100 MB file grows to 110 MB, an increase of 10 MB. When that file reaches 2,048 MB it grows to about 2,253 MB, an increase of 205 MB. At 10 GB, each growth is a full 1 GB. The larger the file, the larger each auto-growth becomes, so it is always safer to opt for growth by a fixed size.
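Changing the growth increment is a one-line fix; a sketch assuming the same hypothetical SalesDB_Data file:

-- Grow by a fixed 256 MB each time, instead of by a percentage of the file size
ALTER DATABASE SalesDB
    MODIFY FILE (NAME = N'SalesDB_Data', FILEGROWTH = 256MB);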


References

Ozar, B. (2017, December 29). What’s So Bad About Shrinking Databases with DBCC SHRINKDATABASE? Retrieved from Brent Ozar: https://www.brentozar.com/archive/2017/12/whats-bad-shrinking-databases-dbcc-shrinkdatabase/

Randal, P. (2008, November 26). Conference Questions Pot-Pourri #10: Shrinking the database before taking a backup. Retrieved from SQL Skills: https://www.sqlskills.com/blogs/paul/conference-questions-pot-pourri-10-shrinking-the-database-before-taking-a-backup/

Randal, P. (2009, June 24). Why you should not shrink your data files. Retrieved from SQL Skills: https://www.sqlskills.com/blogs/paul/why-you-should-not-shrink-your-data-files/

sys.dm_db_index_physical_stats Query. (2011, March 29). Retrieved from SQL Server Central: https://www.sqlservercentral.com/Forums/Topic1086081-391-1.aspx

Tripp, K. (2005, June 25). 8 Steps to better Transaction Log throughput. Retrieved from SQL Skills: https://www.sqlskills.com/blogs/kimberly/8-steps-to-better-transaction-log-throughput/

