Sunday, November 8, 2009

SQL Server 2008 Compression Features

SQL Server 2005 introduced the concept of variable length fields to enable space conservation by using only the storage needed for the value even if the declared storage is more than that. SQL Server 2008 takes storage space saving not only a step further, but a giant leap ahead. This is especially important since databases in today’s times are reaching sizes of Tera bytes and beyond. I will be covering some high level concepts behind the compression features introduced in SQL 2008. For more detailed reading, you can follow the links provided at the bottom of the post.

Essentially, SQL 2008 provides 2 ways to compress the data:

  • Backup Compression
  • Data Compression
    • Row Compression
    • Page Compression

Backup compression enables shrinking the size of the backup files when they are written to the file system. Since almost all important databases have a backup strategy in place, this is a feature that should be applicable in a wide range of scenarios. It can be enabled either on an individual basis using an option in the Backup command or using the system SP sp_configure. Restoring would be the same regardless of whether the backup is compressed or not except the fact that older versions of SQL Server would not be able to read the compressed backups.

Data compression is a slightly more complex topic as compared to Backup compression. Let me first explain the 2 compression techniques in Data compression.

  • Row compression uses variable length storage technique for all data types, including fixed data types. For the application, it would still seem that the fixed types are implemented as such but under the hood, SQL Server stores them using variable storage. Row compression doesn’t impose a very big overhead on the system and should generally be enabled on all tables and indexes
  • Page compression is a slightly more complicated beast. First and foremost, when you enable Page compression, Row compression is enabled automatically. Additionally, Page compression uses something called Column prefix and Page dictionary to provide a greater degree of compression.
    • In Column prefix, SQL Server stores the common values at the beginning of values in a column in the CI(Compression Information) section and the actual cells will have pointers to the CI block
    • For Page dictionary, a 'dictionary' of repeating column values on the given page is created in the CI structure, and again, pointers to the value in the dictionary are left in the original column value location

Page compression involves additional overhead in case of querying as well as inserts and updates. So it must be enabled selectively. Generally, it should be enabled only for tables and indexes with very few updates and/or high number of scans.

SQL Server also provides some DMVs and system SPs to help developers/administrators to get a better understanding of the impact and possible gains from enabling compression. Some of these are:

  • sp_estimate_data_compression_savings - Allows you to estimate the savings you will be able to perceive for a given table/index
  • sys.dm_db_persisted_sku_features - To determine whether a database is using compression
  • data_compression_desc column in the catalog view sys.partitions - To determine what is compressed, and how (row or page)
  • sys.dm_db_index_operational_stats - To determine the number of Scans (S) and Updates (U) on a table

Some additional points about SQL Compression:

  • The data compression feature is available in the Enterprise and Developer editions of SQL Server 2008 only
  • It is recommended to compress one table, index, or partition at a time in most cases because compression requires additional temporary structures to be created which impose space and processing overheads
  • It is also recommended to compress smaller objects before larger ones so that the space freed up by the smaller ones can be utilised as temporary storage when compressing the larger objects.
  • After data compression has completed, the space saved is released to the respective data file(s). However, the space is not released to the file system, because the file size doesn’t reduce automatically as part of data compression. There are ways to release the space to the file system, including using DBCC SHRINKDATABASE and DBCC SHRINKFILE

As mentioned earlier, I tried to cover most of the high level concepts that one would need to understand for using compression. There are some excellent articles on MSDN that cover various implementation details and performance benchmarks regarding compression. I am providing a couple of links below, which you can use for reference.

Data Compression: Strategy, Capacity Planning and Best Practices

Katmai (Sql 2008) - Data Compression (including Backup Compression)