Storage Allocation and Management for Memory-Optimized Tables
As described in implementing durability for memory-optimized tables, the storage requirements for memory-optimized tables, and the way that storage is managed, are very different from those of disk-based tables. If you are migrating a subset of your disk-based tables to memory-optimized tables, or are developing a new application with memory-optimized tables, you will notice these differences. Unlike disk-based tables, which store data in 8KB pages in the same format both on disk and in memory, memory-optimized tables use a different size and format in memory and on disk. This has been a point of concern for many customers, who have sometimes observed disproportionate storage consumed by memory-optimized tables. The goal of this blog is to alleviate this confusion by describing how storage is managed over the lifetime of data rows.
The data for memory-optimized tables is stored in one or more data/delta file pairs (also referred to as checkpoint file pairs, or CFPs), with data files storing inserted rows and delta files referencing deleted rows. During the execution of an OLTP workload, as DML operations insert, update, and delete rows, new data/delta files are created to persist the data, and existing delta files are updated to record the deletion of existing rows. Over time, if the number of active rows (after accounting for deleted rows) in two or more consecutive CFPs falls below a threshold (usually < 50%) such that they can be merged into one CFP of 128 MB, they are merged automatically by a background merge process into a new CFP. Once the merge operation is complete, the older CFPs go through a transition phase and are eventually removed (i.e. garbage collected) from storage. Note that SQL Server 2014 CTP2 supports up to 4096 CFPs within a database; this limit will be raised to 8192 CFPs in RTM.
At any given time, the data/delta file pairs are in one of the following four categories:
- Pre-allocated CFPs (data and delta): A small set of CFPs is kept pre-allocated to minimize or eliminate any waits to allocate new files while transactions are executing. These are full sized, with a data file size of 128MB and a delta file size of 8MB, but contain no data. The number of pre-allocated CFPs is computed as the number of logical processors or schedulers, with a minimum of 8. This is a fixed storage overhead in databases with memory-optimized tables.
- Active CFPs: These contain the inserted/deleted rows for the last 'durable' checkpoint, i.e. all inserted/deleted rows required before applying the active part of the transaction log at database restart. We expect the combined size of the active CFPs to be at most 2x the in-memory size of the memory-optimized tables. However, there are cases, specifically with data files > 128MB, which can happen due to large concurrent transactions or if the merge operation falls behind, when we may exceed this 2x limit. To simplify the discussion here, let us assume all data files are 128MB and that the merge/checkpoint operations are keeping up.
- CFPs required for operational correctness: These represent files that were the source of a MERGE operation, where rows from these files were moved to a new data/delta file pair. These CFPs are not needed for database restart, but they cannot be deallocated or marked for deallocation (i.e. garbage collection) because they are needed if SQL Server has to recover from a previous durable checkpoint. A CFP can be marked for garbage collection once the log truncation point moves beyond its transaction range. This requires a combination of checkpoint operations and transaction log backups. Transaction log backups are not needed if the database is configured in the simple recovery model.
- CFPs that can be removed from storage: These are garbage collected in the same way FILESTREAM files are garbage collected. Please refer to FS Garbage Collection for details.
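You can see which of these categories each checkpoint file is currently in by querying the sys.dm_db_xtp_checkpoint_files DMV. The sketch below uses the column names as they appear in CTP2 (file_type_desc, state_desc, file_size_in_bytes), which may change in later builds; the states it returns (such as PRECREATED, ACTIVE, and MERGED SOURCE) map onto the categories described above.

-- List each data/delta file with its state and size (column names per CTP2; may differ in RTM)
SELECT file_type_desc,
       state_desc,
       file_size_in_bytes / 1024 / 1024 AS file_size_mb
FROM sys.dm_db_xtp_checkpoint_files
ORDER BY state_desc, file_type_desc;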
Transitioning CFPs out of categories 3 and 4 can take up to 5 checkpoints and, if the database is not running in the simple recovery model, transaction log backups. For databases with memory-optimized tables, an automatic checkpoint is taken when the transaction log has grown by 512MB since the last checkpoint. You can, of course, manually force a checkpoint followed by a log backup to expedite garbage collection, but this will add 5 empty CFPs (i.e. 5 data/delta file pairs, each with a 128MB data file). Typically, in a test environment, you may have to force transaction log backups and checkpoints to remove the CFPs that are no longer needed. In production scenarios, however, we don't expect customers to issue manual checkpoints but to rely on automatic checkpoints and the log backups taken as part of the backup strategy. The impact of this garbage collection process is that databases with memory-optimized tables may have a storage footprint that is disproportionate to their size in memory.
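As a concrete example, in a test database you can nudge this along with something like the following (the database name and backup path here are hypothetical), repeated a few times as described above:

-- Force a checkpoint so merged source CFPs are no longer needed for crash recovery
CHECKPOINT;
GO
-- Back up the log so the log truncation point can move past the CFPs' transaction range
BACKUP LOG InMemDemoDB TO DISK = N'D:\Backups\InMemDemoDB_log.trn';
GO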
To clarify the point further, let us walk through some examples of databases with memory-optimized tables under various scenarios and compare their size in memory and on storage. For these examples, we will assume a SQL Server instance with 8 logical processors and a single memory-optimized table with the following schema. Note that this table has a row size of approximately 8KB.
CREATE TABLE dbo.t_memopt (
    c1 int NOT NULL,
    c2 char(40) NOT NULL,
    c3 char(8000) NOT NULL,
    CONSTRAINT [pk_t_memopt_c1] PRIMARY KEY NONCLUSTERED HASH (c1)
        WITH (BUCKET_COUNT = 100000)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
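(The table above assumes a database that already has a filegroup declared with CONTAINS MEMORY_OPTIMIZED_DATA. A minimal sketch of such a database, with hypothetical names and file paths, is shown below; the memory-optimized "file" is actually a directory-style FILESTREAM container.)

CREATE DATABASE imoltp_demo
ON PRIMARY (NAME = imoltp_demo_data, FILENAME = 'C:\Data\imoltp_demo_data.mdf'),
FILEGROUP imoltp_demo_mod CONTAINS MEMORY_OPTIMIZED_DATA
    (NAME = imoltp_demo_dir, FILENAME = 'C:\Data\imoltp_demo_dir')
LOG ON (NAME = imoltp_demo_log, FILENAME = 'C:\Data\imoltp_demo_log.ldf');
GO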
As indicated earlier, the fixed storage overhead for this configuration will be 1 GB (i.e. 8 data files of 128MB each).
Example-1:
- DML Operation: Insert 1 row
- In-memory Size – you can get the size by querying sys.dm_db_xtp_table_memory_stats (a sample query is sketched after this example). The data row takes 8KB, while the hash index takes 1MB for its 100K buckets:
Name     memory_used_by_indexes_kb     memory_used_by_table_kb
----     -------------------------     -----------------------
t1       1024                          8
- Storage for memory-optimized table – the storage is more interesting. On a machine with 8 logical processors, there are 8 data and 8 delta files pre-allocated, and there is 1 data/delta file pair used to store the currently active rows. So the overall storage for the memory-optimized filegroup is (9 * 128MB) = 1.12 GB, which is disproportionately larger than the size of the in-memory table. This is expected, because this configuration has a fixed overhead of 1 GB and the granularity of storage allocation is 128MB. This is different from disk-based tables, where the unit of storage allocation is an 8KB page for tables < 64KB in size. This additional storage for memory-optimized tables may surprise you if you are just playing with small data sets. However, we don't expect this to be a cause of concern, as a typical customer workload will have a much larger dataset (e.g. 10s or 100s of GB) stored in memory-optimized tables.
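For reference, a query along these lines produces the output shown above (the column names are taken from sys.dm_db_xtp_table_memory_stats as it appears in CTP2):

-- Memory consumed by the table rows and by its indexes, in KB
SELECT OBJECT_NAME(object_id) AS Name,
       memory_used_by_indexes_kb,
       memory_used_by_table_kb
FROM sys.dm_db_xtp_table_memory_stats;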
Example-2:
- DML Operation: Insert 1 million rows (a minimal load script is sketched after this example)
- In-memory Size – 7.8GB
- Storage for memory-optimized table – 8.9GB, which is quite reasonable compared to the previous example. 1 million rows of ~8KB each should take approximately 8GB of storage, plus the 1GB of fixed overhead, which is now only around 10% of the total.
- Storage for disk-based table – around 8GB, so it is of the same order as the memory-optimized table.
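A minimal sketch of the kind of load assumed for this example (no batching or error handling; each row carries an ~8KB payload):

-- Insert 1 million rows of roughly 8KB each into the memory-optimized table
SET NOCOUNT ON;
DECLARE @i int = 1;
WHILE @i <= 1000000
BEGIN
    INSERT INTO dbo.t_memopt (c1, c2, c3)
    VALUES (@i, 'filler', REPLICATE('x', 8000));
    SET @i += 1;
END;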
Example-3:
- Operation: update non-key values for the first 500K rows of the table loaded in the previous example (see the sketch after this example).
- In-memory Size (used) – 8 GB. When I measured this, most of the old row versions created by the update had been garbage collected.
- Storage for memory-optimized table – 13 GB. You may wonder why the storage size has gone up for an update operation. An update on a memory-optimized table is done as a delete followed by an insert. For the deleted rows, the delta file(s) are updated to reflect the deletions, and the new row versions are inserted into another data file. So at this point both the old and the new versions of each updated row are persisted, which is why storage consumption has gone up by approximately 50%. The data/delta files will eventually get merged and garbage collected as described earlier.
- Storage for disk-based tables – it stays around 8GB. This is expected, because the update of non-key columns is done in place for disk-based tables.
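The update in this example can be as simple as the statement below (the new value is arbitrary):

-- Update a non-key column for the first 500K rows; internally this is a delete followed by an insert
UPDATE dbo.t_memopt
SET c2 = 'updated'
WHERE c1 <= 500000;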
Example-4:
- Workload/Operations – an update-only OLTP workload running against a database of size 150GB for over 24 hours, to measure storage and memory in the steady state
- In-memory size (used) – 167 GB. This shows that the stale row versions are getting garbage collected. Please refer to Memory Management for Memory-Optimized Tables for details.
- Storage for memory-optimized table – 320 GB. Given that the size of the database is 150GB, the storage in the steady state is around 2.1 times that.
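To see where that on-disk footprint sits in the steady state, you can aggregate the checkpoint file sizes by state. Again, this is a sketch against the CTP2 DMV sys.dm_db_xtp_checkpoint_files and its column names, which may change by RTM:

-- Total checkpoint file storage grouped by file state
SELECT state_desc,
       COUNT(*) AS file_count,
       SUM(file_size_in_bytes) / 1024 / 1024 AS total_size_mb
FROM sys.dm_db_xtp_checkpoint_files
GROUP BY state_desc
ORDER BY total_size_mb DESC;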
Though these examples are oversimplified, they do illustrate the difference in storage provisioning for memory-optimized tables compared to disk-based tables. Your storage requirement will depend on the following key factors:
- Size of your durable memory-optimized tables
- DML operations on the durable tables. For example, if your workload is update heavy, it leads to MERGE operations, thereby inflating the storage requirement until the source files get garbage collected
- Frequency of checkpoints and transaction log backups