Design a partition structure to support a data warehouse

A table partition consists of a partition column, a partition function, and a partition scheme. The partition column is the column in the table that holds the values that are used as boundary values to group rows of data in a corresponding partition. Only one column can be defined as the partition column. SQL Server uses the partition column to determine which partition to store new and existing rows in. SQL Server also uses it for partition elimination to determine which partitions need to be read based on the WHERE clause or JOIN condition of a query.

The partition function specifies a set of range boundaries based on the values of the partition column. The boundaries can be defined as a RANGE LEFT or RANGE RIGHT. A boundary defined as RANGE LEFT means that the boundary value belongs to its left partition. The boundary value is the last value in the left partition. A boundary defined as RANGE RIGHT means that the boundary value belongs to its right partition. The boundary value in this case is the first value in the right partition.

The syntax for the range left partition function is as follows:
CREATE PARTITION FUNCTION MonthlyRangeLeft_pf (DATE)
AS RANGE LEFT FOR VALUES
(‘2016-01-31’, ‘2016-02-29’, ‘2016-03-31’);

The syntax for the range right partition function is as follows:
CREATE PARTITION FUNCTION MonthlyRangeRight_pf (DATE)
AS RANGE RIGHT FOR VALUES
(‘2016-01-01’, ‘2016-02-01’, ‘2016-03-01’);

Partitions created using Range Left and Range Right boundaries in the partition function















Also notice that only three boundary values were specified in the partition function but four partitions were created. The total number of partitions created is equal to N+1, where N is the number of boundary values provided in the partition function. The additional partition is an automatic “catch-all” partition to group values outside the left or right range boundary value.

The logical partitions created by the partition function need to be mapped to one or more
filegroups by the partition scheme. A partition can only be mapped to a single filegroup. A filegroup, on the other hand, can contain one or more partitions.

Map all four partitions to a single filegroup

CREATE PARTITION SCHEME SingleFilegroup_PS
AS PARTITION MonthlyRangeRight_PF
ALL TO ([PRIMARY]);

Partitions mapped to a single filegroup by the partition scheme















Notice that all partitions are mapped to a single filegroup named PRIMARY. When a database is created using default settings, a default filegroup named PRIMARY is created along with a primary data file. In this case, all system and user objects that are created in the database, such as tables and indexes, are assigned to the default filegroup containing the primary data file.

Additional filegroups and files can be created for a database. A filegroup can span across one or more files, but a file can only belong to a single filegroup. The primary data file’s name extension is typically .mdf. Secondary data file name extensions are typically .ndf.

In the following partition scheme examples, assume that four additional filegroups (FG1, FG2, FG3, FG4) have been created in the [Fact].[Sale] table by using the ALTER DATABASE statement.

Map two partitions per filegroup (even partition distribution)

CREATE PARTITION SCHEME TwoFilegroups_PS
AS PARTITION MonthlyRangeRight_PF
TO (FG1, FG1, FG2, FG2)













One of the benefits of partition functions and partition schemes is that they can be used in more than one table. You can create several fact tables with the same partition scheme if the fact table’s partition column matches the data type of the partition column used in the partition function.

When an index is created on a partitioned table, the index is partitioned by default using the same partition schema of the table, unless a partition schema is explicitly defined for the index. If the index inherits the base table’s partition schema, it is considered to be aligned with the table and is often referred to as a partition aligned index.

One of the major benefits of aligning a table and its indexes is the ability for SQL Server to switch partitions faster and more efficiently. Partition switching refers to the process of reassigning a partition from one partitioned table to another. The switching of partitions can occur in a matter of seconds because it is a metadata change and no data is physically moved. A typical use case for partition switching can be found in the incremental loading of the data warehouse.

You should use partitioning with care because the increase in the number of partitions can have performance implications due to higher demand on server resources, including memory, processor, and disk resources.

Comments

Popular Posts

Non-blocking, semi-blocking, and blocking transforms in SSIS

Implement additive, semi-additive, and non-additive measures

Implement an ETL solution that supports incremental data extraction