Select appropriate indexes

A query is considered selective if the execution plan utilizes a more efficient operation such as an index seek rather than a table or index scan. SQL Server uses an internal cost-based optimizer known as the SQL Query Optimizer to determine the execution plan with the lowest cost while also taking in consideration the cost of finding candidate plans. The execution plan operation chosen by the SQL Query Optimizer depends not just on its WHERE clause predicate but also on other factors such indexes and column statistics.

Examples of low selectivity queries include queries that aggregate sales data by year or queries that average profit across products. Performance tuning of these non-selective queries can be challenging using traditional rowstore indexes. For this matter, a columnstore index is best suited because they are in-memory structures and outperform disk-based rowstore index scans.

For data warehouse workloads, a common index design approach is to implement a mix of rowstore and columnstore indexes.

Rowstore indexes in fact tables are used to cover highly selective queries. Their goal is to ensure query plans use index seek operations. Columnstore indexes in fact tables are used to improve query performance for non-selective queries that result in an index scan operation.

ETL executions and OLAP cube processing, also benefit from a mixed rowstore and columnstore index design approach.

During the design phase, certain indexes can be identified and designed to provide a best effort query performance. This initial effort is considered best effort because it is purely based on observations from the table relationships and business rule enforcements in the database.

The three main types of database rules that are usually designed in a database before data is loaded include uniqueness constraints, primary key relationships, and foreign key relationships. A UNIQUE constraint automatically creates a nonclustered index on the table, while a PRIMARY KEY constraint automatically creates a clustered index on the table. A FOREIGN KEY constraint does not automatically create an index on the table. Typically, columns that reference a FOREIGN KEY constraint are good candidates for an index.

PRIMARY KEY and UNIQUE constraints are common constraints defined for both dimension and fact tables. In dimension tables, the surrogate key is usually defined with a PRIMARY KEY constraint.

An index is not automatically generated when you define a FOREIGN KEY constraint, but it is always recommended to manually create a nonclustered index on all foreign keys.

General guidelines for index selection










INDEX argument syntax for rowstore and columnstore index definition in the CREATE TABLE statement









CREATE INDEX statement syntax for rowstore and columnstore index definition









The unsupported clustered columnstore index data types include:
  • ntext, text, and image
  • nvarchar(max), varchar(max), and varbinary(max) (Applies to SQL Server 2016 and prior versions, and nonclustered columnstore indexes)
  • rowversion (and timestamp)
  • sql_variant
  • CLR types (hierarchyid and spatial types)
  • Xml

A filtered nonclustered index can be extremely powerful when queries are typically executed with a WHERE clause predicate. For example, a common filter to use when querying dimension columns is to filter out null values or values where a certain flag is set to TRUE. This is particularly helpful for slowly-changing dimensions.

A primary key constraint is permissible in a table along with a clustered columnstore index as long as the primary key contraint is defined as nonclustered.

You need to constantly evaluate the need for new indexes, eliminate unused indexes, and find opportunities to consolidate indexes. The goal of the indexing solution is to improve performance while maintaining a balance of the number of indexes and corresponding storage space required.

Comments

Popular Posts

Non-blocking, semi-blocking, and blocking transforms in SSIS

Implement additive, semi-additive, and non-additive measures

Implement an ETL solution that supports incremental data extraction