Design an appropriate storage solution, including hardware, disk, and file layout

The data warehouse storage’s performance can be measured by analyzing the time it takes for read and write operations to complete, known as latency, and by analyzing how much data volume it can read or write in each time frame, known as throughput

High latency and low throughput can be an indication of an improper selection for the storage infrastructure or incorrect hardware and software configurations.

Performance of a storage subsystem can be estimated based on several criteria:
  • Volume of data
  • Rate of change and growth of the data
  • Data retention policy
  • Number of concurrent users
  • Typical workloads expected
Two important metrics associated with storage are throughput and Input/Output Per Second (IOPS). Throughput is a measure of the amount of data that can be transferred in a period. This is typically measured in megabytes or gigabytes per second. IOPS is the number of I/O read or write operations that can complete in a second. 

It is important to distinguish IOPS metrics for random and sequential read and write operations to get a more precise picture. For instance, a storage subsystem with high throughput might not have high IOPS. A high throughput only determines how much data comes through, but not how rapidly it can be read or written.

The number of concurrent users executing queries and reports is also a key factor when choosing a storage subsystem. A proper storage solution should be able to support a typical user workload and be able to scale when demand for increased throughput and IOPS is required. A data warehouse where queries are subjected to extensive wait periods might be an indication of insufficient CPU and memory resources, or more importantly, a poorly designed storage solution.

Hardware
In most organizations, the storage subsystem SQL Server uses is based on either Direct Attached Storage (DAS) or on Storage-Area Network (SAN). Direct attached storage, as the name implies, consists of storage devices that are connected directly to the server and can only be accessed by the server it is attached to. DAS storage initially referred to disk drives inside the server enclosure, but later evolved to storage appliances that connect physically to the server via one or more cables and controllers.

Illustration of direct-attached storage in a local area network


















Typically, the servers connect directly to the DAS storage device through a host bus adapter (HBA) using SCSI, Fibre Channel, or SATA connectivity. Internal disks within the same server enclosure can also be considered DAS storage.

In contrast, a Storage-Area Network connects to a server via Fibre Channel (FC) or Internet Protocol (IP) via SAN switches and allows multiple servers to access shared pools of block storage across one or more storage devices. A SAN presents a pool of block storage as a drive to the server that can be accessed and managed just like a direct attached drive would.

Illustration of storage-area network













One or more SAN switches provide multiple paths to transfer data between the servers and the storage device. Multipathing provides a resilient storage architecture by allowing automatic failover to another path to get to and from the storage device in case an adapter or switch fails.

Typically, DAS is considered as a more economical storage subsystem because it does not require additional equipment like SAN switches do. SAN is considered a more scalable storage solution however, because it can manage larger pools of disks and can provide higher availability through its storage fabric.

Disk
Nowadays, a disk is considered simply a device that stores data either magnetically (hard disk drives), optically (CD and DVD drives), or through integrated circuits (solid state drives).

Solid state drives (SSD) offer the most throughput and IOPS compared to any other disk type. SSDs have no mechanical components, unlike hard disk drives (HDD) that use rotating platters and a moving arm to read data from the platter surfaces. SSDs have low read and write latency because data can be read directly from any location on the drive. Another important characteristic of SSDs is that read and write performance is not affected by fragmentation as HDDs are because there are no fragment seek times.

Data-tiering refers to the process of assigning a certain type of storage media to distinct categories of data. For example, data that is considered historical and that is accessed less frequently is usually stored in low-tier storage. This lower-tier storage usually consists of slow spinning HDDs. Data that is accessed more frequently is usually stored in higher-tier storage such SSDs.

Read and write throughput and IOPS can be scaled to a large degree by striping the read and write operations across many disks. This is also known as a Redundant Array of Independent Disks (RAID). A RAID combines multiple disks as a single logical disk for increased performance and fault tolerance. Different RAID levels provide better performance and fault tolerance benefits.

The RAID levels most commonly recommended for SQL Server are RAID 5 and RAID 10 (or 1+0). RAID 5 provides fault tolerance and more usable storage, but at a cost for each write operation. RAID 5 is commonly used in databases that experience more reads than writes, such as read-only reporting databases. RAID 5 can be performed with as little as three disks. Usable space is reduced to two-thirds of the total raw capacity of the drives and can sustain a single drive failure.

RAID 10 provides more fault tolerance and better read and write performance than RAID 5, but sacrifices more usable storage. Usable storage is reduced to half of the total raw capacity of the drives, but can sustain up to two drive failures as long as the drives are not in the same mirrored pair. RAID 10 provides a good balance of performance and redundancy.

File Layout
Data files are accessed using random read and write operations. Transaction log files are accessed using sequential operations. It is typically recommended to separate data and transaction log files into separate volumes to improve performance. The performance benefit comes from the segregation of workload patterns on the physical drives. In addition, SQL Server uses the tempdb system database for sorting, storing temporary objects, and many other activities. It is also recommended to place tempdb in a dedicated volume to support its unique workload.

Data files (.mdf and .ndf) are usually placed on one or more dedicated volumes with plenty of room for file growth. Disk drives allocated for data files need to support a high number of random write operations. Random write operations often cause heavy disk fragmentation and thus produce seek delays due to the constant repositioning of the disk head in the HDD. SSDs can reduce the read and write latency of random access operations because there are no moving disk read heads.

Log files are usually placed on a volume with dedicated drives that support high sequential write operations. Sequential write operations do not introduce as much fragmentation as random write operations. Both HDDs and SSDs can be used to place log files, although SSDs provide faster throughput and IOPS than HDDs.

Tempdb is considered a special type of database that is usually placed on a dedicated volume. Although tempdb is not considered a mission critical database, it plays a key role in SQL Server query performance. Some SQL Server environments rely heavily on tempdb and might experience sustained read and write operation demands. It is considered a best practice to place tempdb files in a dedicated volume to separate some of these unpredictably heavy I/O patterns from physical drives assigned to data and transaction log files.

Common file layout of SQL Server files and tempdb


Comments

Popular Posts

Non-blocking, semi-blocking, and blocking transforms in SSIS

Implement additive, semi-additive, and non-additive measures

Implement an ETL solution that supports incremental data extraction