Not all Data Flow transforms are created equal. In fact, some are much faster and require a lot less memory (non-blocking transforms). Others are slower while consuming much more memory (blocking transforms), and some are in between (semi-blocking transforms). We don’t always have a choice if we need to use a certain transform to do a certain specified task, but it is good to know the differences between these types of transforms so that we can better understand performance and resource utilization issues. Synchronous and asynchronous There is another related aspect about Data Flow transforms, which is their ability to quickly process a row as they are coming into the transform, independently of any other rows that came before or after (synchronous transforms). The other type of transform needs to be dependent on some or all of the rows that come before and after (asynchronous transforms). On the whole, non-blocking transforms a...
Additive Additive measures are numeric values that can be summed up by all the dimensions in the fact table, for example the Sales Amount. These are the most common measures and they are the easiest to understand and implement. Semi-additive Semi-additive measures are numeric values that can be summed up by some dimensions in the fact table, but not all. In most cases, they can be aggregated by any other dimension except time. An example of a semi-additive measure is Account Balance. You can determine the ending balance of an account at the end of a month, but adding up these monthly account balances do not make sense. Semi-additive measures are usually found in fact tables that store periodic snapshots of data. Non-additive Non-additive measures, as the name implies, are measures that cannot be summed up. These measures include percentages such as Discount Percent, ratios such as Profit Margin, and averages ...
Design fact table patterns Fact tables used in data warehouses can become the very large, very fast by order of magnitude. It’s important to implement best practices for fact table design at the onset, even if your fact table does not contain billions of rows. Here are some things to be aware and include in your fact table design: Consider partitioning your fact tables when they are larger than 100 GB. The partitions can be by any column, which provides a fairly equal distribution of your data. Usually the Date columns (not DateTime columns) are good candidates. This provides more granularity when creating indexes, loading, updating, and deleting data (by partitions) Faster data access for queries by limiting the IO to specific partitions Utilize partition switching to move data in and out almost instantaneously Implement sliding window using partition splitting and partition merging Build clustered indexes on the Date Key of the fact tables (not...
Comments
Post a Comment