Not all Data Flow transforms are created equal. Some are much faster and require far less memory (non-blocking transforms). Others are slower and consume much more memory (blocking transforms), and some fall in between (semi-blocking transforms). We don’t always have a choice when a particular task requires a particular transform, but knowing the differences between these types helps us better understand performance and resource utilization issues.

Synchronous and asynchronous

A related aspect of Data Flow transforms is whether they can process each row as it arrives, independently of any rows that came before or after (synchronous transforms), or whether they depend on some or all of the rows that come before and after (asynchronous transforms). On the whole, non-blocking transforms a...
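As a rough analogy for the distinction above (this is plain Python, not SSIS code, and every name in it is hypothetical), the sketch below contrasts a row-by-row transform with one that must buffer its whole input before emitting anything. The `log` list only records when the source is read, so the difference in behavior is visible.

```python
from typing import Iterable, Iterator, List

Row = dict


def source(log: List[str]) -> Iterator[Row]:
    """Hypothetical source that records each read in `log`."""
    for first, last in [("Ann", "Zed"), ("Bob", "Young"), ("Cy", "Abel")]:
        log.append(f"read {first}")
        yield {"first": first, "last": last}


def derive_full_name(rows: Iterable[Row]) -> Iterator[Row]:
    """Non-blocking, synchronous style: each row is emitted as soon as it arrives."""
    for row in rows:
        yield {**row, "full_name": f"{row['first']} {row['last']}"}


def sort_by_last_name(rows: Iterable[Row]) -> Iterator[Row]:
    """Blocking, asynchronous style: nothing is emitted until every row is buffered."""
    buffered = list(rows)  # the whole input is held in memory here
    buffered.sort(key=lambda r: r["last"])
    yield from buffered


if __name__ == "__main__":
    log: List[str] = []
    first_out = next(derive_full_name(source(log)))
    print(first_out["full_name"], "after", len(log), "read(s)")

    log = []
    first_out = next(sort_by_last_name(source(log)))
    print(first_out["last"], "after", len(log), "read(s)")
```

The row-by-row transform yields its first output after a single read, while the sort has to consume all three rows first. That buffering is where the extra memory and latency of a blocking transform come from.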
A frequent problem when you bring in data from multiple sources is that the data needs to be unified, consolidated, and/or deduplicated. When an enterprise maintains customer or product sales information for various business entities, e.g. retail and web sales, or in different CRM programs as a result of a company merger, all of this information has to be brought under one roof within the data warehouse. SQL Server provides two fuzzy transformations to help with such scenarios: fuzzy grouping and fuzzy lookup. These transformations can be used independently or in concert to assist with unification and deduplication of your data. Both fuzzy transformation algorithms create and use temporary tables within SQL Server.

Fuzzy grouping

Fuzzy grouping is used primarily for deduplicating and standardizing values in column data. Fuzzy grouping has input parameters a...
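To illustrate the general idea of grouping near-duplicate values, here is a minimal Python sketch. It is not the algorithm SQL Server uses: it swaps in `difflib.SequenceMatcher` from the standard library as the similarity measure, and the `fuzzy_group` function and its `threshold` parameter are hypothetical names chosen for this example.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_group(values: List[str], threshold: float = 0.8) -> List[Tuple[str, List[str]]]:
    """Greedy grouping: put each value in the first group whose
    canonical value is at least `threshold` similar, else start a new group."""
    groups: List[Tuple[str, List[str]]] = []
    for value in values:
        for canonical, members in groups:
            if similarity(value, canonical) >= threshold:
                members.append(value)
                break
        else:
            # No existing group was close enough, so this value
            # becomes the canonical value of a new group.
            groups.append((value, [value]))
    return groups


if __name__ == "__main__":
    names = ["Microsoft Corp", "Microsoft Corp.", "Contoso Ltd"]
    for canonical, members in fuzzy_group(names):
        print(canonical, "<-", members)
```

Each group's first value acts as the standardized form for its members, which mirrors the deduplicate-and-standardize purpose described above. A greedy pass like this depends on input order, so treat it as a teaching aid only.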
Determine attributes

Dimension tables group related attributes that provide context to business processes. Attributes can be used to describe the “what,” “when,” “where,” “who,” and “how” for any given business process. Attributes can be identified by analyzing the entity relationship model (ERM) of line of business (LOB) applications, carrying out discovery sessions with business users and subject matter experts, reviewing existing reports and dashboards, and analyzing forms and instruments that are used to track certain business processes. The next step is to group related attributes into dimension tables. Grouping related attributes into dimensions facilitates data filtering, slicing, and dicing.

Implement dimensions

After you determine attributes and group them into dimensions, you can define the dimension tables in your data warehouse. Each attribute becomes a column in the dimension table and holds specific data type values. Four main types of dimension co...
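As a small sketch of attributes becoming typed columns, the example below builds a dimension table in an in-memory SQLite database through Python's `sqlite3` module. SQLite stands in for the data warehouse purely for illustration, and the table name, column names, and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each attribute of the (hypothetical) customer dimension becomes a column
# with its own data type; customer_key is an assumed integer key column.
conn.execute(
    """
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        city          TEXT,
        segment       TEXT
    )
    """
)

conn.executemany(
    "INSERT INTO dim_customer (customer_name, city, segment) VALUES (?, ?, ?)",
    [
        ("Ann Zed", "Seattle", "Retail"),
        ("Bob Young", "Denver", "Web"),
        ("Cy Abel", "Seattle", "Retail"),
    ],
)

# Grouping by an attribute is the kind of slicing a dimension makes easy.
rows = conn.execute(
    "SELECT segment, COUNT(*) FROM dim_customer GROUP BY segment ORDER BY segment"
).fetchall()

if __name__ == "__main__":
    for segment, count in rows:
        print(segment, count)
```

In a real warehouse the same structure would be declared with the target platform's own DDL and data types.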