Perform domain management in DQS

A knowledge base consists of domains. Each domain represents the data in a data field. Each value in a data field or domain is known as a domain value. DQS provides the ability to validate, cleanse, match and deduplicate values from any dataset against domain values in the DQS Knowledge Base.

Domains are created by performing a domain management activity. To create a DQS domain for valid supplier names in the Suppliers Knowledge Base, follow these steps:
  • Open the Data Quality Services client and connect to the DQS instance
  • Click the Open Knowledge Base button in the Knowledge Base Management section
  • Select Suppliers from the Knowledge Base list
  • Click on Domain Management from the Select Activity list, then click Next
  • Click on the Create A Domain icon
  • Type the following information on the Create Domain window:
    • Domain Name: SupplierName
    • Description: List of valid Supplier Names
    • Select String from the Data Type drop-down list
  • Check the Use Leading Values option
  • Check the Normalize String option
  • Select None from the Format Output To option
  • Select English from the Language drop-down list
  • Check the Enable Speller option
  • Uncheck the Disable Syntax Error Algorithms option, then click OK

In the Domain Management activity, you can set field-wide properties, create rules, configure reference data services, or set up term-based or cross-field relationships. The domain properties set in the steps above can be modified later on the Domain Properties tab.

In the Reference Data tab of the Domain Management activity window, you add service providers from a Microsoft DataMarket subscription to standardize, correct, cleanse and enrich data. Unfortunately, the Microsoft DataMarket was retired on March 31, 2017 and you can no longer configure this service. At the time of this writing, no alternative service is available.

In the Domain Rules tab of the Domain Management activity window, you add rules that domain values must pass in order to be deemed correct. For example, you can set up a rule that validates that the length of a domain value is greater than or equal to five characters. All new domain values are checked against this rule. If a domain value with fewer than five characters is added, the value is marked as Invalid.
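
The logic of the length rule described above can be sketched in a few lines of Python. This is only an illustration of what the rule checks, not DQS code: the function name check_min_length and the "Correct" return value are assumptions made for the example.

```python
MIN_LENGTH = 5  # rule from the text: length must be >= 5 characters


def check_min_length(value: str, min_length: int = MIN_LENGTH) -> str:
    """Return "Correct" if the value passes the length rule, else "Invalid".

    Hypothetical helper for illustration; DQS evaluates domain rules internally.
    """
    return "Correct" if len(value) >= min_length else "Invalid"


if __name__ == "__main__":
    # Sample supplier names (made up for this sketch).
    for name in ["Contoso", "Acme", "Fabrikam"]:
        print(f"{name!r}: {check_min_length(name)}")
```

In this sketch a four-character value such as Acme fails the rule, while a value of exactly five characters passes because the comparison is inclusive.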

In the Domain Values tab of the Domain Management activity window, you add, import, edit or delete domain values. Values can be added manually or imported from Excel. You can also change the status of a domain value to Correct, Invalid or Error, and provide a replacement value for domain values with an Invalid or Error status.
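
One way to picture a domain value with a status and a replacement is the small Python sketch below. The DomainValue class, the cleanse function and the sample supplier names are all hypothetical and exist only for this illustration; they do not mirror how DQS stores its knowledge base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DomainValue:
    value: str
    status: str = "Correct"           # "Correct", "Invalid" or "Error"
    correct_to: Optional[str] = None  # replacement for Invalid/Error values


def cleanse(value: str, domain: dict) -> str:
    """Return the replacement for a known Invalid/Error value, else the value itself."""
    entry = domain.get(value)
    if entry is None:
        return value  # unknown values are left untouched in this sketch
    if entry.status in ("Invalid", "Error") and entry.correct_to:
        return entry.correct_to
    return value


# Made-up domain values keyed by their text.
supplier_domain = {
    "Contoso Ltd": DomainValue("Contoso Ltd"),
    "Contso Ltd": DomainValue("Contso Ltd", status="Error", correct_to="Contoso Ltd"),
}

if __name__ == "__main__":
    print(cleanse("Contso Ltd", supplier_domain))
```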

Synonyms can also be defined in the Domain Values tab by selecting two or more values and choosing the Set As Synonyms option from the right-click context menu, or the Set selected Domain Values As Synonyms icon from the Domain Values menu options. You can set one of the synonym values as the leading value. DQS uses the leading value to replace synonym values found in the data during cleansing and matching activities.
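
The effect of a leading value can be sketched as a simple lookup from each synonym to its leading value. The function names and the supplier names below are assumptions for the example, not part of DQS.

```python
def build_synonym_map(leading: str, synonyms: list) -> dict:
    """Map every synonym to the chosen leading value."""
    return {synonym: leading for synonym in synonyms}


def to_leading_value(value: str, synonym_map: dict) -> str:
    """Replace a synonym with its leading value; other values pass through."""
    return synonym_map.get(value, value)


# Hypothetical synonym group with "Northwind Traders" as the leading value.
synonym_map = build_synonym_map(
    "Northwind Traders",
    ["Northwind", "Northwind Traders Co."],
)

if __name__ == "__main__":
    for raw in ["Northwind", "Northwind Traders Co.", "Fabrikam"]:
        print(f"{raw!r} -> {to_leading_value(raw, synonym_map)!r}")
```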

In the Term-Based Relations tab of the Domain Management activity window, you add corrections to a term that is part of a domain value. Term-based relations are created by building a list of Value/Correct To pairs. DQS searches for the Value within a domain value and replaces it with the Correct To value if a match is found. For example, you can add a term-based relation that replaces the term Inc. with Incorporated. In this case, the domain value Contoso, Inc. will be changed to Contoso, Incorporated.
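
A term-based relation behaves roughly like a term-level find and replace. The sketch below assumes that terms are separated by spaces and that only whole terms are replaced; both are simplifications chosen for the example, and the names TERM_RELATIONS and apply_term_relations are hypothetical.

```python
# Value / Correct To pairs, as in the example from the text.
TERM_RELATIONS = {"Inc.": "Incorporated"}


def apply_term_relations(value: str, relations: dict) -> str:
    """Replace each whole term that has a Correct To entry.

    Splitting on single spaces is an assumption of this sketch.
    """
    return " ".join(relations.get(term, term) for term in value.split(" "))


if __name__ == "__main__":
    print(apply_term_relations("Contoso, Inc.", TERM_RELATIONS))
```

Matching whole terms rather than substrings keeps a value such as Incline Supplies unchanged, since Inc. never appears in it as a separate term.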

The SupplierName domain is considered a single domain because it relies only on itself to represent a data field. Composite domains rely on two or more single domains to represent the data in a field satisfactorily. For example, a Geography domain may require multiple single domains, such as City, State and Country.

To create a composite domain, such as the Geography domain, the single domains need to be created first. Once the single domains are created, the composite domain can be defined by performing a Domain Management activity as described in the following steps:
  • Click on the Create A Composite Domain icon from the Domain Management top menu bar
  • In the Create a Composite Domain window, type Geography in the Composite Domain Name field
  • Optionally, type a brief description in the Description field
  • Select the Country, State and City domains from the Domains List and click on the right arrow to move the domains to the Domains in Composite Domain list. Reorder if necessary and click OK
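
As a rough mental model, the composite domain defined in the steps above can be thought of as one field whose content maps onto several single domains. The Python sketch below assumes the field holds a comma-separated string in City, State, Country order; that format, the Geography class and the parse_geography function are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geography:
    """Composite value built from three single domains."""
    country: str
    state: str
    city: str


def parse_geography(field: str) -> Geography:
    """Split a "City, State, Country" string into its single-domain parts."""
    parts = [part.strip() for part in field.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 comma-separated parts, got {len(parts)}")
    city, state, country = parts
    return Geography(country=country, state=state, city=city)


if __name__ == "__main__":
    print(parse_geography("Seattle, WA, USA"))
```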
