Determine appropriate use cases for a DQS Knowledge Base

Data Quality Services can be used for a wide variety of use cases to ensure that the data is suited for business usage. Data can be cleansed manually or computer-assisted by establishing matching and cleansing rules or by leveraging cloud-based services of reference data providers.

Incorrect or bad data is often a symptom of weak or lack of data validation rules during data entry. Other forms of inconsistent data can arise from data integration processes and data corruption. Bad, incorrect and inconsistent data create inefficiencies in the business processes and can lead to loss of revenue and increased operational costs. Bad data is often amplified during data analysis, reporting, data mining and data warehousing.

The main use cases for a DQS Knowledge Base involves the need for business users, information workers or IT professionals to create, maintain and execute data quality operations resulting from data incompleteness, inconsistency, inaccuracy, invalidity, duplication and nonconformity.

Data that is incomplete or incorrect can be cleansed using a DQS Knowledge Base by processing and analyzing how data conforms to the knowledge. The data steward can review, approve, reject or modify the computer-assisted process results. For example, incomplete data such as addresses can be enriched by reference data. Addresses missing or with invalid zip code values can be corrected with address validation services.

Fixing incorrect and non-standardized data is a common use case for DQS Knowledge Base. This is particularly important when integrating data from multiple sources. For example, different applications may store and display values for gender in multiple ways. Some of the most common values for gender may include ‘M’ for Male and ‘F’ for Female, others may store the word ‘Male’ or ‘Female’ or the values ‘0’ or ‘1’. These values can be automatically standardized by implementing term-based relationships or domain value synonyms in DQS. In addition, other rules can be put in place, such as null or blank values can be converted to ‘Unknown’ and values other than the ones expected above can be converted to ‘Other’.

In some cases, it can be hard to know all the different values that are expected for a domain. An example of this includes incorrect or alternate spellings of words. In this case, a Matching Policy can be created to programmatically correct misspelled domain values. Once a Matching Policy is defined, you can run a Data Quality Matching Activity Project with a new data set to match misspelled values based on the Matching Policy’s weight and similarity thresholds.

Comments

Popular Posts

Non-blocking, semi-blocking, and blocking transforms in SSIS

Implement additive, semi-additive, and non-additive measures

Implement an ETL solution that supports incremental data extraction