Posts

Perform knowledge discovery in DQS

Image
The Data Quality knowledge base can be populated using interactive or computer-assisted  processes. During the installation process of SQL Server 2016 Data Quality Services, you can  choose to install the Data Quality Client to interactively create and maintain a DQS knowledge  base. The DQS Client can be used to create data domains and add domain values manually or  by importing them from an Excel spreadsheet or a data cleansing project. In addition to interactively maintaining data through the DQS Client, you can maintain data  by running a computer-assisted activity known as Knowledge Discovery. The Knowledge Discovery  activity analyzes a sample of data that is used for data quality criteria. The algorithms  built into DQS look for data inconsistencies and syntax errors and then propose changes to the  data. You can then approve or reject the proposed changes or apply corrections manually. To perform a Knowledge Discovery activity for the SuppliersName in the Suppliers Kno

Perform domain management in DQS

Image
A knowledge base consists of domains. Each domain represents the data in a data field. Each  value in a data field or domain is known as a domain value. DQS provides the ability to validate,  cleanse, match and deduplicate values from any dataset against domain values in the DQS  Knowledge Base. Domains are created by performing a domain management activity. To create a DQS domain  for valid supplier names in the Suppliers Knowledge Base, follow these steps: Open the Data Quality Services client and connect to the DQS instance Click on Open Knowledge Base button from the Knowledge Base Management section Select Suppliers from the Knowledge Base list Click on Domain Management from the Select Activity list, then click Next Click on the Create A Domain icon Type the following information on the Create Domain window: Domain Name: SupplierName Description: List of valid Supplier Names Select String from the Data Type drop-down list Check the Use Leading Values option C

Determine appropriate use cases for a DQS Knowledge Base

Data Quality Services can be used for a wide variety of use cases to ensure that the data is suited  for business usage. Data can be cleansed manually or computer-assisted by establishing matching  and cleansing rules or by leveraging cloud-based services of reference data providers. Incorrect or bad data is often a symptom of weak or lack of data validation rules during data  entry. Other forms of inconsistent data can arise from data integration processes and data corruption.  Bad, incorrect and inconsistent data create inefficiencies in the business processes and  can lead to loss of revenue and increased operational costs. Bad data is often amplified during  data analysis, reporting, data mining and data warehousing. The main use cases for a DQS Knowledge Base involves the need for business users, information  workers or IT professionals to create, maintain and execute data quality operations  resulting from data incompleteness, inconsistency, inaccuracy, invalidity, dupli

Create a Data Quality Services (DQS) knowledge base

Image
Install DQS A Data Quality Knowledge Base (DQKB), is a repository of knowledge and metadata that describes  and defines the data domains and rules to cleanse and match data. The DQKB is used in  an organization to identify and correct invalid or incorrect data to ensure that data is consistent,  and trustworthy. A DQKB stores domain values that have been validated along with terms,  spelling errors, business rules, and reference data that can be used to perform data quality  cleansing and matching actions on the data. A Data Quality Services knowledge base can be created as a new and empty knowledge  base or by importing an existing DQS knowledge base from a DQS (.dqs) file. In our example,  we will be creating a Suppliers DQS Knowledge base. To create a new and empty DQS knowledge  base, follow these steps: Open the Data Quality Services client and connect to the DQS instance Click on New Knowledge Base button from the Knowledge Base Management section Type the following

Install Data Quality Services

Image
The installation of DQS is a two-step process which consists of running the SQL Server 2016  Setup wizard and a command prompt based application called the Data Quality Server Installer.  To start the installation process, locate the SQL Server 2016 installation media and run  Setup.exe. This will open the SQL Server 2016 Installation Center. Click on Installation menu option  and then click on New SQL Server Stand-Alone Installation Or Add Features To An Existing  Installation to launch the SQL Server 2016 Setup wizard. Click Next. On the next window,  select the installation type option Perform A New Installation of SQL Server 2016, if SQL Server  2016 Database Engine has not been previously installed. In this case, you will also need to  install SQL Server 2016 Database Engine as it is required for Data Quality Services. If there is already an installation of SQL Server 2016 Database Engine and you would like to  add DQS to the existing instance, select Add Features To An Ex

Deploy SSIS packages by using the deployment utility, SQL Server, and file systems

Image
For project deployment, you  must ensure that all the related objects needed for the package execution are also part of the  package. For example, when a Connection Manager is created, it is specific for the package  within which it was created, unless it is promoted to a project-level Connection Manager by  right-clicking it and selecting Convert To Project Connection. Similarly, any parameter that you want to be included at the project level, such as, Server- Name, DatabaseName, FolderName, etc. can be created at the project level so it only needs to  be changed in one place and will be available for all the packages in the project. And once the  project has been deployed to a different server, for example to the Development, QA, or Production  server, the parameter value only needs to be changed in one place. The Project.params tab opened with project parameters, and  also under Connection Managers, you see the project level connections.

Create an SSIS catalog

Image
SQL Server 2012 introduced a new paradigm in storing and deploying SSIS packages. Before  that release, all packages were individually managed and deployed. The SSIS DB catalog and SSISDB database serves as a centralized place where deployed SSIS  packages can be stored along with everything related to it, including: Project and package parameters Configuring runtime values for packages Executing and troubleshooting packages Managing Integration Services server operations You find the SSISDB catalog in the folder labeled Integration Services Catalogs. This is where  you find the Projects folder containing each project with all its packages. And the Environments  folder contains the package configuration and environment values, which can be set  before executing the packages. Each SQL Server instance gets only one SSISDB. You also find another SSISDB in the Databases folder in SQL Server (Figure 2-113). This is essentially  a