Posts

Showing posts from December, 2017

Perform knowledge discovery in DQS

Image
The Data Quality knowledge base can be populated using interactive or computer-assisted  processes. During the installation process of SQL Server 2016 Data Quality Services, you can  choose to install the Data Quality Client to interactively create and maintain a DQS knowledge  base. The DQS Client can be used to create data domains and add domain values manually or  by importing them from an Excel spreadsheet or a data cleansing project. In addition to interactively maintaining data through the DQS Client, you can maintain data  by running a computer-assisted activity known as Knowledge Discovery. The Knowledge Discovery  activity analyzes a sample of data that is used for data quality criteria. The algorithms  built into DQS look for data inconsistencies and syntax errors and then propose changes to the  data. You can then approve or reject the proposed changes or apply corrections manually. To perform a Knowledge Discovery activity for...

Perform domain management in DQS

Image
A knowledge base consists of domains. Each domain represents the data in a data field. Each  value in a data field or domain is known as a domain value. DQS provides the ability to validate,  cleanse, match and deduplicate values from any dataset against domain values in the DQS  Knowledge Base. Domains are created by performing a domain management activity. To create a DQS domain  for valid supplier names in the Suppliers Knowledge Base, follow these steps: Open the Data Quality Services client and connect to the DQS instance Click on Open Knowledge Base button from the Knowledge Base Management section Select Suppliers from the Knowledge Base list Click on Domain Management from the Select Activity list, then click Next Click on the Create A Domain icon Type the following information on the Create Domain window: Domain Name: SupplierName Description: List of valid Supplier Names Select String from the Data Type drop-down list Check the Use Leadi...

Determine appropriate use cases for a DQS Knowledge Base

Data Quality Services can be used for a wide variety of use cases to ensure that the data is suited  for business usage. Data can be cleansed manually or computer-assisted by establishing matching  and cleansing rules or by leveraging cloud-based services of reference data providers. Incorrect or bad data is often a symptom of weak or lack of data validation rules during data  entry. Other forms of inconsistent data can arise from data integration processes and data corruption.  Bad, incorrect and inconsistent data create inefficiencies in the business processes and  can lead to loss of revenue and increased operational costs. Bad data is often amplified during  data analysis, reporting, data mining and data warehousing. The main use cases for a DQS Knowledge Base involves the need for business users, information  workers or IT professionals to create, maintain and execute data quality operations  resulting from data incompleteness, incon...

Create a Data Quality Services (DQS) knowledge base

Image
Install DQS A Data Quality Knowledge Base (DQKB), is a repository of knowledge and metadata that describes  and defines the data domains and rules to cleanse and match data. The DQKB is used in  an organization to identify and correct invalid or incorrect data to ensure that data is consistent,  and trustworthy. A DQKB stores domain values that have been validated along with terms,  spelling errors, business rules, and reference data that can be used to perform data quality  cleansing and matching actions on the data. A Data Quality Services knowledge base can be created as a new and empty knowledge  base or by importing an existing DQS knowledge base from a DQS (.dqs) file. In our example,  we will be creating a Suppliers DQS Knowledge base. To create a new and empty DQS knowledge  base, follow these steps: Open the Data Quality Services client and connect to the DQS instance Click on New Knowledge Base button from the Knowledge Base ...

Install Data Quality Services

Image
The installation of DQS is a two-step process which consists of running the SQL Server 2016  Setup wizard and a command prompt based application called the Data Quality Server Installer.  To start the installation process, locate the SQL Server 2016 installation media and run  Setup.exe. This will open the SQL Server 2016 Installation Center. Click on Installation menu option  and then click on New SQL Server Stand-Alone Installation Or Add Features To An Existing  Installation to launch the SQL Server 2016 Setup wizard. Click Next. On the next window,  select the installation type option Perform A New Installation of SQL Server 2016, if SQL Server  2016 Database Engine has not been previously installed. In this case, you will also need to  install SQL Server 2016 Database Engine as it is required for Data Quality Services. If there is already an installation of SQL Server 2016 Database Engine and you would like to  add DQS to the exi...

Deploy SSIS packages by using the deployment utility, SQL Server, and file systems

Image
For project deployment, you  must ensure that all the related objects needed for the package execution are also part of the  package. For example, when a Connection Manager is created, it is specific for the package  within which it was created, unless it is promoted to a project-level Connection Manager by  right-clicking it and selecting Convert To Project Connection. Similarly, any parameter that you want to be included at the project level, such as, Server- Name, DatabaseName, FolderName, etc. can be created at the project level so it only needs to  be changed in one place and will be available for all the packages in the project. And once the  project has been deployed to a different server, for example to the Development, QA, or Production  server, the parameter value only needs to be changed in one place. The Project.params tab opened with project parameters, and  also under Connection Managers, you see the ...

Create an SSIS catalog

Image
SQL Server 2012 introduced a new paradigm in storing and deploying SSIS packages. Before  that release, all packages were individually managed and deployed. The SSIS DB catalog and SSISDB database serves as a centralized place where deployed SSIS  packages can be stored along with everything related to it, including: Project and package parameters Configuring runtime values for packages Executing and troubleshooting packages Managing Integration Services server operations You find the SSISDB catalog in the folder labeled Integration Services Catalogs. This is where  you find the Projects folder containing each project with all its packages. And the Environments  folder contains the package configuration and environment values, which can be set  before executing the packages. Each SQL Server instance gets only one SSISDB. You also find another SSISDB in the Databases folder in SQL Server (Figure 2-113)....

Error handling at package level in SSIS

Image
There are many causes of errors and failures when a package is running, either in Debug Mode  or in Production. One of the best and simplest ways to manage error handling is with the Event  Handler tab in the SSIS Visual Studio design environment. Under the Event Handler drop-down  list, you can see all of the event types that can be triggered to fire off a notification, email, or  insert the error details into a file, or a database table. From this tab, you can choose from any (or all) of the components within your whole package  to contain an event handler. Although each component can have its own custom event  handler, generally the most common error handling method is to put the error handling at  the highest level, the package, so that all of the errors float up and can be processed from one  centralized location. In every other respect, you create Event Handlers just like any package—with Control Flows  and ...

Implement error handling for data types in SSIS

Image
When data is being moved from a source to a destination, there you may experience different  types of issues related to the datatype errors and column data truncation. When column data  coming from the source is bigger than the target column it is going into, SSIS fails the package.  But the Error Output page in the For Data Flow tasks provides a way to tell SSIS how to handle  either errors or column truncation for each individual columns. The three options  are: Ignore Failure Continues processing on error or truncating and disregard any errors Redirect Row Sends the offending row out to the output Fail Component Fails the task when an error or truncation occurs If you want to just send the error or truncated records to an error file or table, you can  change the setting to Redirect Row. In this way, you can set up an error output destination. In the case of column truncation, you h...

Enable logging for package execution in SSIS

Image
Logging can be enabled and used at design time and runtime. To begin logging, in the Visual  Studio menu select SSIS > Logging to open the Configure SSIS Logs window. In  the Containers section, select the Control Flow objects for logging—the overall package itself  and/or each of the individual tasks within the package. Then in the Providers and Logs tab,  under the Add A New Log > Provider Type, select where you want the logging to be saved to,  and click the Add button: After adding each of the Provider Types selected you want to log to, go to the Select the  Logs to Use for the Container section to set up the Connection in the Configuration column.  In the case of a Text file, select a file to write to. For SQL Server DB as the target,  simply select the target database. You are not limited to selecting only one target location for  the logging, multiple destinations can be added and they will all be wri...

Fix performance, connectivity, execution, and failed logic issues by using the debugger in SSIS

Image
Checking your execution results progress tab When you are testing your packages, and troubleshooting issues, the details of the execution  results can be found in the last tab of the package design environment. A point of possible  confusion can be how SSIS changes the name of this one tab depending on if the package is  executing or the package execution has been stopped. The last tab is labeled as follows: Execution Results tab When the package is not executing you can see the results of  the last run. Progress tab When a package execution is in-progress you can see the real-time  execution results. Once the package execution has been stopped, you can still see these  results under the Execution Results tab. Package execution reports in Management Studio Once you have a package executing successfully, you want to monitor and track its progress.  SQL Server Management Studio (SSMS) provides a very detailed, color-coded package execution...