- 1. IMPORTANT FOR DATA MASKING SSIS FEATURES
- 1.1 Sources
- 1.2 Components
- 1.3 Audit
- 1.4 Destinations
- 1.5 Error Handling
- 1.6 Log
- 2. USEFUL SITES AND BLOGS
The data masking in the context of SSIS is mainly a**bout Data Flow of the Data Flow task and associated toolbox components.
Control Flow elements augment the overall workflow of the lifecycle.
Sources in SSIS are components that allow to connect to a data source, be it a database, a file, or a service in order to retrieve data from the entity.
Native source components allow integration of any flat (delimited), excel, xml and native to SSIS format (what SSIS calls "raw") files.
Hadoop as a special file type is supported starting with SQL Server 2016
SSIS connect to any database that support the following drivers: ODBC, OLE DB, ADO.NET, ADO, and SQL Server Native Driver. Relational Data Management Systems with such connectivity include well known vendors ( Oracle's Oracle and MySQL databases, Microsoft's SQL Server, Access, and SQL Server in Azure, Sybase, IBM's database varieties, and not so well known such as Postgres, Ingres, SQL Lite, etc.)
You can integrate with SaaS solutions in two ways. One of them consuming service into web service task and outputting into file for further processing. Another way is to create a script component in the dataflow with the following output ( the example to follow)
In addition to masking components outlined in the wiki, the native SSIS components most used with the data masking include the following:
Derived column with its powerful expression engine is used to complement the rules of masking components.
The example would be taking just a year part out of the date element or concatenating first and last name.
Here is another example of using Derived Column to standardize the letter case of the column.
Let's look at how we can convert the entries in the column to be either all letters with upper case or lower case. At first, we will prepare our data. We will configure a source file (Flat File Source) filled with strings in different letter case.
1.1 Open flat file source settings (two click on the icon).
1.2 Create flat file connection manager.
1.3 Select the local source file (txt).
2.4. Press "OK"
Content of source file you need to write manually with proper structure, for example:
When Source file has been configured, you need to connect it with Derived column component.
Then you need to configure the Derived Column component.
2.1. Open the derived column component settings by double clicking on it.
2.2. Enter the name of the Column.
2.3. Select "Replace" value from the drop-down "Derived Column" (an existing column will accept new data from a transformation algorithm).
2.4. Configure "Expression" column according to needed function ( Example: UPPER(FirstName) ).
To convert letters into Upper Case you need to select "UPPER(String)" function (see in green on the screenshot).
Upper case was often used in mainframe applications, and often remain a legacy.
To convert letters into Lower Case you need to select "LOWER(String)" function (see in blue on the screenshot). Lower case has been often used as informal standard in Linux based apps, for all the metadata description, for example or can standardize some element such as colors, etc. in the database.
To convert letters into Proper case you need to configure another component named "Script Component".
2.5.1 Add script component to data flow.
2.5.2 Connect component to source with blue arrow
2.5.3 Open settings (two clicks on the icon)
2.5.4 Navigate to tab "Input Columns" and select needed cell from existing columns.
2.5.5 Navigate back to the "Script" tab and press "Edit Script..." button.
2.5.6 Scroll down to search "Input0_ProcessInputRow" function in C# code.
2.5.7 Write code with replace "FirstName" to yours column name.
2.5.8 Save changes and close Visual Studio edit window.
2.6. Press "OK"
3.Configure destination file exactly the same as the source file.
4.Finally you should get a similar result
5.Run package and you can see result in destination file.
It could be used in unconventional cases when SSN is stored as an integer for performance reasons in the database
From simple cases of converting from lower to upper case and to complex converting from Hiragana to Katakana.
Conditional split along with multicast and multiple unions and joins allows processing of complicated flows.
Percentage sampling and row sampling are two components that help select just a random subset of all the rows in the table and especially files that are hard to subset with the SQL statement.
Audit and rowcount components allow for packages audit.
As SSIS provides for the variety of sources, it also provides for a variety of destinations. Destinations in SSIS are components that allow to connect to a data source, be it a database, a file, or a service in order to populate data into the entity.
Flat files, excel files and native to SSIS raw files are typical file destinations.
As a special file type, starting with SQL Server 2016 SSIS supports Hadoop both on premises and in Azure.
Just as there are sources connecting to databases to retrieve data, there are also destinations connecting to databases to populate data. They mirror the source and add more components in different versions of SQL Server.
Destinations in all versions of SQL server encompass ODBC, OLE DB, ADO.NET, ADO, and SQL Server and SQL Server Compact Native Driver. Relational Data Management Systems with such connectivity include well known vendors (Oracle's Oracle and MySQL databases, Microsoft's SQL Server, Access, and SQL Server in Azure, Sybase, IBM's database varieties, and not so well known such as Postgres, Ingres, SQL Lite, etc.)
Starting with 2016 version of SQL Server it adds Data Streaming Destination for easier processing of tabular data with PowerBI and supports Azure blob and other destinations.
SSIS allows to create variables and parameters of different data types to hold values used in between tasks, environmental values, and values used in dynamically created statements. Variable and parameters differ in their scope definition and corresponding design and run-time use.
SSIS 2012 introduced the notion of the "project-level scope" into the development of SSIS solutions. With this notion, it enabled the developers to use project scope variables called "parameters", share connection managers across packages and deploy the whole project as a whole.
|Parameters differ from variables. They can be either project level or package level in scope.|
Project level variables can hold value between execution of different packages or during an execution of the same package - and this value cannot change during package execution. They are also often used in run-time to hold values pertaining to specific environment. The value could be set either at development time, or during the run time and are very useful in deployments.
These configurations allow changing parameters values based on different environments.
Also, very useful feature of the parameter is its ability to pass value from the parent to the child package.
Parameters, however, allow less data types than variables. Parameters do not support the following data types: Char, DBNull, Object.
In 2008, there are only variables. The SSIS 2008 does not have project model, and as such variables are the only way to introduce temporary storage. Starting with 2012, SSIS introduces the project model used in deployment and for shared connections, and parameters start playing a role of project or package level variable. Variables still exist and can be scoped at the package, container, task or event handler level.
The variables are used in the scenarios when their value should change within the scope of the package, container, task execution or event change. An example of their use is in loops, conditional or dynamic expressions.
Here are the examples of variable declarations:
Variables window may be opened many ways, with the most common two being right-clicking in the control-flow or data-flow area and choosing "Variables" or by choosing "View" -."Other Windows" ->"Variables" from the Visual Studio Menu pane.
Variables can be of many different data types, among the integers, string, datetime, object, DBNull and more.
Usually, in defining variable one would provide the scope followed by double colon and variable name User::variable_name
Each package be default has system variables defined and defaulted to the package attribute value:
You can see some good examples of them when you create and open "Derived Column Transformation Editor". Other components and tasks that make a good use of the variables are all the transformation editor components, as well as components using expression builder, such as "Expression Task" component, "For Loop Container" and the rest of the loops, in Data Flow tab it would be "Conditional Split", "Derived Column", and any component which attributes can be assigned dynamically and as such use Property Expression Editor. Such are connections, "FTP Task", or "Send Mail Task".
Every component in SSIS provides for error handling in three modes: either it allows to re-direct error message, ignore error or fail the component execution.
SSIS allows to log a variety of events and use them later for analysis.
Help from Microsoft:
New blog with Jimmy Wong
Old blog from Matt Masson