e
q
u
e
s
t
a
d
e
m
o < back
SSIS HELP
{TOC}
IMPORTANT FOR DATA MASKING SSIS FEATURES
The data masking in the context of SSIS is mainly about Data Flow of the Data Flow task and associated toolbox components.
Control Flow elements augment the overall workflow of the lifecycle.
Sources
Sources in SSIS are components that allow to connect to a data source, be it a database, a file, or a service in order to retrieve data from the entity.
Files
Native source components allow integration of any flat (delimited), excel, xml and native to SSIS format (what SSIS calls "raw") files.
Hadoop as a special file type is supported starting with SQL Server 2016
Databases
SSIS connect to any database that support the following drivers: ODBC, OLE DB, ADO.NET, ADO, and SQL Server Native Driver. Relational Data Management Systems with such connectivity include well known vendors ( Oracle's Oracle and MySQL databases, Microsoft's SQL Server, Access, and SQL Server in Azure, Sybase, IBM's database varieties, and not so well known such as Postgres, Ingres, SQL Lite, etc.)
Services
You can integrate with SaaS solutions in two ways. One of them consuming service into web service task and outputting into file for further processing. Another way is to create a script component in the dataflow with the following output ( the example to follow)
Components
In addition to masking components outlined in the wiki, the native SSIS components most used with the data masking include the following:
Complementing transformation
Derived column
Derived column with its powerful expression engine is used to complement the rules of masking components.
The example would be taking just a year part out of the date element or concatenating first and last name.
Data conversion component
It could be used in unconventional cases when SSN is stored as an integer for performance reasons in the database
Character mapping
From simple cases of converting from lower to upper case and to complex converting from Hiragana to Katakana.
Language flow constructs
Conditional split along with multicast and multiple unions and joins allows processing of complicated flows.
Sub-setting components
Percentage sampling and row sampling are two components that help select just a random subset of all the rows in the table and especially files that are hard to subset with the SQL statement.
Audit
Audit and rowcount components allow for packages audit.
Destinations
As SSIS provides for the variety of sources, it also provides for a variety of destinations. Destinations in SSIS are components that allow to connect to a data source, be it a database, a file, or a service in order to populate data into the entity.
Files
Flat files, excel files and native to SSIS raw files are typical file destinations.
As a special file type, starting with SQL Server 2016 SSIS supports Hadoop both on premises and in Azure.
Databases
Just as there are sources connecting to databases to retrieve data, there are also destinations connecting to databases to populate data. They mirror the source and add more components in different versions of SQL Server.
Destinations in all versions of SQL server encompass ODBC, OLE DB, ADO.NET, ADO, and SQL Server and SQL Server Compact Native Driver. Relational Data Management Systems with such connectivity include well known vendors (Oracle's Oracle and MySQL databases, Microsoft's SQL Server, Access, and SQL Server in Azure, Sybase, IBM's database varieties, and not so well known such as Postgres, Ingres, SQL Lite, etc.)
Starting with 2016 version of SQL Server it adds Data Streaming Destination for easier processing of tabular data with PowerBI and supports Azure blob and other destinations.
Services
Letter Case Transformation
SSIS Allows letter case transformations with special component named "Derived Column".
At first you need to configure source file (Flat File Source) with strings in source letter case (lower case).
1.1 Open flat file source settings (two click on the icon).
1.2 Create flat file connection manager.
1.3 Select the local source file (txt).
2.4. Press "OK"
Content of source file you need to write manually with proper structure, for example:
When Source file has been configured, you need to connect it with Derived column component.
Then you need to configure Derived Column component.
2.1. Open Derived Column component setting (two clicks on icon).
2.2. Enter Name of the Column.
2.3. Select value from dropdown which column you need to replace (existing column will take new data from transformation algorithm).
2.4. Expression filed configure according to needed function ( Example: UPPER(FirstName) ).
Upper Case
2.5. To convert letters into Upper Case you need to select "UPPER(String)" function (see screenshot green pointer)
Lower Case
2.5. To convert letters into Upper Case you need to select "LOWER(String)" function (see screenshot blue pointer)
ProperCase
To convert letters into Proper case you need to configure another component named "Script Component".
2.5.1 Add script component to data flow.
2.5.2 Connect component to source with blue arrow
2.5.3 Open settings (two clicks on the icon)
2.5.4 Navigate to tab "Input Columns" and select needed cell from existing columns.
2.5.5 Navigate back to the "Script" tab and press "Edit Script..." button.
2.5.6 Scroll down to search "Input0_ProcessInputRow" function in C# code.
2.5.7 Write code with replace "FirstName" to yours column name.
2.5.8 Save changes and close Visual Studio edit window.
2.6. Press "OK"
3.Configure destination file exactly the same as the source file.
4.Finally you should get a similar result
5.Run package and you can see result in destination file.
Variables and Parameters
SSIS allows to create variables and parameters of different data types to hold values used in between tasks, environmental values, and values used in dynamically created statements. Variable and parameters differ in their scope definition and corresponding design and run-time use.
Parameters
SSIS 2012 introduced the notion of the "project-level scope" into the development of SSIS solutions. With this notion, it enabled the developers to use project scope variables called "parameters", share connection managers across packages and deploy the whole project as a whole.
Parameters differ from variables. They can be either project level or package level in scope. |
Project level variables can hold value between execution of different packages or during an execution of the same package - and this value cannot change during package execution. They are also often used in run-time to hold values pertaining to specific environment. The value could be set either at development time, or during the run time and are very useful in deployments.
These configurations allow changing parameters values based on different environments.
Also, very useful feature of the parameter is its ability to pass value from the parent to the child package.
Parameters, however, allow less data types than variables. Parameters do not support the following data types: Char, DBNull, Object.
Variables
In 2008, there are only variables. The SSIS 2008 does not have project model, and as such variables are the only way to introduce temporary storage. Starting with 2012, SSIS introduces the project model used in deployment and for shared connections, and parameters start playing a role of project or package level variable. Variables still exist and can be scoped at the package, container, task or event handler level.
The variables are used in the scenarios when their value should change within the scope of the package, container, task execution or event change. An example of their use is in loops, conditional or dynamic expressions.
Here are the examples of variable declarations:
Variables window may be opened many ways, with the most common two being right-clicking in the control-flow or data-flow area and choosing "Variables" or by choosing "View" -."Other Windows" ->"Variables" from the Visual Studio Menu pane.
Variables can be of many different data types, among the integers, string, datetime, object, DBNull and more.
Usually, in defining variable one would provide the scope followed by double colon and variable name User::variable_name
Each package be default has system variables defined and defaulted to the package attribute value:
You can see some good examples of them when you create and open "Derived Column Transformation Editor". Other components and tasks that make a good use of the variables are all the transformation editor components, as well as components using expression builder, such as "Expression Task" component, "For Loop Container" and the rest of the loops, in Data Flow tab it would be "Conditional Split", "Derived Column", and any component which attributes can be assigned dynamically and as such use Property Expression Editor. Such are connections, "FTP Task", or "Send Mail Task".
Error Handling
Every component in SSIS provides for error handling in three modes: either it allows to re-direct error message, ignore error or fail the component execution.
Log
SSIS allows to log a variety of events and use them later for analysis.
USEFUL SITES AND BLOGS
Help from Microsoft:
https://technet.microsoft.com/en-us/library/bb522537(v=sql.100).aspx
New blog with Jimmy Wong
Old blog from Matt Masson