Data Masking Components: Generic Shuffle
The purpose of the component is to mask values such as account or patient numbers, which often combine alphabetic and numeric characters and do not follow a specific rule.
|The Mask Generic Shuffle component “shuffles” the values of a selected column among the rows of the table: each value is replaced with another value taken from the same column, so the column's data set is mapped onto itself. One of the columns in the table should be a primary key. If the available memory is exceeded, the component starts saving data to disk. The batch size is controlled by a custom property.
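The shuffle described above can be illustrated with a minimal Python sketch. It is not the component's actual implementation: the function name `shuffle_column`, the `seed` parameter, and the row-as-dictionary layout are assumptions made for illustration, and batching and spilling to disk are left out. Only the “Masked_” prefix comes from the steps below.

```python
import random
from typing import Any, Dict, List, Optional


def shuffle_column(
    rows: List[Dict[str, Any]],
    column: str,
    seed: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Return copies of `rows` with an extra "Masked_<column>" field.

    The masked field holds the column's own values, permuted among the rows.
    All other fields, including the primary key, are left unchanged.
    (Hypothetical helper: names and signature are illustrative only.)
    """
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)  # permute the column's own values in place
    return [
        {**row, "Masked_" + column: value}
        for row, value in zip(rows, values)
    ]


if __name__ == "__main__":
    patients = [
        {"Id": 1, "PatientNumber": "AB-1001"},
        {"Id": 2, "PatientNumber": "CD-2002"},
        {"Id": 3, "PatientNumber": "EF-3003"},
    ]
    for row in shuffle_column(patients, "PatientNumber", seed=7):
        print(row)
```

Because the masked values are a permutation of the originals, the column keeps its original set of values and formats. A random permutation can leave some values in their original rows, which is why a shuffle on its own is usually judged against the size and variety of the data set.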
|1. Configure a source that contains a column of alphabetic and numeric characters. The column data may also include other characters, which are treated as separators.
|2. Drag and drop the Mask Generic Shuffle masking component, then connect the source to it using the source's precedence constraint.
|3. The precedence constraint (the blue arrow) now passes the proper metadata to the Mask Generic Shuffle component. Click the constraint to inspect the metadata.
|4. Now that the metadata for the Mask Generic Shuffle component exists and values are passed into the data masking component, open the component editor.
|5. The second tab lists the input columns. Select only one column: the one to be masked with the Mask Generic Shuffle algorithm.
|6. This will create an extra column with the prefix “Masked_”.
|7. Create a connection manager for the destination and configure the destination component. On the “Mappings” tab, specify that the newly created Field_Masked column should replace the original value: click the available input columns, choose the masked value, and map it to the corresponding column under “Available Destination Columns”.
|8. The configuration for valid values is now complete. Run the package with the Generic Shuffle data masking component to see the results of the data masking.
|9. If, however, the package's source contains invalid values, error handling must be configured. Invalid values are those that do not conform to the rules of the entity. To handle them, each data masking component has an error-handling precedence constraint. Create an error destination connection and connect the red arrow (the error-handling constraint) to this destination. Once the connection is made, configure the behavior on failure: “Fail”, “Ignore”, or “Redirect”.
|10. It is recommended to redirect the output to the error destination so that the data can later be analyzed and processed for quality purposes. Errors from the Generic Alpha Numeric component are the only ones not recommended for further processing, as it is truly hard to break its format.
|11. We suggest that erroneous data from the Generic Alpha Numeric component not be processed at all without further analysis.
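The three failure settings from step 9 can be sketched in Python as follows. This is an illustration, not the product's API: `mask_with_error_handling`, `mask`, and `is_valid` are hypothetical names, and the choice to pass an ignored row through with an empty (`None`) masked value is an assumption of this sketch.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def mask_with_error_handling(
    values: Iterable[Any],
    mask: Callable[[Any], Any],
    is_valid: Callable[[Any], bool],
    disposition: str = "Redirect",
) -> Tuple[List[Optional[Any]], List[Any]]:
    """Apply `mask` to valid values and treat invalid ones per `disposition`.

    Returns (masked_output, error_output). Hypothetical helper for illustration.
    """
    if disposition not in ("Fail", "Ignore", "Redirect"):
        raise ValueError("unknown disposition: %r" % (disposition,))

    masked: List[Optional[Any]] = []
    errors: List[Any] = []
    for value in values:
        if is_valid(value):
            masked.append(mask(value))
        elif disposition == "Fail":
            # Stop the whole run on the first invalid value.
            raise ValueError("invalid value: %r" % (value,))
        elif disposition == "Redirect":
            # Send the invalid value to the error output for later analysis.
            errors.append(value)
        else:
            # "Ignore": assumed here to keep the row with an empty masked value.
            masked.append(None)
    return masked, errors


if __name__ == "__main__":
    source = ["AB-1001", "", "CD-2002"]
    # `bool` stands in for a real validity rule: empty strings are invalid.
    print(mask_with_error_handling(source, str.lower, bool, "Redirect"))
```

With “Redirect”, the invalid rows end up in a separate list that plays the role of the error destination, which matches the recommendation in step 10 to keep them for later quality analysis.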