Data Masking Components: Full Name Info
The purpose of the component is to mask full names with realistic but different data.
The component first parses the data. In parsing, it assumes the following:
• If there is only one word in the field, it is treated as a First Name. The component does not check the word against the name dictionaries, because such an operation would slow execution down. Besides, there may be name values that are not currently in the dictionary: American parents invent new names every year.
• If there are two words in the field, they are treated as the First Name followed by the Last Name.
• If there are more than two words in the field and no commas, the first word is treated as the First Name, the last word as the Last Name, and the second word as the Middle Name. If the second word is a one-letter initial, it is treated as a Middle Initial.
• If there are more than two words and there is a comma after the first word, the first word is treated as the Last Name, the word after the comma as the First Name, and the next word as the Middle Name or Middle Initial.
• If there are quotes around a word in the field, they are disregarded.
• If two names are joined by a dash without spaces, as in Ivan-Lee, they are treated as one word.
• The assumed values, based on their positions, are run against the dictionaries as they would be in the First Name (random/dynamic) and Last Name (random/dynamic) components, and are concatenated upon completion of the substitution algorithm. Thus, a four-word entry may become a three-word entry, and so on.
• Spaces and nulls retain their values.
• In general, only the following formats of the full name are currently considered (Fn = first name, Ln = last name, MI = middle initial): Fn; Fn Ln; Fn Fn Ln; Fn MI Ln; Ln, Fn; Ln, Fn MI; Ln, Fn Fn.
• The rest of the possible combinations of values within the full name element are reduced to these formats.
• If the data is also stored in normalized form as separate First Name and Last Name fields, the masked Full Name may not match the Masked First Name and Masked Last Name values. However, if the data in the Full Name has been normalized to the above formats, the component will mask the majority of values properly and will retain referential integrity of the combined values.
• The error output works as it does in the rest of the components. It should either be used with GAN (Generic Alpha Numeric) to mask erroneous entries, or the values should be stored for later examination.
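The positional parsing rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described, not the component's actual code: the names `ParsedName`, `parse_full_name`, and `format_name` are hypothetical, and the dictionary substitution step is left out.

```python
from dataclasses import dataclass
from typing import List, Optional

QUOTES = "\"'“”"  # quote characters that are disregarded around a word


@dataclass
class ParsedName:  # hypothetical container, not part of the component
    first: str
    middle: Optional[str] = None  # middle name or one-letter initial
    last: Optional[str] = None
    comma_form: bool = False      # True for "Ln, Fn ..." input


def _words(text: str) -> List[str]:
    # Only whitespace separates words, so "Ivan-Lee" stays one word.
    stripped = (w.strip(QUOTES) for w in text.split())
    return [w for w in stripped if w]


def parse_full_name(value: Optional[str]) -> Optional[ParsedName]:
    """Assign name parts by position; None means 'leave the value as is'."""
    if value is None or not value.strip():
        return None  # spaces and nulls retain their values
    if "," in value:
        left, right = value.split(",", 1)
        before, after = _words(left), _words(right.replace(",", " "))
        if len(before) == 1 and after:
            # Comma after the first word: Ln, Fn [middle name or initial]
            middle = after[1] if len(after) > 1 else None
            return ParsedName(after[0], middle, before[0], comma_form=True)
    words = _words(value.replace(",", " "))
    if not words:
        return None
    if len(words) == 1:
        return ParsedName(words[0])                 # Fn
    if len(words) == 2:
        return ParsedName(words[0], last=words[1])  # Fn Ln
    # More than two words: first, second, and last word are kept.
    return ParsedName(words[0], middle=words[1], last=words[-1])


def format_name(p: ParsedName) -> str:
    """Concatenate the parts back into one of the supported formats."""
    middle = [p.middle] if p.middle else []
    if p.comma_form:
        return f"{p.last}, " + " ".join([p.first] + middle)
    return " ".join([p.first] + middle + ([p.last] if p.last else []))
```

In this sketch, `format_name(parse_full_name('John "Q" Public Jr'))` yields `John Q Jr`, which shows how a four-word entry can come out as a three-word entry. A real run would replace each part with a dictionary value before concatenation.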
|1. Configure a source that contains the column with alphabetical and numeric characters. The data in the column may also include any other character that will be treated as a separator.|
|2. Drag and drop the Full Name Info masking component, then connect the source to the Full Name Info component with the source's precedence constraint:|
|3. Now, the precedence constraint (the blue arrow) passes the proper metadata to the Full Name Info component. If you click on the constraint, you will see:|
|4. Now that the metadata for the Full Name Info component exists and values are passed into the data masking component, open the component editor:|
|5. The second tab lists the input columns. Check only one column, the one that you will mask with the Full Name Info algorithm:|
|6. This will create an extra column with the prefix “Masked_”.|
|7. Create a connection manager for the destination and configure the source component for the destination. In the connection manager, in the “Mappings” tab, specify that you want the newly created Field_Masked to be the field that replaces the original value. To do that, click on the available input columns, choose the masked value, and map it to the “Available Destination Columns”.|
|8. Now, all the configurations are complete for the valid values. You can run the package with the Full Name Info Data Masking component, and see the results of data masking:|
|9. If, however, there are invalid values in the package's source, you need to configure error handling. Invalid values are those that do not conform to the rules of the entity. To handle invalid values, each data masking component has an error handling precedence constraint. Create an error destination connection and connect the red arrow (the error handling constraint) to this destination. Once the connection is made, configure the failure behavior: “Fail”, “Ignore”, or “Redirect”.|
|10. It is recommended to redirect the output into the error destination, so that the data can later be analyzed and processed for quality purposes. The Full Name Info component's errors are the only ones not recommended for further processing, as it is truly hard to break their format.|
|11. We suggest that, with the Full Name Info component, erroneous data should not be processed at all without further analysis.|
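The three failure settings from step 9 can be illustrated with a small Python sketch. It is not the component's implementation: `Disposition`, `run_masking`, and the `mask` and `is_valid` callbacks are hypothetical names, and the behavior of “Ignore” (passing the value through unmasked) is an assumption made for the example.

```python
from enum import Enum
from typing import Callable, Iterable, List, Optional, Tuple


class Disposition(Enum):  # hypothetical names for the three settings
    FAIL = "Fail"
    IGNORE = "Ignore"
    REDIRECT = "Redirect"


def run_masking(
    rows: Iterable[Optional[str]],
    mask: Callable[[str], str],
    is_valid: Callable[[str], bool],
    on_error: Disposition,
) -> Tuple[List[Optional[str]], List[str]]:
    """Return (main_output, error_output) for one masked column."""
    main_output: List[Optional[str]] = []
    error_output: List[str] = []
    for value in rows:
        if value is None or not value.strip():
            main_output.append(value)       # spaces and nulls retain values
        elif is_valid(value):
            main_output.append(mask(value))
        elif on_error is Disposition.FAIL:
            raise ValueError(f"invalid value: {value!r}")  # stop the run
        elif on_error is Disposition.REDIRECT:
            error_output.append(value)      # kept for later examination
        else:
            # IGNORE: assumed here to pass the value through unmasked.
            main_output.append(value)
    return main_output, error_output
```

With “Redirect”, invalid rows end up only in the second list, which corresponds to the error destination connected to the red arrow.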