Row Vs. Column Integrity
Data exists in multiple databases and potentially multiple columns within those databases. In order to mask data, those columns need to be identified.
The content of rows vary according to the context of the column. For example, the data contained in the Address column will be vastly different to the data contained in the Phone Number column. Similarly, the data in one column may directly relate to the content of another, for example the content the Title column will directly correlate to the data in the Name column. Mr. John Doe, for example.
It is for this reason that no single column or row can be masked or de-identified without first taking into consideration the content of the whole table. A sensitive data discovery or masking tool will need to examine a table of columns and rows as a whole in order to correctly change values in a row to maintain column integrity.
It is important to note that if you alter unique identifiers within a column, there is a risk of affecting the referential integrity of the affected column as well as the database as a whole. A unique identifier is also known as a primary key.
Random components allow you to create test sets when the data in production does not exist. In text-based components such as names, you can simply provide a “dummy” value in the necessary fields, such as “John” or “abc” – the component will create a random set of values.
In this case, use a SQL query to create the templated value. We suggest the following:
● Use any value for input for the strings type components.
● Cast or convert it into one of the character data types with the predefined length, i.e.
varchar(10) or nvarchar(10).
● Create an Alias Name (as shown in our example “AS FirstName”) for the column so that metadata created by the source component will have a name for this column.
Please note that not assigning the type to the column in SQL might create issues when converting to the expected data type.