Not logged in - Login


R
e
q
u
e
s
t

a

d
e
m
o
< back

Algorithms

{TOC}

Substitution

Substitution masks data by replacing a given value with another value suitable for the given entity, be it a field or part of the text or any other type of an entity. Substitution could be random or pseudo-random, could preserve referential integrity and statistical distribution or disturb it, and could deal with the whole, at-once, value replacements or complex replacement patterns of the parts of the entity.

In mathematical terms, the substitution method of data masking allows mapping of members of one set to the members of another set, such that the members of replacing set are of the same intensional definition.

Random

Random substitution masks data by replacing a given value with a random value from a pre-compiled data set. Values in the data set are suitable and conform to the same rules or definitions that of the given value. With each iteration of the masking algorithm, another random value gets chosen and replaces the value on the input. An example would be a value of John being masked with values of Alex, Robert, and Mike in subsequent iterations of the substitution operation.

As data masking of the given value with Random Substitution does not guarantee to be repeatable among cycles, it is not suitable for masking data sets with unique constraints and requires extra work of preserving referential integrity if used on the values of fields preserving referential integrity constraint. In denormalized data sets it might cause difficulties in implementing row-internal synchronization, table internal synchronization and table-to-table synchronization operations.

Preserving Referential Integrity

These are algorithms that mask data by replacing a given value with a pseudo-random value from a pre-compiled data set. The "pseudo" comes from the fact that there is an underlying algorithm that matches a given value to the very same value at each iteration of the substitution operation.

An example would be a value of a name Jane always masked with the value of the name Virginia. Such substitution allows for better preservation of referential integrity as well as for synchronization of values among cells in the same and different tables and in the text.

Replacing sets in the substitution data masking algorithm could have the same numbers of values as original sets ( the same cardinality) or different number of values.

Disturbing Statistics

For certain types of de-identified data, its statistics becomes security's "enemy". Such examples were identified by Dr. Latanya Sweeney , then graduate student, today's Harvard's professor with specialization in computer science and privacy. She identified such data and [HIPAA's famous 18 elements](enter url or page name) list some of them: zip codes, dates of births, and there are many more. When a subtype of substitution data masking technique matches one data set to another data set with the same number of members, one to one, the statistical distribution of the values is completely preserved. If there exists a public data set and/or its statics is in the public domain, someone knowing the statistics could re-identify the de-identified data. Thus,

Unique

TEXT

Character Permutation and Character Substitution

Two algorithms that manipulate character of a given string.

The character permutation data masking algorithm uses characters of a given string as an input set and maps this set on itself by creating various permutations of the characters of the string either randomly or in pre-defined repeatable pattern.

The character substitution data masking algorithm besides a given string uses another set of the characters with the specific mapping rules, creating an output based on either random or predefined mappings.

The strongest masking algorithm of this variety is random character substitution, followed by random character permutation and pre-defined character substitution, followed by pre-defined character permutation.

Format Preserving Encryption

TEXT

Shuffle

TEXT

Date Variance

TEXT

Number Variance

TEXT

Nulling

TEXT

Download a Trial