
Data Masking Definition


WHAT IS DATA MASKING?


Data Masking is a method to hide sensitive information.

People sometimes use terms such as data anonymization, data de-identification, data scrambling, data scrubbing, and data obfuscation interchangeably. Often there are subtle differences in how a given vendor defines the process. Yet, while the industry holds different opinions on the subject and calls some subsets of data masking algorithms by different names, one thing is clear.

One has to understand the relative value of security and be able to estimate the risks. Whether or not you hold that invoking the "expert determination" process turns "regular data masking" into a "de-identification process", it is good practice to estimate how easily the information could be re-identified after data masking algorithms have been applied. We will not make exaggerated claims that "rare people in the world are experts in de-identification." Along with HIPAA, we maintain that any person with statistical knowledge can do the trick; however, that person first has to learn the specifics of the domain.

Data masking is not just a science of algorithms. It is also a science of public data sets. This kind of knowledge definitely calls for training, together with an understanding of k-anonymity and l-diversity. Both terms are mathematical expressions of statistical "common sense", but for those who want to learn more, please keep reading below.

k-Anonymity

Dr. Latanya Sweeney is the "mother" of the concept and gives it a pretty clear definition in her most-cited papers. However, it is pretty dry mathematical stuff, so if you want a rigorous definition, read Dr. Sweeney; otherwise, read on.

In layman's terms, k-anonymity measures how hard it is for a data thief to identify you in a database based on the combination of attributes that makes you unique. If you are the only person with a specific last name, and it is mentioned just once, one could pretty much assume that this last name identifies you. However, adding more people with the same last name makes you less recognizable, and this is what k-anonymity stands for: k is the minimum number of records that share the same combination of identifying attributes, so the larger k is, the harder it is to single you out.
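The last-name example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `k_anonymity`, the sample records, and the column names `last_name` and `zip` are all hypothetical and not part of any product or standard API. The sketch groups records by the chosen identifying attributes and reports the size of the smallest group, which is the k of the data set.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values for the given identifying attributes."""
    if not records:
        return 0
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return min(groups.values())


# Hypothetical sample data: one last name occurs only once.
records = [
    {"last_name": "Smith", "zip": "78701"},
    {"last_name": "Smith", "zip": "78701"},
    {"last_name": "Smith", "zip": "78702"},
    {"last_name": "Nakamura", "zip": "78701"},
]

# The unique last name forms a group of one, so k is 1.
k_before = k_anonymity(records, ["last_name"])

# A crude masking step (an assumption for illustration only):
# replace every last name with the same value.
masked = [dict(record, last_name="Smith") for record in records]
k_after = k_anonymity(masked, ["last_name"])

# Adding more attributes to the combination can lower k again.
k_combined = k_anonymity(masked, ["last_name", "zip"])

print(k_before, k_after, k_combined)
```

Note that masking the last name alone raises k for that single attribute, but the combination of last name and ZIP code still singles out one record, which is why the whole combination of attributes has to be considered when estimating re-identification risk.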
