When it comes to protecting private data, terms like data masking, de-identification and anonymization have become synonymous with the process of shielding private data from the wrong eyes. The three terms are often used interchangeably, but there are subtle differences between them.
What is the difference between de-identification and data masking?
The terms data masking and de-identification can be used interchangeably, but what's important is to understand what data needs to be de-identified and why, as well as the right method to use for that specific need.
De-identification is the process of removing or altering sensitive data elements to prevent someone's personal identity from being revealed, whether for privacy or compliance purposes.
Data masking is the process of replacing sensitive elements with realistic replacement data, so that the data cannot be used to directly identify an individual. According to IAPP, data masking is a broad term that covers a variety of techniques including shuffling, encryption and hashing.
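As a rough illustration of two of the techniques named above, the following Python sketch shuffles one column across records and replaces another with a salted hash. The record layout, the field names and the helper functions hash_value and shuffle_column are hypothetical, chosen only for this example; they do not describe how any particular masking product works.

```python
import hashlib
import random

def hash_value(value: str, salt: str) -> str:
    """Replace a value with a salted SHA-256 digest (a one-way transformation)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

def shuffle_column(rows, column, seed=None):
    """Return copies of the rows with one column's values shuffled across rows."""
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    # Build new dicts so the original records are left untouched.
    return [{**row, column: value} for row, value in zip(rows, values)]

# Hypothetical sample records, invented for this sketch.
records = [
    {"department": "Sales", "ssn": "123-45-6789", "salary": 52000},
    {"department": "Finance", "ssn": "987-65-4321", "salary": 61000},
    {"department": "IT", "ssn": "555-12-3456", "salary": 58000},
]

# Shuffling keeps the overall salary distribution but breaks the row-level link.
masked = shuffle_column(records, "salary", seed=7)
# Hashing replaces each SSN with a fixed-length digest.
masked = [{**row, "ssn": hash_value(row["ssn"], salt="demo-salt")} for row in masked]
```

Shuffling preserves the set of values in the column, which keeps aggregate statistics usable for testing. A salted hash of a short, structured value such as an SSN can still be guessed by brute force if the salt leaks, so in practice the salt would need to be kept secret.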
As with the above terms, anonymization is used to produce data that cannot be linked back to an individual.
While data de-identification and anonymization are methods that have historically been used to target indirect identifiers, data masking has become synonymous with the same function due to the variety of techniques, such as k-anonymity, that it uses to de-identify both direct and indirect identifiers.
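To make k-anonymity concrete: a data set is k-anonymous when every combination of indirect identifier values is shared by at least k records. The Python sketch below measures k for a small table before and after generalizing two indirect identifiers. The sample rows, the column names and the helpers k_anonymity and generalize_age are assumptions made for illustration only.

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    values for the given indirect (quasi-) identifiers."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0

def generalize_age(age, width=10):
    """Replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Hypothetical sample rows, invented for this sketch.
rows = [
    {"age": 34, "zip": "90210", "diagnosis": "flu"},
    {"age": 36, "zip": "90211", "diagnosis": "asthma"},
    {"age": 41, "zip": "10001", "diagnosis": "flu"},
    {"age": 47, "zip": "10002", "diagnosis": "diabetes"},
]

# Generalize both indirect identifiers: age into 10-year bands,
# ZIP code down to its first three digits.
generalized = [
    {**row, "age": generalize_age(row["age"]), "zip": row["zip"][:3] + "**"}
    for row in rows
]

k_before = k_anonymity(rows, ["age", "zip"])         # every row is unique
k_after = k_anonymity(generalized, ["age", "zip"])   # rows now fall into pairs
```

In this sample each original row is unique on age and ZIP code, so k is 1. After generalization the rows fall into two groups of two, so k rises to 2, at the cost of less precise data.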
For some interesting background reading on this topic, check out our Wiki.
Standards and Guidelines
A helpful way to understand when and how data should be de-identified is to study the guidelines of data privacy laws such as the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR).
HIPAA, for example, defines the standard for de-identification in Section 164.514(a) of the HIPAA Privacy Rule as follows:
"Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information."
To meet this standard, Sections 164.514(b) and (c) of the Privacy Rule offer two primary methods to de-identify data, namely Expert Determination and Safe Harbor.
Expert Determination takes a risk-based approach to de-identification to determine the likelihood of a person being identified from their protected health information (PHI).
The "Safe Harbor" method involves the removal of specific information about a patient that can be used alone or in combination with other information to identify that patient. The Privacy Rule goes on to list the 18 types of identifiers that must be removed under this method.
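The Safe Harbor idea of removing listed identifiers can be sketched in a few lines of Python. The field names below cover only a small, hypothetical subset of the identifier types (names, contact details, Social Security and medical record numbers, and dates more specific than the year), and the record layout and the strip_identifiers helper are invented for this example. It is a sketch of the approach, not a complete or compliant implementation.

```python
from datetime import date

# Hypothetical field names for a few of the identifier types; a real schema
# would need to be mapped against the full list in the Privacy Rule.
DIRECT_IDENTIFIER_FIELDS = {"name", "email", "phone", "ssn", "medical_record_number"}
DATE_FIELDS = {"birth_date", "admission_date"}

def strip_identifiers(record):
    """Return a copy of the record without the listed identifier fields,
    and with date fields reduced to the year only."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # drop the field entirely
        if field in DATE_FIELDS:
            # Assumes dates are ISO strings such as "1980-04-12".
            cleaned[field] = date.fromisoformat(value).year
        else:
            cleaned[field] = value
    return cleaned

# Hypothetical patient record, invented for this sketch.
patient = {
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "ssn": "123-45-6789",
    "birth_date": "1980-04-12",
    "diagnosis": "asthma",
}

deidentified = strip_identifiers(patient)
```

Here deidentified keeps only the diagnosis and the birth year. Fields that are not on either list pass through unchanged, so free-text fields that might contain identifiers would need separate handling.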
The GDPR sets out its own comprehensive set of requirements for the collection, retention and processing of sensitive data. Most importantly, it requires de-identification to be irreversible, so that the data subject is no longer identifiable.
De-identification, data masking and anonymization all involve the process of removing or obscuring personally identifiable elements from individual data records to minimize the risk of unintended disclosure of identity. The benefit of data masking, however, is that it can be made irreversible, which makes it a preferred method of data protection for achieving compliance with the GDPR.
Whether you want to de-identify data for privacy, compliance or both, Hush-Hush Data Masking can help you control the flow of data in your business in order to reach your compliance goals.
Request a free trial or demo now.