Why Data Masking
Data masking (also called data anonymization, data de-identification, or data obfuscation; for the nuances of industry perception please refer here) has become mainstream over the last decade in the IT functions of healthcare, financial, educational, government, and other organizations that carry sensitive personal data. Organizations use it to protect against internal threats, to hide sensitive information when exposing data to external users, and to exchange data with third parties.
Many organizations mask data to comply with legislation, while others use it as a preventative measure even when not obligated by law. The costs of non-compliance and data breaches are very high. They go well beyond FTC fines, although those fines alone can run into millions of dollars, per multiple U.S. court rulings. "It is not only appropriate, but critical, that the FTC has the ability to take action on behalf of consumers when companies fail to take reasonable steps to secure sensitive consumer information," says Federal Trade Commission Chairwoman Edith Ramirez. There are also reputational costs and class-action litigation costs, as well as credit check costs for financial institutions and fraud costs for health insurance organizations.
The laws are becoming more numerous and stricter as the public demands a greater degree of protection.
As a result, companies try to protect themselves by every possible means. Two of the most common measures are encryption and data masking.
Both data masking and encryption are used to hide the original values of data.
Yet they are not the same, in either purpose or implementation.
The purpose of encryption is to hide data from the hacker. In data security classification, the hacker is an external threat and has no access to encryption keys. Encryption protects both data in transit and data on disk well against hackers outside the organization.
The purpose of data masking is to hide data from the developer. The developer often does have the encryption key. Moreover, unless specific provisions are made, encrypted data might not fit the predefined field sizes in storage, and it makes values extremely hard for the developer to comprehend. That difficulty slows down development.
Encryption is a method in which the intended information or message, referred to as plaintext, is transformed by an encryption algorithm into ciphertext that can be read only after it is decrypted. Thus the information does not change its content, only its presentation.
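To make the points about reversibility and field sizes concrete, here is a minimal sketch in Python. It uses a toy stream cipher built from the standard library purely for illustration; it is not a vetted algorithm and should not be used to protect real data. The names toy_encrypt and toy_decrypt, the demo key, and the sample value are all assumptions introduced for this example.

```python
import base64
import hashlib
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: str) -> str:
    """Return base64 text holding a random 16-byte nonce followed by the XOR-ed bytes."""
    data = plaintext.encode("utf-8")
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(data))
    cipher = bytes(a ^ b for a, b in zip(data, stream))
    return base64.b64encode(nonce + cipher).decode("ascii")


def toy_decrypt(key: bytes, token: str) -> str:
    """Reverse toy_encrypt: split off the nonce, rebuild the keystream, XOR back."""
    raw = base64.b64decode(token)
    nonce, cipher = raw[:16], raw[16:]
    stream = _keystream(key, nonce, len(cipher))
    return bytes(a ^ b for a, b in zip(cipher, stream)).decode("utf-8")


key = b"demo-key-not-for-real-use"  # hypothetical key, for illustration only
ssn = "123-45-6789"                 # made-up sample value
token = toy_encrypt(key, ssn)

print(len(ssn), len(token))             # prints: 11 36
print(toy_decrypt(key, token) == ssn)   # prints: True
```

The 11-character value becomes a 36-character token that would not fit an 11-character column and tells a developer nothing about its shape, yet anyone holding the key recovers the original exactly. The content is unchanged; only its presentation differs.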
Data masking, or de-identification, is a method that changes the content of the information in such a way that it retains the form of its presentation yet completely loses the original content. Statistical methods can sometimes be used to guess the original values for some types of data masked with certain methods, but with appropriate precautions the probability of re-identification can be reduced.
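As an illustration of "retains the form, loses the content", the following Python sketch applies simple random character substitution, which is only one of many masking techniques and is not presented as any particular product's method. The function name mask_value and the sample values are assumptions made for this example.

```python
import random
import string


def mask_value(value: str, rng: random.Random) -> str:
    """Replace each ASCII digit or letter with a random one of the same kind.

    Punctuation, spaces, and any non-ASCII characters are kept as-is, so the
    length and layout of the value are preserved.
    """
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # separators such as "-" or " " keep the format intact
    return "".join(out)


rng = random.Random()  # unseeded, so the output differs on every run
masked_ssn = mask_value("123-45-6789", rng)     # made-up sample value
masked_name = mask_value("Jane Doe", rng)       # made-up sample value
print(masked_ssn, masked_name)
```

The masked values still fit the original field sizes and still look like a social security number and a name to the developer, but there is no key that maps them back. A sketch this simple does not preserve referential integrity across tables or the statistical distribution of the data, and an individual character can by chance stay the same. Those are among the precautions a real masking tool would need to address.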