Industry Mythology

{TOC}

Myths and Reality: Background

By the very end of twenties century, not only separate organizations but complete industries in multiple countries recognized the necessity to protect sensitive personal information is software applications. Health, financial, retail, educational, entertainment industries along with government and state authorities started enacting various privacy legislations.

Different scenarios of data use required different protection methods. While certain protection methods such as encryption or access control were very well understood, others started being surrounded by a variety of myths. One of the least understood methods turned out to be data masking. Just the sheer number of synonyms used to indicate the same activity such as: data de-identification , data anonymization, data pseudonymization, data obfuscation, data obscuring, etc. with the various attempts to explain the difference brings a lot of confusion to the practitioners. On top of it, masking is confused with encryption, considered unreliable, misunderstood in terms of risks to the enterprises, and disliked by DBAs and DB developers due to often poor implementations.

myth #1 : masking is a special form of encryption

Data masking is not encryption. Yes, the data is replaced by different data, yes, there might be keys upon which the replacements are made, and, yes, it is a security protection mechanism. This is where the similarities end. The very use cases and basic principles are different. Static Data Masking is used to protect data-at-rest against the developer and sometimes DBA. Dynamic Data masking is used to protect data on-the-fly against possible user-related fraud. Data masking uses one way data replacement mechanism. Data masking maintains current data format and rules.

myth #2 : enterprize wide consistent masking requires a centralized system

This one is a very persistent ~~myth~~myth. ~~across the business stakeholders. Logically, it~~It seems to be un-penetrable: there is a "central command center" that allows data masking architect establish all the policies for all the systems of record and warehouse ~~and~~in asthe organization. As a ~~result~~result, data is consistent everywhere. ~~However,~~There are two flows with the assumption of the fact that centralized data architect is the best choice for the enterprise masking solutions. While from the perspective of k-anonymity it makes no difference who makes decisions, from the perspective of t-closeness and l-diversity, the closer the decision maker to the system - the better. The architect who is far away from system design is not the best choice.

The second flow is an assumption that conceptual and physical designs ~~do not have to be~~are the same. The "command center" - especially if it is in the cloud on a different address - becomes a very mixed blessing for two aspects of data masking: time-to-market and performance. -The phenomenon is very similar to that we witness in the battle of Inmon vs. Kimball. Implementations are agile if they happen close to business decision making and it is contrary to the centralized command. But there is a way to maintain the same policies while delivering them close to the decision making. Such way is a component-based architecture. Each component is a standardized algorithm indeed delivered strait at the place that knows most about the data. Such design also removes all the issues with possible performance that relates to the networking traffic as data masking happens "just in time".

Download a Trial