Sensitive Data Discovery
data definition and discovery process
When a person is tasked with data anonymization, the first thing s/he does is try to understand which subset of the data needs to be masked. Basically, the first question is: "What is sensitive data?"
the definition
The term "Sensitive Data", also called "PII" (personally identifiable information) or, in healthcare, "PHI" (protected health information), refers to data that describes a person through specific attributes. Knowing the values of these attributes allows other people to re-identify that specific person among many others.
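To make the re-identification point concrete, here is a minimal Python sketch over a small, entirely hypothetical set of records. It counts how many records share each combination of the chosen attribute values. A combination that occurs exactly once singles out one person, which is what makes those values re-identifying. The field names (`zip`, `birth_year`, `gender`), the data, and the helper `unique_matches` are illustrative assumptions, not taken from any real system.

```python
from collections import Counter

# Hypothetical records: names, values and field choices are illustrative only.
RECORDS = [
    {"name": "A", "zip": "02139", "birth_year": 1980, "gender": "F"},
    {"name": "B", "zip": "02139", "birth_year": 1980, "gender": "F"},
    {"name": "C", "zip": "02139", "birth_year": 1975, "gender": "M"},
    {"name": "D", "zip": "10001", "birth_year": 1990, "gender": "F"},
]


def unique_matches(records, attributes):
    """Return the names of records whose combination of `attributes` is unique.

    If a combination of values occurs exactly once, anyone who knows those
    values can pick out that single record, i.e. re-identify the person.
    """
    def key(record):
        return tuple(record[a] for a in attributes)

    counts = Counter(key(r) for r in records)
    return [r["name"] for r in records if counts[key(r)] == 1]


if __name__ == "__main__":
    print(unique_matches(RECORDS, ["zip"]))                          # ['D']
    print(unique_matches(RECORDS, ["zip", "birth_year", "gender"]))  # ['C', 'D']
```

Adding attributes shrinks the groups: with `zip` alone only one record stands out, while the three attributes together single out two of the four people.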
For example, knowing a person's Social Security Number (SSN) makes it possible to learn a lot about that person. An SSN is unique, and it is used across many systems throughout a person's life. In the wrong hands, an SSN can lead to fraudulent credit card applications, fraudulent medical claims, and the public exposure of information about students.
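Because an SSN has a fixed, recognizable layout (area-group-serial, written AAA-GG-SSSS), it is one of the easier attributes to spot automatically. The Python sketch below flags SSN-shaped strings in free text. The function name `find_ssn_candidates` is hypothetical, and the sketch assumes the commonly cited structural rules (no area 000, 666 or 900-999, no group 00, no serial 0000); passing them only means a string looks like an SSN, not that it was ever issued.

```python
import re

# Dashed AAA-GG-SSSS layout; \b keeps longer digit runs from matching.
SSN_PATTERN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")


def find_ssn_candidates(text):
    """Return SSN-shaped substrings of `text` that pass basic structural rules.

    Assumed rules: area is not 000, 666 or 900-999; group is not 00;
    serial is not 0000.
    """
    found = []
    for match in SSN_PATTERN.finditer(text):
        area, group, serial = match.groups()
        if area in ("000", "666") or area.startswith("9"):
            continue  # area numbers never assigned
        if group == "00" or serial == "0000":
            continue  # all-zero group or serial is invalid
        found.append(match.group(0))
    return found


if __name__ == "__main__":
    sample = "SSN on file: 123-45-6789; placeholder 000-12-3456"
    print(find_ssn_candidates(sample))  # ['123-45-6789']
```

This sketch deliberately requires dashes; SSNs stored as bare nine-digit numbers would need a separate, noisier pattern, since many unrelated identifiers are also nine digits long.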
fraud
There is a black market for stolen PII. Each element has its own price, precisely because it can be used to make money illegally. Besides commercial vendors, the FBI and other government law enforcement agencies take the issue very seriously, and people who commit such fraud receive harsh sentences.
privacy
Even without fraudulent intent, compromising someone's privacy is undesirable. A person may well not want their employer, neighbours, or sometimes even family members to find out about their health issues (https://www.fbi.gov/news/stories/2012/november/estate-planner-victimized-terminally-ill). The recent theft of data about extramarital affairs from the Ashley Madison site exposed many people, and however questionable their ethics or behavior, it led to ruined careers and even suicides (https://en.wikipedia.org/wiki/Ashley_Madison_data_breach#Impact_and_ethics).
Attributes domain
So, which attributes define the domain of sensitive data in terms of privacy and de-identification?
The health industry has defined the minimum set of attributes that make up this domain in its "Safe Harbor" list. The process of defining such a domain is called "sensitive data discovery", and it is currently framed as discovering sensitive data in systems across the enterprise.
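Discovery across enterprise systems typically combines two signals: what a column is called and what its values look like. The Python sketch below illustrates that idea on in-memory tables. The table and column names, the keyword hints, the patterns (including a North American 10-digit phone format), the 80% threshold, and the `discover` helper are all hypothetical assumptions. The categories cover only a few identifiers from the HIPAA Safe Harbor list (names, Social Security numbers, email addresses, telephone numbers), not the full list.

```python
import re

# Hypothetical subset of Safe Harbor categories: (column-name hints, value pattern).
RULES = {
    "ssn": (("ssn", "social_security"), re.compile(r"^\d{3}-\d{2}-\d{4}$")),
    "email": (("email", "e_mail"), re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
    # Assumes North American 10-digit numbers such as (617) 555-0100.
    "phone": (("phone",), re.compile(r"^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$")),
    "name": (("first_name", "last_name", "full_name"), None),  # name-only hint
}


def discover(tables, sample_size=100, threshold=0.8):
    """Flag (table, column, category) triples that look sensitive.

    A column is flagged if its name contains a hint, or if at least
    `threshold` of its sampled non-empty values match the category pattern.
    """
    findings = []
    for table, columns in tables.items():
        for column, values in columns.items():
            lowered = column.lower()
            # Sample the first values only; real scans would sample randomly.
            sample = [str(v) for v in values[:sample_size] if v not in (None, "")]
            for category, (hints, pattern) in RULES.items():
                by_name = any(hint in lowered for hint in hints)
                by_value = False
                if pattern is not None and sample:
                    hits = sum(1 for v in sample if pattern.match(v))
                    by_value = hits / len(sample) >= threshold
                if by_name or by_value:
                    findings.append((table, column, category))
    return findings


if __name__ == "__main__":
    tables = {
        "customers": {
            "cust_id": [1, 2, 3],
            "contact": ["ann@example.com", "bob@example.org", "cy@example.net"],
            "tax_ref": ["123-45-6789", "234-56-7890", "345-67-8901"],
            "last_name": ["Smith", "Jones", "Lee"],
        }
    }
    for finding in discover(tables):
        print(finding)
```

Checking values as well as names matters because sensitive columns are often named vaguely (`contact`, `tax_ref`). Substring hints are crude, though: "ssn" also appears inside "classname", so a real scan would need review of its findings.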