Sensitive data is data that allows other people to identify you as a person within other records. If someone steals or accesses your sensitive data without permission, they can do irreparable harm through credit card fraud, medical records fraud, and other forms of identity fraud. Many laws and standards protect identity and data; for the purposes of this document, we are concerned with those that protect data in development and integration, such as GLBA, HIPAA, PCI DSS, GDPR, and national, state, and municipal initiatives.
Other names for sensitive data include PII (personally identifiable information), PHI (protected health information), private data, direct and indirect identifiers, etc.
The domain of sensitive data and de-identification was introduced by Latanya Sweeney, who was also the first to define models for sensitive data. In particular, attributes such as names, addresses, cities, zip codes, dates, VINs, driver's license numbers, passport numbers, SSN/SIN and other forms of ID, telephone numbers, and email addresses all constitute sensitive data.
The model includes unique identifiers (called direct identifiers in another classification) such as the SSN: you and only you have your own SSN. Direct identifiers are information that relates specifically to an individual, for example name, address, Social Security Number or other identifying number or code, telephone number, email address, or biometric record.
The model also contains non-unique identifiers (indirect identifiers). These are pieces of information that can be combined with other information to identify specific individuals, for example a combination of gender, birth date, geographic indicator, and other descriptors. Other examples of indirect identifiers include place of birth, race, religion, weight, activities, employment information, medical information, education information, and financial information.
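The split between direct and indirect identifiers can be illustrated with a minimal Python sketch. The column names, the two lookup sets, and the functions `classify_column` and `classify_schema` are hypothetical and chosen only to mirror the examples in the text; a real inventory would be built from the actual schema and would likely need pattern matching on the data itself, not only on column names.

```python
from enum import Enum


class IdentifierKind(Enum):
    DIRECT = "direct"          # unique identifier: points to one person
    INDIRECT = "indirect"      # identifies a person only in combination
    UNCLASSIFIED = "unclassified"


# Hypothetical column names, based on the examples given in the text.
DIRECT_IDENTIFIERS = {
    "name", "address", "ssn", "telephone", "email", "biometric_record",
}
INDIRECT_IDENTIFIERS = {
    "gender", "birth_date", "geographic_indicator", "place_of_birth",
    "race", "religion", "weight", "employment", "education",
}


def classify_column(column_name: str) -> IdentifierKind:
    """Classify one column by a case-insensitive name lookup."""
    key = column_name.strip().lower()
    if key in DIRECT_IDENTIFIERS:
        return IdentifierKind.DIRECT
    if key in INDIRECT_IDENTIFIERS:
        return IdentifierKind.INDIRECT
    return IdentifierKind.UNCLASSIFIED


def classify_schema(columns: list[str]) -> dict[str, IdentifierKind]:
    """Classify every column of a table schema."""
    return {column: classify_column(column) for column in columns}


if __name__ == "__main__":
    for column, kind in classify_schema(["SSN", "Gender", "order_total"]).items():
        print(f"{column}: {kind.value}")
```

A column that lands in `UNCLASSIFIED` is not necessarily safe; in this sketch it only means the name did not match either list and still needs review.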
In some industries, a basic sensitive data model has been defined for organizations that have no in-house expertise. For example, HIPAA lists 18 elements you would need to mask and calls this model “Safe Harbor”. It is a sufficient model for de-identification, but far from the best one.
HIPAA considers expert determination of the model to be the best method for data de-identification. To achieve the best results with expert determination, you have to understand the types of attacks and the industry-related metrics, such as k-anonymity and l-diversity (see the HushHush white paper on this topic).
Other industries do not necessarily have a “Safe Harbor”. Their attributes will also include other sets of metadata. However, the methods for determining which data is sensitive and needs to be de-identified are the same.
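As a rough illustration of the two metrics named above, the following Python sketch computes k-anonymity and the simplest ("distinct") form of l-diversity for a small table. The function names, the column names, and the sample records are assumptions made for this example and do not come from HIPAA or from the HushHush white paper. A dataset is k-anonymous if every combination of indirect (quasi-) identifier values is shared by at least k records, and it is distinct l-diverse if every such group contains at least l different values of the sensitive attribute.

```python
from collections import defaultdict


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values for all quasi-identifier columns."""
    group_sizes = defaultdict(int)
    for record in records:
        key = tuple(record[column] for column in quasi_identifiers)
        group_sizes[key] += 1
    return min(group_sizes.values()) if group_sizes else 0


def l_diversity(records, quasi_identifiers, sensitive_column):
    """Return the smallest number of distinct sensitive values found
    in any quasi-identifier group (distinct l-diversity)."""
    group_values = defaultdict(set)
    for record in records:
        key = tuple(record[column] for column in quasi_identifiers)
        group_values[key].add(record[sensitive_column])
    return min(len(v) for v in group_values.values()) if group_values else 0


# Hypothetical, already generalized sample data.
SAMPLE = [
    {"zip": "021**", "birth_year": 1980, "gender": "F", "diagnosis": "flu"},
    {"zip": "021**", "birth_year": 1980, "gender": "F", "diagnosis": "asthma"},
    {"zip": "021**", "birth_year": 1975, "gender": "M", "diagnosis": "flu"},
    {"zip": "021**", "birth_year": 1975, "gender": "M", "diagnosis": "flu"},
]
QUASI = ["zip", "birth_year", "gender"]

if __name__ == "__main__":
    print("k =", k_anonymity(SAMPLE, QUASI))               # k = 2
    print("l =", l_diversity(SAMPLE, QUASI, "diagnosis"))  # l = 1
```

In this sample every group has two records, so the table is 2-anonymous, yet the second group contains only one diagnosis. Anyone who knows a person belongs to that group learns the diagnosis, which is the kind of attack that l-diversity is meant to expose and that k-anonymity alone does not.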
Sensitive Data Definition
Copyright © 2025 Hush-Hush. All rights reserved