In the digital age we operate in today, personal data is widely recognized as both an asset and a commodity. It can be bought and sold, and it underpins most corporate business intelligence and marketing strategies. Because of its value, personal data is a primary target for cybercriminals, hackers, and malware.
Thankfully for consumers, personal data is protected under privacy laws such as the GDPR and HIPAA. Companies that collect, store, or handle personal data are legally obligated to ensure measures are in place to protect it.
To understand why protective measures are necessary, it is important to define and differentiate the terminology of data privacy. Legal jargon can be confusing, particularly when definitions are very similar and the terms are used interchangeably.
In this blog, we will endeavor to set out the key terms and explain the differences between them.
Standard data privacy terminology
In discussions of regulations such as the GDPR, HIPAA, and CCPA, the following standard terms and abbreviations are commonly used to describe an individual’s data:
- Personally Identifiable Information (PII)
- Protected Health Information (PHI)
- Electronic Protected Health Information (ePHI)
PII
Personally Identifiable Information is any information that relates to an identified or identifiable living individual. This includes elements such as name, gender, date of birth, place of birth, current address, and social security number.
Examples of PII, which the GDPR calls “personal data”, include:
- Name and surname
- Home address
- Email address
- Identification card number
- Location data
- IP address
- Medical information
PHI and ePHI
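Protected Health Information relates to an individual’s medical history and healthcare information. Electronic Protected Health Information (ePHI) is PHI that is created, received, stored, or transmitted in electronic form.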
Examples of PHI and ePHI include:
- Health history
- Details of healthcare treatment
- Treatment payment details
- Medical records transmitted by electronic media, maintained in electronic media, or transmitted or maintained in any other electronic form (ePHI)
What is sensitive data?
Sensitive data is data that describes a person in a specific way, with certain attributes that, used together, can identify that person. Non-unique identifiers (often called quasi-identifiers) such as name, date of birth, place of birth, and gender are generic in nature and cannot, on their own, accurately identify someone. However, when these attributes are combined with each other, or with a unique identifier such as a social security number, they can accurately identify someone, and the combination is therefore considered sensitive.
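To illustrate how non-unique identifiers become identifying when combined, here is a minimal Python sketch. It counts how many records are the only one with a given combination of fields. The records, field names, and the `unique_count` helper are invented for this example and do not describe any particular discovery tool.

```python
from collections import Counter

# Invented sample records; no real people are represented.
records = [
    {"name": "Alex Smith", "gender": "F", "dob": "1985-03-14", "pob": "Leeds"},
    {"name": "Alex Smith", "gender": "M", "dob": "1990-07-02", "pob": "York"},
    {"name": "Sam Jones", "gender": "F", "dob": "1985-03-14", "pob": "Leeds"},
    {"name": "Sam Jones", "gender": "M", "dob": "1972-11-30", "pob": "Leeds"},
]


def unique_count(rows, fields):
    """Count rows whose combination of `fields` is shared by no other row."""
    def key(row):
        return tuple(row[f] for f in fields)

    sizes = Counter(key(row) for row in rows)
    return sum(1 for row in rows if sizes[key(row)] == 1)


for fields in (["name"], ["gender"], ["name", "gender"], ["gender", "dob", "pob"]):
    print(fields, unique_count(records, fields))
# ['name'] 0
# ['gender'] 0
# ['name', 'gender'] 4
# ['gender', 'dob', 'pob'] 2
```

Neither name nor gender singles anyone out on its own, but together they make every record in this tiny set unique.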
Sensitive data is the most at risk in data leaks and breaches, and its exposure can result in credit card fraud, insurance fraud, identity theft, and even blackmail. For businesses, a breach can lead to devastating financial and reputational losses.
HIPAA and the GDPR require special security measures that safeguard sensitive data against breaches and uphold an individual’s right to privacy. These measures can include data discovery, data masking, encryption, and removal.
How does technology identify and protect sensitive data?
Sensitive Data Discovery algorithms locate unique identifiers, non-unique identifiers, and combinations of the two in your databases, files, emails, and other sources, and classify the instances that could be used to identify an individual. This data can then be de-identified using Data Masking, which hides or anonymizes sensitive information by replacing real values with realistic substitute values.
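The discover-then-mask flow can be sketched in Python with regular expressions. This is a simplified illustration under stated assumptions, not how Hush-Hush or any other product works: the three patterns, the helper names (`discover`, `mask`, `fake_ssn`, and so on), and the choice of substitutes are all hypothetical. The substitutes keep the original format but are drawn from values that cannot be real, such as the documentation-only domain `example.com` and IP range `192.0.2.0/24`.

```python
import random
import re

# Simplified, hypothetical detection rules. Real discovery tools combine patterns
# with context, checksums, and dictionaries; regexes alone miss names, for example.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def discover(text):
    """Discovery step: return (label, value) for every pattern match in `text`."""
    return [(label, m.group()) for label, rx in PATTERNS.items() for m in rx.finditer(text)]


# Substitute generators: realistic in format, but drawn from values that cannot be real.
def fake_ssn(rng):
    # The SSA does not issue SSNs whose area number (first three digits) starts with 9.
    return f"9{rng.randint(0, 99):02d}-{rng.randint(1, 99):02d}-{rng.randint(1, 9999):04d}"


def fake_email(rng):
    # example.com is reserved for documentation (RFC 2606).
    return f"user{rng.randint(1000, 9999)}@example.com"


def fake_ipv4(rng):
    # 192.0.2.0/24 is reserved for documentation (RFC 5737).
    return f"192.0.2.{rng.randint(1, 254)}"


FAKERS = {"ssn": fake_ssn, "email": fake_email, "ipv4": fake_ipv4}


def mask(text, seed=None):
    """Masking step: replace each discovered value with a random substitute.

    No mapping from substitute back to original is kept.
    """
    rng = random.Random(seed)
    for label, rx in PATTERNS.items():
        text = rx.sub(lambda _match, faker=FAKERS[label]: faker(rng), text)
    return text


sample = "Contact alex.smith@example.org (SSN 123-45-6789) from 203.0.113.7."
print(discover(sample))
# [('ssn', '123-45-6789'), ('email', 'alex.smith@example.org'), ('ipv4', '203.0.113.7')]
print(mask(sample, seed=42))
```

Because the substitutes are random and no lookup table is kept, the masked values cannot be traced back to the originals.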
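Data masking algorithms can apply privacy models such as k-anonymity and l-diversity. k-anonymity requires that every record be indistinguishable from at least k-1 other records on its non-unique identifiers; this is usually achieved by generalizing or suppressing attribute values, which reduces the granularity of the data. l-diversity adds a further requirement: each group of indistinguishable records must contain at least l well-represented values of a sensitive attribute (in its simplest form, l distinct values), so that the sensitive value cannot be inferred simply from membership in the group.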
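The difference between the two models is easier to see on a small table. The Python sketch below uses an invented six-row data set and computes k (the size of the smallest group of records sharing the same quasi-identifier values) and the simplest, "distinct" form of l-diversity, before and after generalizing age to a ten-year band and ZIP code to three digits. The field names, generalization rules, and data are assumptions for illustration, not a description of any particular masking product.

```python
from collections import defaultdict

# Invented toy table: "age" and "zip" are quasi-identifiers, "diagnosis" is sensitive.
rows = [
    {"age": 34, "zip": "13053", "diagnosis": "flu"},
    {"age": 36, "zip": "13068", "diagnosis": "asthma"},
    {"age": 38, "zip": "13068", "diagnosis": "flu"},
    {"age": 52, "zip": "14850", "diagnosis": "diabetes"},
    {"age": 55, "zip": "14853", "diagnosis": "diabetes"},
    {"age": 58, "zip": "14853", "diagnosis": "diabetes"},
]
QUASI_IDS = ("age", "zip")


def generalize(row):
    """Reduce granularity: age to a ten-year band, ZIP code to its first three digits."""
    low = row["age"] // 10 * 10
    return {"age": f"{low}-{low + 9}", "zip": row["zip"][:3] + "**", "diagnosis": row["diagnosis"]}


def equivalence_classes(data):
    """Group records that share identical quasi-identifier values."""
    classes = defaultdict(list)
    for row in data:
        classes[tuple(row[q] for q in QUASI_IDS)].append(row)
    return list(classes.values())


def k_anonymity(data):
    """k = size of the smallest class: each record matches at least k-1 others."""
    return min(len(group) for group in equivalence_classes(data))


def distinct_l_diversity(data, sensitive="diagnosis"):
    """l = fewest distinct sensitive values found in any class."""
    return min(len({row[sensitive] for row in group}) for group in equivalence_classes(data))


generalized = [generalize(row) for row in rows]
print(k_anonymity(rows), distinct_l_diversity(rows))                # 1 1
print(k_anonymity(generalized), distinct_l_diversity(generalized))  # 3 1
```

Generalization raises k from 1 to 3, yet the "50-59" / "148**" group is still only 1-diverse: everyone in it has the same diagnosis, so knowing someone belongs to that group reveals it. That is the gap l-diversity is designed to catch.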
Under the GDPR (Recital 26), data that has been rendered anonymous in such a way that the individual can no longer be identified is no longer considered personal data. For data to be truly anonymized, the anonymization must be irreversible: masking that replaces real values with substitutes, without keeping any key or mapping back to the originals, is one way to achieve this.
In healthcare, the HIPAA Privacy Rule’s Safe Harbor method lists 18 types of identifiers that must be removed before health data is considered de-identified. Sensitive Data Discovery and Data Masking can seek out these attributes for masking or removal.
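A few Safe Harbor rules are easy to express in code: ZIP codes are cut to their first three digits (or "000" where those digits cover 20,000 or fewer people), dates related to an individual are reduced to the year, and ages over 89 are grouped into a single 90-or-older category. The Python sketch below applies those rules plus removal of a handful of direct identifiers. The function and field names are hypothetical, only a subset of the 18 identifiers is handled, and the restricted ZIP prefixes are a caller-supplied stand-in for the official list, so treat it as an illustration rather than a complete de-identification routine.

```python
from datetime import date

# A small, illustrative subset of the 18 Safe Harbor identifiers (names, SSNs,
# phone numbers, and email addresses are all on the full list).
DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "email"}


def age_on(birth, today):
    """Whole years between `birth` and `today`."""
    return today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))


def safe_harbor_sketch(record, today, restricted_zip3=frozenset()):
    """Apply a few Safe Harbor-style rules to one record (not a complete implementation).

    `restricted_zip3` is a hypothetical stand-in, supplied by the caller, for the
    official list of three-digit ZIP prefixes covering 20,000 or fewer people.
    """
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Geography: keep only the first three ZIP digits, or "000" for sparse areas.
    zip3 = record["zip"][:3]
    out["zip"] = "000" if zip3 in restricted_zip3 else zip3

    # Dates: drop everything but the year; ages over 89 collapse into "90+".
    birth = out.pop("birth_date")
    age = age_on(birth, today)
    if age > 89:
        out["age"] = "90+"  # the birth year would reveal the age, so it is dropped too
    else:
        out["age"] = age
        out["birth_year"] = birth.year
    return out


patient = {"name": "Jane Doe", "ssn": "123-45-6789", "zip": "02139",
           "birth_date": date(1930, 5, 17), "diagnosis": "asthma"}
print(safe_harbor_sketch(patient, today=date(2024, 1, 1)))
# {'zip': '021', 'diagnosis': 'asthma', 'age': '90+'}
```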
Data breaches come at a heavy cost. It’s important to understand the sensitivity level of the data in your business, the threats that surround it, and the regulations that govern its use. Hush-Hush Sensitive Data Discovery and Data Masking tools help you accurately locate and classify sensitive data in your business and mask the sensitive attributes to prevent identification.