The laws of the European Union specify that data should be anonymized and/or pseudonymized, in Convention 108, Article 5(e), and the Convention 108 Explanatory Report, Article 42. In particular, data must be kept "in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the data were collected or for which they are further processed." To satisfy this, data has to be anonymized straight after use and then archived. Although pseudonymization is not mentioned directly in this respect, the definition of "necessary" implies it as well: in the continuous development of applications, developers have no direct need to see sensitive data, and when the need later arises to use archived data, it is preferable to preserve the meaning of the complete context rather than simply delete data out of that context.
Considered the most comprehensive data privacy law in effect, the General Data Protection Regulation (GDPR) extends to all businesses (including businesses that operate outside of Europe) that offer goods and services to European residents and collect personal data in the process.
The GDPR specifically requires the use of data protection methods to safeguard private data.
The following provisions specifically relate to the protection of data:
- Article 3, which refers to the processing of data
- Article 4, which defines the parameters of de-identification
- Article 5, which refers to the retention of data
- Article 11, which addresses processing that does not require identification
- Article 17, which refers to the deletion of data
- Article 24, which refers to the responsibility of the controller
- Article 25, which refers to reasonable measures to protect consumer data, by default and by design
- Article 32, which deals with the security of processing
- Article 34, which refers to protection measures to mitigate data breaches
- Article 40, which refers to codes of conduct, including those covering pseudonymization
HIPAA provides guidelines for data de-identification methods and solutions in the document Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule.
HIPAA defines a standard for de-identification in Section 164.514(a) of the HIPAA Privacy Rule: "health information is not individually identifiable if it does not identify an individual and if the covered entity has no reasonable basis to believe it can be used to identify an individual." To achieve this standard, Sections 164.514(b) and (c) of the Privacy Rule offer two primary methods to de-identify data. The first involves employing a data professional who applies scientific and statistical principles to make an expert determination about the sensitive data, select methods of de-identification, and verify the risks of the resulting solution. This method is very solid because the individual is capable of understanding the context of the data set. The second method is called "Safe Harbor" and operates on the principle that if we know which specific data elements related to sensitive data exist in the public domain, and we understand the statistical risk of identifying a person using those elements, we can hide those elements and address the greatest risks with specific rules. The elements identified in this way are listed below; they are the basis for the Safe Harbor rules:
- All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes. The initial three digits of a ZIP code may be kept if, according to the current publicly available data from the Bureau of the Census, the geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; the initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people must be changed to 000. Currently, 036, 059, 063, 102, 203, 556, 592, 790, 821, 823, 830, 831, 878, 879, 884, 890, and 893 must all be recorded as "000".
- All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;
- Telephone numbers;
- Fax numbers;
- Electronic mail addresses;
- Social security numbers;
- Medical record numbers;
- Health plan beneficiary numbers;
- Account numbers;
- Certificate/license numbers;
- Vehicle identifiers and serial numbers, including license plate numbers;
- Device identifiers and serial numbers;
- Web Universal Resource Locators (URLs);
- Internet Protocol (IP) address numbers;
- Biometric identifiers, including finger and voice prints;
- Full face photographic images and any comparable images; and
- Any other unique identifying number, characteristic, or code, except as permitted by the re-identification rules.
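The ZIP code rule in the list above can be sketched as a small helper function. This is a minimal illustration, not an official implementation; the function name is ours, and the restricted prefixes are taken directly from the list in the text:

```python
# The 17 restricted three-digit ZIP prefixes from the Safe Harbor
# guidance above (areas with 20,000 or fewer people).
RESTRICTED_ZIP3 = {
    "036", "059", "063", "102", "203", "556", "592", "790", "821",
    "823", "830", "831", "878", "879", "884", "890", "893",
}

def safe_harbor_zip(zip_code: str) -> str:
    """Truncate a ZIP code to its Safe Harbor form: keep only the
    first three digits, and replace even those with "000" when the
    three-digit prefix is on the restricted list."""
    prefix = zip_code[:3]
    return "000" if prefix in RESTRICTED_ZIP3 else prefix
```

For example, a ZIP code starting with 036 collapses to "000", while one starting with 902 keeps just "902".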
Removing these elements outright could break the normal functioning of a program, so after removal (or instead of it) practitioners often mask these 18 elements with other values of the same format and semantic meaning.
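A minimal sketch of such format-preserving masking, assuming a simple digit-for-digit substitution (the function name is illustrative; real masking tools apply per-field rules for each of the 18 element types):

```python
import random
import re

def mask_digits(value: str, rng: random.Random) -> str:
    """Replace every digit with a random digit, keeping the format
    (dashes, parentheses, spacing) intact so downstream code that
    expects the original shape keeps working."""
    return re.sub(r"\d", lambda _: str(rng.randrange(10)), value)

rng = random.Random(42)  # fixed seed only to make sample output repeatable
masked_ssn = mask_digits("123-45-6789", rng)     # still looks like an SSN
masked_phone = mask_digits("(555) 867-5309", rng)  # still looks like a phone number
```

The masked values pass the same format checks as the originals, which is exactly the property the paragraph above describes.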
Introduced in 1999 to remove barriers in the market among banks, insurance agencies, and investment institutions, the Gramm-Leach-Bliley Act (GLBA) also established a set of rules and regulations that protect consumer privacy and secure consumers' data.
Section 501(b) of GLBA requires that standards be established for financial institutions to protect the security and confidentiality of their customers' non-public personal information. These standards relate to administrative, technical, and physical safeguards:
- to insure the security and confidentiality of customer records and information;
- to protect against any anticipated threats or hazards to the security or integrity of such records; and
- to protect against unauthorized access to or use of such records or information which could result in substantial harm or inconvenience to any customer.
The Federal Trade Commission helps define which organizations should satisfy the regulations.
These are some examples:
- Loan lenders
- Foreign exchange companies
- Money transfer companies
- Hedge fund management companies
- Equity investment companies
- Insurance companies
- Mortgage Brokers
- Asset Management firms
- Financial advisers
- Financial brokers
- Credit companies
Using data masking as part of privacy-by-design standards helps organizations adhere to Section 501(b). Masking conceals sensitive data both in development environments and in production. In production, organizations often substitute sensitive values for use by personnel with limited access to data; an example of such a situation is an off-shored billing or other BPO operation that handles sensitive data.
It is customary for financial institutions to mask names, dates of birth, social security numbers, tax ID numbers, account numbers, and credit card numbers.
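When masked data must remain usable by personnel with limited access, the substitution is often made consistent, so that the same account number always maps to the same pseudonym and joins between masked tables keep working. One way to sketch this is keyed hashing; the function and key names below are hypothetical, and HMAC-based pseudonymization is one technique among several, not something GLBA itself prescribes:

```python
import hashlib
import hmac

# Illustrative only: a real deployment would load this secret from a
# key management system, never hard-code it.
MASKING_KEY = b"replace-with-a-managed-secret"

def pseudonymize_number(value: str, key: bytes = MASKING_KEY) -> str:
    """Deterministically map a numeric identifier (account number,
    tax ID, ...) to a pseudonym of the same length. The same input
    always yields the same output under the same key, so references
    across masked data sets stay consistent."""
    digest = hmac.new(key, value.encode(), hashlib.sha256).hexdigest()
    return str(int(digest, 16) % (10 ** len(value))).zfill(len(value))
```

Because the mapping is keyed, only holders of the secret could recompute it; rotating the key invalidates all previous pseudonyms.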
Merchants and credit card processing companies need to comply with the Payment Card Industry Data Security Standard (PCI DSS), which facilitates consistent measures for data security globally. These organizations include any entity processing credit card transactions, including but not limited to:
- brick and mortar stores
- internet commerce
- ATM and cash register operators
- money transfer companies
- money exchange companies
- check processing companies
PCI DSS establishes the technical and operational framework needed to protect consumers from data security risks. There are several persistent data elements for which PCI DSS either dictates standards of protection or recommends best practices. These data elements include the Primary Account Number (PAN), cardholder name, expiration date, and service code.
PCI DSS makes it mandatory to mask the PAN both in production and in development environments, and recommends protecting the rest of the persistent elements in accordance with local legislation and best practices. Many institutions decide to be proactive and safeguard names, dates, and service codes with masking as well.
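A common way to mask the PAN for display is to keep at most the first six and last four digits visible and hide the rest. This is a minimal sketch under that convention; the function name and defaults are ours, not part of the standard's text:

```python
def mask_pan(pan: str, keep_first: int = 6, keep_last: int = 4) -> str:
    """Mask a Primary Account Number for display, leaving at most
    the first six and last four digits visible and replacing the
    middle digits with '*'."""
    hidden = len(pan) - keep_first - keep_last
    return pan[:keep_first] + "*" * hidden + pan[-keep_last:]
```

For a typical 16-digit card number this yields the familiar `411111******1111` shape; stricter policies simply lower `keep_first`.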
The Family Educational Rights and Privacy Act (FERPA) (20 U.S.C. § 1232g; 34 CFR Part 99) is a federal law that protects the privacy of student education records. The law applies to schools receiving funding under an applicable program of the U.S. Department of Education. Under the law, the disclosure of private information requires the consent of the parents. Consequently, all software development activities would require that real data about students be hidden.
In addition to federal laws governing data anonymization requirements in different industries, each state enforces its own legislation on data privacy protection, so the relevant state laws should be consulted as well.
Canada has two main laws governing data privacy: the Privacy Act and the Personal Information Protection and Electronic Documents Act (PIPEDA).
These laws apply to the disclosure of private information in commercial and government activities.
Several provinces have enacted laws similar to PIPEDA:
British Columbia: Personal Information Protection Act
Alberta: Personal Information Protection Act
New Brunswick: Personal Health Information Privacy and Access Act
Newfoundland and Labrador: Personal Health Information Act