Lifecycle and Data Masking
Data masking is not a standalone activity that happens in some centralized place in the organization. It is a step in the application lifecycle, properly positioned in the overall process, whether the tooling keeps repositories centralized or distributed.
The overall data lifecycle architecture differs between startups and established companies only at the beginning, before at least one product in the organization becomes customer facing.
In a startup's early days, the focus is on building infrastructure and processes, and there is rarely much consumer data. The majority of test data is therefore created via CRUD statements or small uploads, and all of it can be fictitious.
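A minimal sketch of this early-stage approach, seeding a development database with purely fictitious rows (the schema, names, and in-memory SQLite database are illustrative assumptions, not tied to any particular product):

```python
import sqlite3
import random

# Hypothetical seed values; nothing here comes from real customers.
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dan"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# Populate the empty development database via plain CRUD inserts.
for i in range(20):
    name = random.choice(FIRST_NAMES)
    conn.execute(
        "INSERT INTO customers (name, email) VALUES (?, ?)",
        # example.com is reserved for documentation, so no real address can collide
        (name, f"{name.lower()}{i}@example.com"),
    )
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(count)  # 20 fictitious rows, no PII involved
```

Because every value is generated, no masking is needed at this stage.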
Only when the startup's product hits "production" (becomes customer facing) does the application start receiving real data, and often parts of it are PII (personally identifiable information).
The moment the product is deployed to the production environment is the point at which a need may arise to move production data into the development environments, in order to create test cases as close to production as possible with little development effort.
This is also the starting point for data masking processes.
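The core of that masking step can be sketched as follows. This is one possible technique, deterministic hashing of PII columns, chosen here so that the same production value always maps to the same fake value and joins across tables survive the copy; the function name, salt, and row layout are assumptions for illustration:

```python
import hashlib

def mask_email(email: str, salt: str = "dev-refresh") -> str:
    # Deterministic masking: identical inputs yield identical outputs,
    # so referential integrity is preserved across masked tables.
    digest = hashlib.sha256((salt + email).encode()).hexdigest()[:12]
    return f"user_{digest}@example.com"

prod_row = {"id": 42, "email": "jane.doe@realcustomer.com", "balance": 1280.50}

dev_row = dict(prod_row)
dev_row["email"] = mask_email(prod_row["email"])  # PII replaced before the copy
# balance is kept as-is: realistic transactional values are what make
# production-like test cases valuable in the first place.

print(dev_row["email"])
```

Non-sensitive columns pass through untouched, which is exactly why masked production data gives better test coverage than fully synthetic data.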
In established organizations, development does not start from a blank page. There is always data in the organization, and when a new project starts, this data is transferred to the development environments just as it would flow in production. The requirements of the initial data population will depend on the application; most likely, however, data will start flowing and integration processes will be established from the beginning, whether CRM feeds, patient records via HL7 messages or other file types, or account information in the form of batch file uploads.
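As one concrete example of such an integration channel, a batch file upload might be ingested like this (the feed name, columns, and values are invented for illustration; real feeds would arrive as files, not inline strings):

```python
import csv
import io

# Hypothetical nightly batch feed of account records, standing in for the
# kinds of file uploads mentioned above.
batch = io.StringIO(
    "account_id,holder,balance\n"
    "A-100,ACME Corp,5000.00\n"
    "A-101,Globex Inc,750.25\n"
)

accounts = []
for row in csv.DictReader(batch):
    row["balance"] = float(row["balance"])  # normalize numeric fields on ingest
    accounts.append(row)

print(len(accounts))  # 2 records loaded from the batch file
```

Once such a channel exists, the same pipeline can populate non-production environments, which is precisely when masking requirements enter the picture.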
This is the point when the lifecycle architecture gets established.
A usual lifecycle consists of:
- creating the repository in source control,
- developing code in sandboxes with unit testing,
- propagating successfully built code to source control for further integration,
- integrating (ideally with a CI server) on the integration server,
- upon successful conflict resolution, propagating to the further environments: testing, UAT, deployment verification, performance testing, etc.,
- deploying to production,
- fixing data quality issues,
- refreshing the non-production environments:
  A. propagating new production code,
  B. sub-setting or up-setting the data, according to the requirements and purpose of each environment,
  C. masking sensitive data and moving production data backwards; this data will include:
     1. master data,
     2. transactional data.
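The refresh steps above (sub-setting, then masking, before the backward move) can be sketched as a single pass over production rows. The column names, masking rules, and subset policy are assumptions for illustration, not the API of any real masking tool:

```python
import hashlib

# Per-column masking rules: columns without a rule pass through unchanged.
MASK_RULES = {
    "ssn": lambda v: "***-**-" + v[-4:],  # keep last 4 digits for testability
    "name": lambda v: "Customer_" + hashlib.md5(v.encode()).hexdigest()[:6],
}

production = [
    {"id": 1, "name": "Jane Doe", "ssn": "123-45-6789", "amount": 10.0},
    {"id": 2, "name": "John Roe", "ssn": "987-65-4321", "amount": 20.0},
    {"id": 3, "name": "Ann Poe", "ssn": "111-22-3333", "amount": 30.0},
]

# Sub-setting: keep only a slice sized for the target environment.
subset = production[:2]

# Masking: apply each column's rule (identity if none) before the backward move.
masked = [
    {col: MASK_RULES.get(col, lambda v: v)(val) for col, val in row.items()}
    for row in subset
]
print(masked[0]["ssn"])  # ***-**-6789
```

Up-setting (inflating data volume for performance environments) would replace the slice with row multiplication, but the masking pass stays the same.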
The new development cycle starts at this point, after refreshing the non-production environments.
The number of environments supporting the development process varies between organizations. We are not considering organizations that have only one environment; such extreme tactics are not the focus of this article. The majority of organizations divide development and production activities and create different environments for different purposes. At minimum, there will be a sandbox and a production server.
However, an organization may also maintain an integration server where the code of the different team members is automatically integrated, and a QA (testing) environment where QA conducts its activities; sometimes QA will have several environments of its own, for testing functional and nonfunctional requirements separately. Often there are also deployment verification, user acceptance testing, and demo environments. All of these get populated with data, and the methods of population differ.
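One way to keep those differing population methods explicit is a small configuration map from environment to strategy; the environment names and strategies below are illustrative assumptions, not a prescription:

```python
# Illustrative mapping of each environment to its data population method.
POPULATION_STRATEGY = {
    "sandbox": "fictitious CRUD data",
    "integration": "fictitious CRUD data",
    "qa_functional": "masked production subset",
    "qa_performance": "masked production data, up-set to full volume",
    "uat": "masked production subset",
    "demo": "curated fictitious scenarios",
    "production": "real data, never copied forward unmasked",
}

for env, strategy in POPULATION_STRATEGY.items():
    print(f"{env}: {strategy}")
```

Writing the strategy down per environment makes it easy to audit that no path moves unmasked production data backward.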