Anonymization is the process of removing personal information from a document or database in order to prevent the identification of individuals. Once data has been anonymized, it can no longer be attributed to specific individuals. The removal should be irreversible: no one should be able to recover the personal data from the document or database, and no one should be able to match the data to a particular person (unlike with so-called pseudonymization).
The result of the process of anonymization should be a database, document or other media file containing information that cannot be attributed to specific individuals.
What do we use anonymization for?
Anonymization is used in mass data processing for statistical or evaluation purposes, for example in marketing. It is also used to remove personal data from contracts, invoices, and other documents. Businesses additionally use anonymization to prepare sample data for stress testing applications or databases, so that these can be tested on near-real data without leaking confidential information.
What is the anonymization process?
Depending on the purpose of the anonymization (e.g. testing a new application), the personal data is either erased or replaced with randomly generated data. In some cases, the original data format (such as the format of an address or an account number) must be preserved so that the database or application that consumes the data does not fail. The anonymization tool must then respect this format, for example by replacing the data with entries from a pre-defined dictionary. Data can also be anonymized by aggregating individual records into statistical summaries.
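A minimal Python sketch of format-preserving replacement, under stated assumptions: the field names (`name`, `account`), the tiny `FIRST_NAMES` dictionary, and both function names are hypothetical and chosen only for illustration. A real tool would use much larger dictionaries and format rules specific to its data.

```python
import random
import string

# Hypothetical replacement dictionary; a real tool would load a much larger list.
FIRST_NAMES = ["Alex", "Jordan", "Sam", "Taylor"]


def mask_preserving_format(value, rng):
    """Replace digits with random digits and letters with random letters of
    the same case. Separators such as '-' or '/' are kept, so the overall
    format (length, grouping) stays the same."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)


def anonymize_record(record, rng):
    """Return a new record: the name comes from the dictionary, the account
    number is replaced by random characters in the same format."""
    return {
        "name": rng.choice(FIRST_NAMES),
        "account": mask_preserving_format(record["account"], rng),
    }


rng = random.Random()  # pass a seeded Random for reproducible test data
print(anonymize_record({"name": "Jane Doe", "account": "CZ65-0800-1234"}, rng))
```

Note that random replacement of this kind keeps only the shape of a value. If the consuming system validates checksums (as many account number formats do), the generated values would need an extra step to satisfy them.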
Documents such as contracts or photos can be anonymized by erasing the personal identification data, for example by blacking out or blurring the relevant portion of the document, picture or photo.
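Blacking out a region can be sketched in a few lines. The example below assumes, purely for illustration, that the image is a grayscale picture held as a list of pixel rows (0 = black, 255 = white); the function name `black_out` is hypothetical, and a real tool would typically work through an image library rather than on nested lists.

```python
def black_out(pixels, top, left, height, width):
    """Return a copy of a grayscale image (list of rows) in which the given
    rectangle is set to 0 (black). The original image is left unchanged."""
    redacted = [row[:] for row in pixels]
    for y in range(top, min(top + height, len(redacted))):
        row = redacted[y]
        for x in range(left, min(left + width, len(row))):
            row[x] = 0
    return redacted


image = [[255] * 4 for _ in range(3)]
print(black_out(image, top=1, left=1, height=2, width=2))
```

Overwriting the pixels, rather than drawing a separate layer on top of them, is what makes the original content unrecoverable from the output.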
What are the most commonly used anonymization methods?
- generating random text, random characters
- generating digital noise
- generating text from pre-defined dictionaries (such as names and surnames, addresses, etc.)
- blacking out text or part of a picture or photo
- blurring text or part of a picture or photo
- microaggregation
- random data shuffling
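Three of the methods listed above (random data shuffling, noise, and microaggregation) can be sketched for numeric data as follows. This is an illustrative sketch, not a reference implementation: the function names, the uniform noise distribution, and the choice to merge a short final group into the previous one are all assumptions.

```python
import random


def shuffle_column(rows, column, rng):
    """Random data shuffling: permute one column across records so values
    no longer line up with the records they came from."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def add_noise(values, scale, rng):
    """Noise: add a random offset in [-scale, scale] to each value."""
    return [v + rng.uniform(-scale, scale) for v in values]


def microaggregate(values, k):
    """Microaggregation: sort the values, form groups of at least k
    neighbours, and replace each value with the mean of its group.
    The result is returned in the original record order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()      # a short last group would be too easy
        groups[-1].extend(tail)  # to single out, so merge it backwards
    result = [0.0] * len(values)
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result


print(microaggregate([10, 20, 30, 40, 50], k=2))
# [15.0, 15.0, 40.0, 40.0, 40.0]
```

Shuffling keeps the exact set of values in a column, so column-level statistics are unchanged, while microaggregation and noise alter individual values but keep them close to the originals.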