Data Anonymization: An Experimental Evaluation Using Open-Source Tools

In recent years, the use of personal data in marketing, scientific and medical research, and trend forecasting has increased markedly. This information is used by governments, companies, and individuals, and should not contain any sensitive information that allows the identification of...

Bibliographic Details
Main Authors: Joana Tomás, Deolinda Rasteiro, Jorge Bernardino
Format: Article
Language: English
Published: MDPI AG 2022-05-01
Series: Future Internet
Online Access: https://www.mdpi.com/1999-5903/14/6/167
Collection: DOAJ
Description: In recent years, the use of personal data in marketing, scientific and medical research, and trend forecasting has increased markedly. This information is used by governments, companies, and individuals, and should not contain any sensitive information that allows an individual to be identified. Data anonymization is therefore essential. It alters the original data to make it difficult to identify an individual. ARX Data Anonymization and Amnesia are two popular open-source tools that simplify this process. In this paper, we evaluate these tools in two ways: with the OSSpal methodology, and experimentally, using a public dataset of the most recent tweets about the Pfizer and BioNTech vaccine. The OSSpal assessment finds that ARX Data Anonymization scores better than Amnesia. The experimental evaluation shows that Amnesia has some errors and limitations, although its anonymization process is simpler. ARX Data Anonymization can load large datasets and showed no errors during the anonymization process. We conclude that ARX Data Anonymization is the recommended tool for data anonymization.
ISSN: 1999-5903
DOI: 10.3390/fi14060167
Volume/Issue: 14(6), article 167
Affiliation (all authors): Institute of Engineering of Coimbra (ISEC), Polytechnic of Coimbra, Rua Pedro Nunes, 3030-199 Coimbra, Portugal
Subjects: data anonymization; OSSpal methodology; ARX Data Anonymization tool; Amnesia