Privacy Preserving Attribute-Focused Anonymization Scheme for Healthcare Data Publishing

Advancements in Industry 4.0 brought tremendous improvements in the healthcare sector, such as better quality of treatment, enhanced communication, remote monitoring, and reduced cost. Sharing healthcare data with healthcare providers is crucial for harnessing the benefits of such improvements. In g...


Bibliographic Details
Main Authors:	J. Andrew Onesimu, Karthikeyan J, Jennifer Eunice, Marc Pomplun, Hien Dang
Format:	Article
Language:	English
Published:	IEEE 2022-01-01
Series:	IEEE Access
Subjects:	Anonymization; data privacy; data publishing; healthcare data; privacy-preserving
Online Access:	https://ieeexplore.ieee.org/document/9858117/

author	J. Andrew Onesimu; Karthikeyan J; Jennifer Eunice; Marc Pomplun; Hien Dang
author_sort	J. Andrew Onesimu
collection	DOAJ
description	Advancements in Industry 4.0 have brought tremendous improvements to the healthcare sector, such as better quality of treatment, enhanced communication, remote monitoring, and reduced cost. Sharing healthcare data with healthcare providers is crucial for harnessing the benefits of such improvements. In general, healthcare data holds sensitive information about individuals. Hence, sharing such data is challenging because of various security and privacy issues. According to privacy regulations and ethical requirements, it is essential to preserve the privacy of patients before sharing data for medical research. State-of-the-art privacy-preserving studies either use cryptographic approaches to protect privacy or apply anonymization techniques regardless of the type of attribute, which results in poor protection and data utility. In this paper, we propose an attribute-focused privacy-preserving data publishing scheme. The proposed scheme is two-fold, comprising a fixed-interval approach to protect numerical attributes and an improved $l$-diverse slicing approach to protect the categorical and sensitive attributes. In the fixed-interval approach, the original values of the healthcare data are replaced with an equivalent computed value. The improved $l$-diverse slicing approach partitions the data both horizontally and vertically to avoid privacy leaks. Extensive experiments with real-world datasets are conducted to evaluate the performance of the proposed scheme. Classification models built on the anonymized dataset yield approximately 13% better accuracy than the benchmarked algorithms. Experimental analyses show that the average information loss, measured by the normalized certainty penalty (NCP), is reduced by 12% compared to similar approaches. The attribute-focused scheme not only provides data utility but also protects the data against membership, attribute, and identity disclosure.
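The ideas named in the description (fixed intervals for numerical attributes, $l$-diversity of sensitive values, and NCP as the information-loss measure) can be illustrated with a minimal sketch. This is not the authors' algorithm: the record does not give its details, so the interval width, the replacement of a value by its enclosing interval, the assumed attribute domain of 0 to 100, the toy records, and all function names below are hypothetical. The NCP used here is the common definition for a numerical attribute (width of the generalized interval divided by the width of the attribute domain, averaged over records), and the diversity check is the simple "distinct" variant.

```python
from collections import defaultdict


def fixed_interval(value, width, origin=0):
    """Return the [low, high) interval of fixed width that contains value.

    Assumption: the anonymized value is the enclosing interval; the paper's
    "equivalent computed value" may be defined differently.
    """
    low = origin + ((value - origin) // width) * width
    return (low, low + width)


def ncp_numeric(intervals, domain_min, domain_max):
    """Average normalized certainty penalty for one numerical attribute."""
    span = domain_max - domain_min
    if span == 0:
        return 0.0
    return sum((high - low) / span for low, high in intervals) / len(intervals)


def is_distinct_l_diverse(records, group_key, sensitive_key, l):
    """True if every group holds at least l distinct sensitive values."""
    groups = defaultdict(set)
    for record in records:
        groups[record[group_key]].add(record[sensitive_key])
    return all(len(values) >= l for values in groups.values())


# Toy records, invented for illustration only.
records = [
    {"age": 23, "diagnosis": "flu"},
    {"age": 27, "diagnosis": "asthma"},
    {"age": 41, "diagnosis": "diabetes"},
    {"age": 48, "diagnosis": "flu"},
]

# Replace each age with its fixed-width interval (width 10 is an assumption).
for record in records:
    record["age_interval"] = fixed_interval(record["age"], width=10)

# Information loss for the age attribute over an assumed domain of 0..100.
age_ncp = ncp_numeric([r["age_interval"] for r in records], 0, 100)

# Check diversity of diagnoses within each age interval.
two_diverse = is_distinct_l_diverse(records, "age_interval", "diagnosis", 2)

print(age_ncp, two_diverse)
```

A wider interval hides individual values better but raises the NCP, which is the trade-off between protection and utility that the description refers to.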
first_indexed	2024-12-11T00:42:46Z
format	Article
id	doaj.art-115667d2ef974c0999370f0491820abf
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-12-11T00:42:46Z
publishDate	2022-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-115667d2ef974c0999370f0491820abf; 2022-12-22T01:26:51Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2022-01-01; vol. 10, pp. 86979-86997; DOI 10.1109/ACCESS.2022.3199433; IEEE document 9858117; Privacy Preserving Attribute-Focused Anonymization Scheme for Healthcare Data Publishing; J. Andrew Onesimu (https://orcid.org/0000-0003-3592-6543; Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India); Karthikeyan J (School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India); Jennifer Eunice (Department of Electronics and Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India); Marc Pomplun (Department of Computer Science, University of Massachusetts Boston, Boston, MA, USA); Hien Dang (https://orcid.org/0000-0002-7112-9966; Department of Computer Science, University of Massachusetts Boston, Boston, MA, USA); https://ieeexplore.ieee.org/document/9858117/; Anonymization; data privacy; data publishing; healthcare data; privacy-preserving
title	Privacy Preserving Attribute-Focused Anonymization Scheme for Healthcare Data Publishing
topic	Anonymization; data privacy; data publishing; healthcare data; privacy-preserving
url	https://ieeexplore.ieee.org/document/9858117/

Similar Items