A privacy-preserving distributed filtering framework for NLP artifacts

Abstract. Background: Medical data sharing is a major challenge in biomedicine and often hinders collaborative research. Because of privacy concerns, clinical notes cannot be shared directly. Considerable effort has been devoted to de-identifying clinical notes, but it remains very challenging to accurat...


Bibliographic Details
Main Authors:	Md Nazmus Sadat, Md Momin Al Aziz, Noman Mohammed, Serguei Pakhomov, Hongfang Liu, Xiaoqian Jiang
Format:	Article
Language:	English
Published:	BMC 2019-09-01
Series:	BMC Medical Informatics and Decision Making
Subjects:	Biomedical data security and privacy, Clinical notes de-identification, Homomorphic encryption
Online Access:	http://link.springer.com/article/10.1186/s12911-019-0867-z

_version_	1818576315619475456
author	Md Nazmus Sadat, Md Momin Al Aziz, Noman Mohammed, Serguei Pakhomov, Hongfang Liu, Xiaoqian Jiang
collection	DOAJ
description	Abstract. Background: Medical data sharing is a major challenge in biomedicine and often hinders collaborative research. Because of privacy concerns, clinical notes cannot be shared directly. Considerable effort has been devoted to de-identifying clinical notes, but it remains very challenging to locate and scrub all sensitive elements from notes accurately and automatically. An alternative approach is to remove sentences that might contain sensitive terms related to personal information. Methods: A previous study introduced a frequency-based filtering approach that removes sentences containing low-frequency bigrams to improve privacy protection without significantly decreasing utility. Our work extends this method to clinical notes held by distributed sources, taking security and privacy into account. We developed a novel secure protocol based on private set intersection and secure thresholding to identify uncommon and low-frequency terms, which can then guide sentence filtering. Results: Because the computational cost of the proposed framework depends mostly on the cardinality of the set intersection and the number of data owners, we evaluated the framework with respect to these two factors. Experimental results demonstrate that the proposed method is scalable in various experimental settings. We also evaluated the framework in terms of data utility; this evaluation shows that the proposed method retains enough information for data analysis. Conclusion: This work demonstrates the feasibility of using homomorphic encryption to develop a secure and efficient multi-party protocol.
first_indexed	2024-12-16T06:12:04Z
format	Article
id	doaj.art-24a1d32044984e03ac4811712f3eef06
institution	Directory of Open Access Journals
issn	1472-6947
language	English
last_indexed	2024-12-16T06:12:04Z
publishDate	2019-09-01
publisher	BMC
record_format	Article
series	BMC Medical Informatics and Decision Making
spelling	doaj.art-24a1d32044984e03ac4811712f3eef06; 2022-12-21T22:41:22Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2019-09-01; 19; 1; 1; 10; 10.1186/s12911-019-0867-z; A privacy-preserving distributed filtering framework for NLP artifacts; Md Nazmus Sadat (Department of Computer Science, University of Manitoba); Md Momin Al Aziz (Department of Computer Science, University of Manitoba); Noman Mohammed (Department of Computer Science, University of Manitoba); Serguei Pakhomov (Department of Pharmaceutical Care & Health Systems, University of Minnesota); Hongfang Liu (Department of Health Sciences Research, Mayo Clinic College of Medicine); Xiaoqian Jiang (School of Biomedical Informatics, University of Texas Health Science Center at Houston); http://link.springer.com/article/10.1186/s12911-019-0867-z; Biomedical data security and privacy; Clinical notes de-identification; Homomorphic encryption
title	A privacy-preserving distributed filtering framework for NLP artifacts
topic	Biomedical data security and privacy, Clinical notes de-identification, Homomorphic encryption
url	http://link.springer.com/article/10.1186/s12911-019-0867-z
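
The frequency-based filtering idea summarized in the abstract (drop sentences that contain low-frequency bigrams) can be illustrated with a minimal plaintext sketch. This is not the authors' secure protocol: it assumes all notes are available in one place and leaves out private set intersection, secure thresholding, and homomorphic encryption entirely. The function names, the naive tokenizer and sentence splitter, and the `min_count` threshold are hypothetical choices made only for illustration.

```python
import re
from collections import Counter


def sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bigrams(sentence):
    """Lowercased adjacent word pairs; punctuation is ignored."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return list(zip(tokens, tokens[1:]))


def filter_notes(notes, min_count=2):
    """Keep only sentences whose bigrams all occur at least min_count times.

    Counts are pooled over every note, standing in (insecurely) for the
    cross-site frequencies that a secure protocol would compute.
    A sentence with fewer than two tokens has no bigrams and is kept.
    """
    counts = Counter()
    for note in notes:
        for sent in sentences(note):
            counts.update(bigrams(sent))

    filtered = []
    for note in notes:
        kept = [
            sent
            for sent in sentences(note)
            if all(counts[b] >= min_count for b in bigrams(sent))
        ]
        filtered.append(kept)
    return filtered


if __name__ == "__main__":
    demo = [
        "Patient denies chest pain. Lives near Maple Street.",
        "Patient denies chest pain. Follow up in two weeks.",
    ]
    for kept in filter_notes(demo, min_count=2):
        print(kept)
```

In this toy input, only the sentence shared by both notes survives, because every other sentence contains at least one bigram seen once. A real deployment would need a proper clinical tokenizer and a threshold chosen against a utility evaluation.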


Similar Items