De-identification of Electronic Health Records Using Machine Learning Algorithms

Introduction: Electronic health records (EHRs) contain valuable clinical information that can be useful for activities such as public health surveillance, quality improvement, and research. However, EHRs often contain identifiable health information whose presence limits the use of the records f...

Bibliographic Details
Main Authors:	Mostafa Langarizadeh, Azam Orooji
Format:	Article
Language:	fas
Published:	Kerman University of Medical Sciences 2017-09-01
Series:	مجله انفورماتیک سلامت و زیست پزشکی (Journal of Health and Biomedical Informatics)
Subjects:	confidentiality privacy de-identification machine learning
Online Access:	http://jhbmi.ir/article-1-211-en.html

author	Mostafa Langarizadeh Azam Orooji
collection	DOAJ
description	Introduction: Electronic health records (EHRs) contain valuable clinical information that can be useful for activities such as public health surveillance, quality improvement, and research. However, EHRs often contain identifiable health information whose presence limits the use of the records for sharing and secondary use. De-identification is a common method for protecting the confidentiality of patient information. This systematic review focuses on recently published studies of de-identification methods based on machine learning (ML) for removing all identifiable information from EHRs. Methods: A systematic search was performed in electronic databases such as PubMed and ScienceDirect for studies published between 2006 and 2016. Studies were assessed for adherence to the CASP checklists and reviewed independently by two investigators. In total, 12 articles met the inclusion criteria. Results: The selected studies are discussed in terms of the methods and knowledge resources used, the types of identifiers detected, the types of clinical documents, the challenges, and the results achieved. The results showed that ML-based de-identification is a widely used approach to protecting patient privacy when clinical data are disclosed for secondary purposes such as research. In addition, combining ML algorithms with techniques such as pattern matching and regular expression matching could reduce the need for training data. Conclusion: Medical records contain a large amount of identifiable information. This study showed that ML-based de-identification methods can substantially reduce the risk of information disclosure.
first_indexed	2024-04-10T19:49:50Z
format	Article
id	doaj.art-b13f557bee444aca97a13a0eb1c386d8
institution	Directory Open Access Journal
issn	2423-3870 2423-3498
language	fas
last_indexed	2024-04-10T19:49:50Z
publishDate	2017-09-01
publisher	Kerman University of Medical Sciences
record_format	Article
series	مجله انفورماتیک سلامت و زیست پزشکی
spelling	doaj.art-b13f557bee444aca97a13a0eb1c386d8 2023-01-28T10:42:01Z fas Kerman University of Medical Sciences مجله انفورماتیک سلامت و زیست پزشکی 2423-3870 2423-3498 2017-09-01 42154167 De-identification of Electronic Health Records Using Machine Learning Algorithms Mostafa Langarizadeh 0 Azam Orooji 1 Ph.D Student of Medical Informatics, Health Information Management Dept., School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran http://jhbmi.ir/article-1-211-en.html confidentiality privacy de-identification machine learning
title	De-identification of Electronic Health Records Using Machine Learning Algorithms
topic	confidentiality privacy de-identification machine learning
url	http://jhbmi.ir/article-1-211-en.html


Similar Items