De-identification of patient notes with recurrent neural networks

Objective: Patient notes in electronic health records (EHRs) may contain critical information for medical investigations. However, the vast majority of medical investigators can only access de-identified notes, in order to protect the confidentiality of patients. In the United States, the Health Ins...

Bibliographic Details
Main Authors:	Dernoncourt, Franck; Lee, Ji Young; Uzuner, Ozlem; Szolovits, Peter
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	en_US
Published:	BMJ Publishing Group 2017
Online Access:	http://hdl.handle.net/1721.1/111064 https://orcid.org/0000-0002-1119-1346 https://orcid.org/0000-0001-6887-0924 https://orcid.org/0000-0001-8411-6403

collection	MIT
description	Objective: Patient notes in electronic health records (EHRs) may contain critical information for medical investigations. However, the vast majority of medical investigators can only access de-identified notes, in order to protect the confidentiality of patients. In the United States, the Health Insurance Portability and Accountability Act (HIPAA) defines 18 types of protected health information that needs to be removed to de-identify patient notes. Manual de-identification is impractical given the size of electronic health record databases, the limited number of researchers with access to non-de-identified notes, and the frequent mistakes of human annotators. A reliable automated de-identification system would consequently be of high value. Materials and Methods: We introduce the first de-identification system based on artificial neural networks (ANNs), which requires no handcrafted features or rules, unlike existing systems. We compare the performance of the system with state-of-the-art systems on two datasets: the i2b2 2014 de-identification challenge dataset, which is the largest publicly available de-identification dataset, and the MIMIC de-identification dataset, which we assembled and is twice as large as the i2b2 2014 dataset. Results: Our ANN model outperforms the state-of-the-art systems. It yields an F1-score of 97.85 on the i2b2 2014 dataset, with a recall of 97.38 and a precision of 98.32, and an F1-score of 99.23 on the MIMIC de-identification dataset, with a recall of 99.25 and a precision of 99.21. Conclusion: Our findings support the use of ANNs for de-identification of patient notes, as they show better performance than previously published systems while requiring no manual feature engineering.
first_indexed	2024-09-23T14:22:01Z
format	Article
id	mit-1721.1/111064
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T14:22:01Z
publishDate	2017
publisher	BMJ Publishing Group
record_format	dspace
spelling	mit-1721.1/111064 2022-09-29T09:00:51Z 2017-08-29T19:37:29Z 2017-08-29T19:37:29Z 2016-12 2016-09 Article http://purl.org/eprint/type/JournalArticle 1067-5027 1527-974X http://hdl.handle.net/1721.1/111064 Dernoncourt, Franck et al. “De-Identification of Patient Notes with Recurrent Neural Networks.” Journal of the American Medical Informatics Association (December 2016): 596–606 © 2016 The Authors https://orcid.org/0000-0002-1119-1346 https://orcid.org/0000-0001-6887-0924 https://orcid.org/0000-0001-8411-6403 en_US http://dx.doi.org/10.1093/jamia/ocw156 Journal of the American Medical Informatics Association Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf BMJ Publishing Group arXiv
title	De-identification of patient notes with recurrent neural networks
url	http://hdl.handle.net/1721.1/111064 https://orcid.org/0000-0002-1119-1346 https://orcid.org/0000-0001-6887-0924 https://orcid.org/0000-0001-8411-6403
work_keys_str_mv	AT dernoncourtfranck deidentificationofpatientnoteswithrecurrentneuralnetworks AT leejiyoung deidentificationofpatientnoteswithrecurrentneuralnetworks AT uzunerozlem deidentificationofpatientnoteswithrecurrentneuralnetworks AT szolovitspeter deidentificationofpatientnoteswithrecurrentneuralnetworks
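
The Results in the description field report precision, recall, and F1-score for each dataset. As a quick consistency check, the sketch below recomputes F1 from the quoted precision and recall, assuming the standard definition of F1 as the harmonic mean of the two. The function name f1_score and the dictionary layout are illustrative choices, not taken from the article, and the article's own evaluation procedure may differ in detail.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the description field, as percentages.
reported = {
    "i2b2 2014": {"precision": 98.32, "recall": 97.38, "f1": 97.85},
    "MIMIC": {"precision": 99.21, "recall": 99.25, "f1": 99.23},
}

for name, r in reported.items():
    computed = f1_score(r["precision"], r["recall"])
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {r['f1']:.2f}")
```

Under this definition, both recomputed values agree with the reported F1-scores to two decimal places.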
