Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives


Bibliographic Details
Main Authors: Gehrmann, Sebastian, Li, Yeran, Carlson, Eric T., Wu, Joy T., Welt, Jonathan, Foote, John, Moseley, Edward T., Grant, David W., Tyler, Patrick D., Dernoncourt, Franck, Celi, Leo Anthony G.
Other Authors: Massachusetts Institute of Technology. Institute for Medical Engineering & Science
Format: Article
Published: Public Library of Science 2018
Online Access: http://hdl.handle.net/1721.1/114939
https://orcid.org/0000-0002-1119-1346
author Gehrmann, Sebastian
Li, Yeran
Carlson, Eric T.
Wu, Joy T.
Welt, Jonathan
Foote, John
Moseley, Edward T.
Grant, David W.
Tyler, Patrick D.
Dernoncourt, Franck
Celi, Leo Anthony G.
author2 Massachusetts Institute of Technology. Institute for Medical Engineering & Science
collection MIT
description This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. In secondary analysis of electronic health records, a crucial task is correctly identifying the patient cohort under investigation. In many cases, the most valuable and relevant information for an accurate classification of medical conditions exists only in clinical narratives. Therefore, it is necessary to use natural language processing (NLP) techniques to extract and evaluate these narratives. The most commonly used approach to this problem relies on extracting a number of clinician-defined medical concepts from text and using machine learning techniques to identify whether a particular patient has a certain condition. However, recent advances in deep learning and NLP enable models to learn a rich representation of (medical) language. Convolutional neural networks (CNNs) for text classification can augment the existing techniques by leveraging the representation of language to learn which phrases in a text are relevant for a given medical condition. In this work, we compare concept extraction based methods with CNNs and other commonly used NLP models on ten phenotyping tasks using 1,610 discharge summaries from the MIMIC-III database. We show that CNNs outperform concept extraction based methods in almost all of the tasks, with improvements of up to 26 percentage points in F1-score and up to 7 percentage points in area under the ROC curve (AUC). We additionally assess the interpretability of both approaches by presenting and evaluating methods that calculate and extract the most salient phrases for a prediction. The results indicate that CNNs are a valid alternative to existing approaches in patient phenotyping and cohort identification, and should be further investigated. Moreover, the deep learning approach presented in this paper can be used to assist clinicians during chart review or support the extraction of billing codes from text by identifying and highlighting relevant phrases for various medical conditions.
first_indexed 2024-09-23T08:35:14Z
format Article
id mit-1721.1/114939
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T08:35:14Z
publishDate 2018
publisher Public Library of Science
record_format dspace
spelling mit-1721.1/114939 2022-09-30T09:47:37Z
Massachusetts Institute of Technology. Institute for Medical Engineering & Science
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
MIT Critical Data (Laboratory)
National Institute of Biomedical Imaging and Bioengineering (U.S.) (Grant R01 EB017205-01A1)
2018-04-24T17:50:16Z 2018-04-24T17:50:16Z 2018-02 2017-06 2018-04-20T17:52:56Z
Article
http://purl.org/eprint/type/JournalArticle
1932-6203
http://hdl.handle.net/1721.1/114939
Gehrmann, Sebastian et al. “Comparing Deep Learning and Concept Extraction Based Methods for Patient Phenotyping from Clinical Narratives.” Edited by Jen-Hsiang Chuang. PLOS ONE 13, 2 (February 2018): e0192360 © 2018 Gehrmann et al
https://orcid.org/0000-0002-1119-1346
http://dx.doi.org/10.1371/journal.pone.0192360
PLOS ONE
Creative Commons Attribution 4.0 International License
https://creativecommons.org/licenses/by/4.0/
application/pdf
Public Library of Science
PLoS
title Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives
url http://hdl.handle.net/1721.1/114939
https://orcid.org/0000-0002-1119-1346