Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports

Accurate, automated extraction of clinical stroke information from unstructured text has several important applications. ICD-9/10 codes can misclassify ischemic stroke events and do not distinguish acuity or location. Expeditious, accurate data extraction could provide considerable improvement in id...

Bibliographic Details
Main Authors:	Ong, Charlene Jennifer, Orfanoudaki, Agni, Zhang, Rebecca, Caprasse, Francois Pierre M., Bertsimas, Dimitris J
Other Authors:	Massachusetts Institute of Technology. Operations Research Center
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2021
Online Access:	https://hdl.handle.net/1721.1/130098

_version_	1826208901053284352
author	Ong, Charlene Jennifer Orfanoudaki, Agni Zhang, Rebecca Caprasse, Francois Pierre M. Bertsimas, Dimitris J
author2	Massachusetts Institute of Technology. Operations Research Center
author_facet	Massachusetts Institute of Technology. Operations Research Center Ong, Charlene Jennifer Orfanoudaki, Agni Zhang, Rebecca Caprasse, Francois Pierre M. Bertsimas, Dimitris J
author_sort	Ong, Charlene Jennifer
collection	MIT
description	Accurate, automated extraction of clinical stroke information from unstructured text has several important applications. ICD-9/10 codes can misclassify ischemic stroke events and do not distinguish acuity or location. Expeditious, accurate data extraction could provide considerable improvement in identifying stroke in large datasets, triaging critical clinical reports, and quality improvement efforts. In this study, we developed and report a comprehensive framework for studying the performance of simple and complex stroke-specific Natural Language Processing (NLP) and Machine Learning (ML) methods to determine presence, location, and acuity of ischemic stroke from radiographic text. We collected 60,564 Computed Tomography and Magnetic Resonance Imaging radiology reports from 17,864 patients at two large academic medical centers. We used standard techniques to featurize unstructured text and developed neurovascular-specific GloVe word embeddings. We trained various binary classification algorithms to identify stroke presence, location, and acuity using 75% of 1,359 expert-labeled reports. We validated our methods internally on the remaining 25% of reports and externally on 500 radiology reports from an entirely separate academic institution. In our internal population, GloVe word embeddings paired with deep learning (Recurrent Neural Networks) had the best discrimination of all methods for our three tasks (AUCs of 0.96, 0.98, and 0.93, respectively). Simpler NLP approaches (Bag of Words, BOW) performed best with interpretable algorithms (Logistic Regression) for identifying ischemic stroke (AUC of 0.95), MCA location (AUC of 0.96), and acuity (AUC of 0.90). In our external test set, GloVe and Recurrent Neural Networks (AUCs of 0.92, 0.89, and 0.93) likewise generalized better than BOW and Logistic Regression (AUCs of 0.89, 0.86, and 0.80) for stroke presence, location, and acuity, respectively. Our study demonstrates a comprehensive assessment of NLP techniques for unstructured radiographic text. Our findings suggest that NLP/ML methods can be used to discriminate stroke features from large data cohorts for both clinical and research-related investigations.
first_indexed	2024-09-23T14:14:06Z
format	Article
id	mit-1721.1/130098
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T14:14:06Z
publishDate	2021
publisher	Public Library of Science (PLoS)
record_format	dspace
spelling	mit-1721.1/130098 2022-10-01T19:59:29Z Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports Ong, Charlene Jennifer Orfanoudaki, Agni Zhang, Rebecca Caprasse, Francois Pierre M. Bertsimas, Dimitris J Massachusetts Institute of Technology. Operations Research Center Sloan School of Management 2021-03-08T18:31:21Z 2021-03-08T18:31:21Z 2020-06 2019-11 2021-02-05T16:41:31Z Article http://purl.org/eprint/type/JournalArticle 1932-6203 https://hdl.handle.net/1721.1/130098 Ong, Charlene Jennifer et al. “Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports.” PLoS ONE, 15, 6 (June 2020): e0234908 © 2020 The Author(s) en 10.1371/journal.pone.0234908 PLoS ONE CC0 1.0 Universal (CC0 1.0) Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/ application/pdf Public Library of Science (PLoS) PLoS
title	Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports
url	https://hdl.handle.net/1721.1/130098

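The description field above mentions Bag of Words featurization paired with Logistic Regression, trained on 75% of the labeled reports and scored by AUC on the remaining 25%. A minimal sketch of that kind of pipeline follows, using only the Python standard library. It is not the authors' implementation: the toy reports, the fixed (unshuffled) split, the learning rate, the epoch count, and every function name here are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy reports invented for illustration (1 = ischemic stroke).
# They are NOT taken from the study's data.
REPORTS = [
    ("Acute infarct in left MCA territory", 1),
    ("Restricted diffusion consistent with acute infarct", 1),
    ("Subacute infarct in right MCA territory", 1),
    ("No intracranial abnormality", 0),
    ("Normal study no hemorrhage", 0),
    ("Unremarkable brain MRI", 0),
    ("Acute infarct right MCA", 1),
    ("Normal brain no abnormality", 0),
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(texts):
    """Map each distinct training token to a column index."""
    words = sorted({w for t in texts for w in tokenize(t)})
    return {w: i for i, w in enumerate(words)}

def featurize(text, vocab):
    """Bag of Words: one count per vocabulary word; unknown words are ignored."""
    vec = [0.0] * len(vocab)
    for word, count in Counter(tokenize(text)).items():
        if word in vocab:
            vec[vocab[word]] = float(count)
    return vec

def score(x, w, b):
    """Linear score; a higher value means 'more likely stroke'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_logreg(X, y, lr=0.1, epochs=300):
    """Logistic regression fitted by plain batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = 1.0 / (1.0 + math.exp(-score(xi, w, b))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Fixed 75% / 25% split so the example is reproducible (an assumption).
split = int(0.75 * len(REPORTS))
train, test = REPORTS[:split], REPORTS[split:]

vocab = build_vocab([text for text, _ in train])
w, b = train_logreg([featurize(t, vocab) for t, _ in train],
                    [label for _, label in train])

test_scores = [score(featurize(t, vocab), w, b) for t, _ in test]
test_auc = auc([label for _, label in test], test_scores)
print(f"held-out AUC on toy data: {test_auc:.2f}")
```

The toy vocabulary is deliberately separable, so the held-out AUC says nothing about real radiology text. In practice, a library such as scikit-learn (`CountVectorizer`, `LogisticRegression`, `roc_auc_score`) would be a more typical choice than hand-written gradient descent.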
Similar Items