Publicly available machine learning models for identifying opioid misuse from the clinical notes of hospitalized patients


Bibliographic Details
Main Authors:	Brihat Sharma, Dmitriy Dligach, Kristin Swope, Elizabeth Salisbury-Afshar, Niranjan S. Karnik, Cara Joyce, Majid Afshar
Format:	Article
Language:	English
Published:	BMC 2020-04-01
Series:	BMC Medical Informatics and Decision Making
Subjects:	Opioid misuse Heroin Opioid use disorder Natural language processing Machine learning Computable phenotype
Online Access:	http://link.springer.com/article/10.1186/s12911-020-1099-y

collection	DOAJ
description	Background: Automated de-identification methods for removing protected health information (PHI) from the source notes of the electronic health record (EHR) rely on building systems to recognize mentions of PHI in text, but they remain inadequate at ensuring perfect PHI removal. As an alternative to relying on de-identification systems, we propose two solutions: (1) mapping the corpus of documents to a standardized medical vocabulary (concept unique identifier [CUI] codes mapped from the Unified Medical Language System), thus eliminating PHI as an input to a machine learning model; and (2) training character-based machine learning models that obviate the need for a dictionary of input words or n-grams. We aimed to test the performance of models with and without PHI in one use case, an opioid misuse classifier. Methods: An observational cohort was sampled from adult hospital inpatient encounters at a health system between 2007 and 2017. Case-control stratified sampling (n = 1000) was performed to build an annotated reference-standard dataset of cases and non-cases of opioid misuse. Models were trained and tested with CUI-code, character-based, and n-gram features. The models applied were machine learning models (neural networks and logistic regression) and an expert-consensus, rule-based model for opioid misuse. Discrimination was compared between models using the area under the receiver operating characteristic curve (AUROC). Model fit and calibration were assessed with the Hosmer-Lemeshow test and visual plots. Results: Machine learning models with CUI codes performed similarly to n-gram models with PHI. The top-performing models, with AUROCs > 0.90, used CUI codes as inputs to a convolutional neural network, a max pooling network, and a logistic regression model. The best-calibrated models with the best model fit were the CUI-based convolutional neural network and max pooling network. The top-weighted CUI codes in logistic regression had the related terms ‘Heroin’ and ‘Victim of abuse’. Conclusions: We demonstrate good test characteristics for an opioid misuse computable phenotype that is free of PHI and performs similarly to models that use PHI. Herein we share a PHI-free, trained opioid misuse classifier for other researchers and health systems to use and benchmark, overcoming privacy and security concerns.
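The abstract names two evaluation measures: AUROC for discrimination and the Hosmer-Lemeshow test for calibration. The following dependency-free Python sketch shows how each is conventionally computed for a binary classifier (1 = opioid misuse case, 0 = non-case). It is not the authors' code: the function names are hypothetical, the split into ten groups of predicted risk ("deciles of risk") is the usual default rather than a detail reported in this record, and the chi-square tail uses a closed form that is valid only for even degrees of freedom, which avoids a SciPy dependency.

```python
import math


def auroc(labels, scores):
    """AUROC as the Mann-Whitney probability that a randomly chosen case
    outscores a randomly chosen non-case; tied scores count as 1/2."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X >= x).

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which holds only for positive even degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(labels, probs, groups=10):
    """Hosmer-Lemeshow goodness-of-fit test over groups of predicted risk.

    Returns (statistic, df, p_value) with df = groups - 2. A small p-value
    suggests the predicted probabilities are poorly calibrated."""
    if groups < 4 or groups % 2 != 0:
        raise ValueError("this sketch needs an even number of groups >= 4")
    pairs = sorted(zip(probs, labels))  # order observations by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        if size == 0:
            continue
        observed = sum(y for _, y in chunk)  # observed cases in the group
        expected = sum(p for p, _ in chunk)  # expected cases = sum of risks
        # Event and non-event cells combined:
        # (O-E)^2/E + (O-E)^2/(size-E) = (O-E)^2 / (E * (1 - E/size))
        denom = expected * (1.0 - expected / size)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


if __name__ == "__main__":
    # Toy scores for illustration only, not data from the study.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise AUROC loop is O(n²), which is acceptable for a sample on the order of the 1,000 annotated encounters described above; for larger sets a rank-based or library implementation such as scikit-learn's `roc_auc_score` is the usual choice.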
issn	1472-6947
affiliations	Brihat Sharma, Dmitriy Dligach: Department of Computer Science, Loyola University Chicago; Kristin Swope: Stritch School of Medicine, Loyola University Chicago; Elizabeth Salisbury-Afshar: Center for Multi-System Solutions to the Opioid Epidemic, American Institute for Research; Niranjan S. Karnik: Department of Psychiatry, Rush University Medical Center; Cara Joyce, Majid Afshar: Center for Health Outcomes and Informatics Research, Loyola University Chicago
doi	10.1186/s12911-020-1099-y


Similar Items