Use of noisy labels as weak learners to identify incompletely ascertainable outcomes: A Feasibility study with opioid-induced respiratory depression

Bibliographic Details
Main Authors:	Alvin D. Jeffery, Daniel Fabbri, Ruth M. Reeves, Michael E. Matheny
Format:	Article
Language:	English
Published:	Elsevier 2024-03-01
Series:	Heliyon
Subjects:	Medical informatics; Machine learning; Electronic health records; Phenotype; Classification
Online Access:	http://www.sciencedirect.com/science/article/pii/S2405844024024654

collection	DOAJ
description	Objective: Assigning outcome labels to large observational data sets in a timely and accurate manner, particularly when outcomes are rare or not directly ascertainable, remains a significant challenge within biomedical informatics. We examined whether noisy labels generated from subject matter experts’ heuristics using heterogeneous data types within a data programming paradigm could provide outcome labels to a large, observational data set. We chose the clinical condition of opioid-induced respiratory depression for our use case because it is rare, has no administrative codes to easily identify the condition, and typically requires at least some unstructured text to ascertain its presence. Materials and methods: Using de-identified electronic health records of 52,861 post-operative encounters, we applied a data programming paradigm (implemented in the Snorkel software) for the development of a machine learning classifier for opioid-induced respiratory depression. Our approach included subject matter experts creating 14 labeling functions that served as noisy labels for developing a probabilistic Generative model. We used probabilistic labels from the Generative model as outcome labels for training a Discriminative model on the source data. We evaluated performance of the Discriminative model with a hold-out test set of 599 independently reviewed patient records. Results: The final Discriminative classification model achieved an accuracy of 0.977, an F1 score of 0.417, a sensitivity of 1.0, and an AUC of 0.988 in the hold-out test set with a prevalence of 0.83% (5/599). Discussion: All of the confirmed cases were identified by the classifier. For rare outcomes, this finding is encouraging because it reduces the number of manual reviews needed by excluding visits/patients with low probabilities.
Conclusion: Application of a data programming paradigm with expert-informed labeling functions might have utility for phenotyping clinical phenomena that are not easily ascertainable from highly structured data.
first_indexed	2024-03-07T19:09:56Z
id	doaj.art-ff4847acb53f40bcb54ead2a6e99981c
institution	Directory of Open Access Journals
issn	2405-8440
last_indexed	2024-04-24T23:16:52Z
citation	Heliyon, vol. 10, no. 5, e26434 (ISSN 2405-8440; Elsevier, 2024-03-01)
affiliations	Alvin D. Jeffery (corresponding author): Vanderbilt University School of Nursing, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA. Daniel Fabbri: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA. Ruth M. Reeves: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA. Michael E. Matheny: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA.

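The abstract above describes a data programming pipeline. Expert-written labeling functions each vote on an encounter or abstain, a generative model combines those noisy votes into probabilistic labels, and the probabilistic labels then train a discriminative classifier. The Python sketch below illustrates only the first two steps, and everything in it is hypothetical. The Encounter fields, the three labeling functions, and their thresholds are invented for illustration, because the study's 14 labeling functions are not listed in this record. The sketch also replaces Snorkel's learned generative label model (LabelModel in snorkel.labeling.model), which estimates each labeling function's accuracy from agreement patterns, with a plain unweighted vote share.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Label conventions (Snorkel also uses -1 for "abstain").
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


@dataclass
class Encounter:
    # Hypothetical per-encounter features; not the study's actual variables.
    naloxone_given: bool
    resp_rate_min: Optional[int]  # lowest charted respiratory rate, if any
    note_text: str                # unstructured clinical note text


def lf_naloxone(e: Encounter) -> int:
    # Naloxone (opioid reversal) hints at a case; its absence says little, so abstain.
    return POSITIVE if e.naloxone_given else ABSTAIN


def lf_low_resp_rate(e: Encounter) -> int:
    # Hypothetical threshold of 8 breaths/min; abstain when no rate was charted.
    if e.resp_rate_min is None:
        return ABSTAIN
    return POSITIVE if e.resp_rate_min < 8 else NEGATIVE


def lf_note_mention(e: Encounter) -> int:
    # Keyword heuristic over free text; abstain when the phrase is absent.
    return POSITIVE if "respiratory depression" in e.note_text.lower() else ABSTAIN


LABELING_FUNCTIONS: List[Callable[[Encounter], int]] = [
    lf_naloxone, lf_low_resp_rate, lf_note_mention,
]


def apply_lfs(encounters: List[Encounter]) -> List[List[int]]:
    # Label matrix: one row per encounter, one column per labeling function.
    return [[lf(e) for lf in LABELING_FUNCTIONS] for e in encounters]


def vote_share(row: List[int], prior: float) -> float:
    # Simplified stand-in for a generative label model: the fraction of
    # non-abstaining votes that are POSITIVE, or `prior` if all abstain.
    votes = [v for v in row if v != ABSTAIN]
    if not votes:
        return prior
    return sum(v == POSITIVE for v in votes) / len(votes)


encounters = [
    Encounter(True, 6, "Opioid-induced respiratory depression; naloxone given."),
    Encounter(False, 14, "Uneventful post-operative recovery."),
    Encounter(False, None, ""),
]
L = apply_lfs(encounters)
probs = [vote_share(row, prior=0.01) for row in L]  # 0.01: hypothetical low prior
print(probs)  # [1.0, 0.0, 0.01]
```

In a fuller pipeline, these probabilities would serve as soft training targets for a separate discriminative model over the encounter features. That step is not shown here.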

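The reported hold-out results can be cross-checked with a little arithmetic. The test set has 5 confirmed cases among 599 records and a sensitivity of 1.0, so true positives must be 5 and false negatives 0. The sketch below searches for false-positive counts that reproduce both the reported accuracy (0.977) and F1 score (0.417) after rounding to three decimals. The false-positive count it finds is a back-calculation, not a figure stated in this record. It assumes all of these metrics were computed at a single decision threshold on the same 599 records. AUC (0.988) is threshold-free and cannot be checked this way.

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    # Standard confusion-matrix metrics for a binary classifier.
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)  # recall
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "prevalence": (tp + fn) / total,
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


def consistent_fp_counts(n=599, cases=5, accuracy=0.977, f1=0.417):
    # Sensitivity = 1.0 fixes TP = cases and FN = 0; try every possible FP count
    # and keep those whose rounded accuracy and F1 match the reported values.
    matches = []
    for fp in range(0, n - cases + 1):
        m = metrics(tp=cases, fp=fp, fn=0, tn=n - cases - fp)
        if round(m["accuracy"], 3) == accuracy and round(m["f1"], 3) == f1:
            matches.append(fp)
    return matches


print(consistent_fp_counts())  # [14]
```

If that assumption holds, the classifier flagged 19 of 599 records (5 true and 14 false positives), a precision of about 0.26. That is consistent with the abstract's point that excluding low-probability visits reduces the number of manual reviews.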
Similar Items