Identifying Risk Factors Associated With Lower Back Pain in Electronic Medical Record Free Text: Deep Learning Approach Using Clinical Note Annotations

Abstract Background: Lower back pain is a common weakening condition that affects a large population. It is a leading cause of disability and lost productivity, and the associated medical costs and lost wages place a substantial burden on individuals and society. Recent advances in artificial intelli...


Bibliographic Details
Main Authors: Aman Jaiswal, Alan Katz, Marcello Nesca, Evangelos Milios
Format: Article
Language: English
Published: JMIR Publications 2023-08-01
Series: JMIR Medical Informatics
Online Access: https://doi.org/10.2196/45105
author Aman Jaiswal
Alan Katz
Marcello Nesca
Evangelos Milios
collection DOAJ
description Abstract
Background: Lower back pain is a common weakening condition that affects a large population. It is a leading cause of disability and lost productivity, and the associated medical costs and lost wages place a substantial burden on individuals and society. Recent advances in artificial intelligence and natural language processing have opened new opportunities for the identification and management of risk factors for lower back pain. In this paper, we propose and train a deep learning model on a data set of clinical notes that have been annotated with relevant risk factors, and we evaluate the model’s performance in identifying risk factors in new clinical notes.
Objective: The primary objective is to develop a novel deep learning approach to detect risk factors for underlying disease in patients presenting with lower back pain in clinical encounter notes. The secondary objective is to propose solutions to potential challenges of using deep learning and natural language processing techniques for identifying risk factors in electronic medical record free text and to make practical recommendations for future research in this area.
Methods: We manually annotated clinical notes for the presence of six risk factors for severe underlying disease in patients presenting with lower back pain. Data were highly imbalanced, with only 12% (n=296) of the annotated notes having at least one risk factor. To address the imbalance, a combination of semantic textual similarity and regular expressions was used to capture further notes for annotation. Further analysis was conducted to study the impact of downsampling, binary formulation of multi-label classification, and unsupervised pretraining on classification performance.
Results: Of 2749 labeled clinical notes, 347 exhibited at least one risk factor, while 2402 exhibited none. The initial analysis showed that downsampling the training set to equalize the ratio of clinical notes with and without risk factors improved the macro–area under the receiver operating characteristic curve (AUROC) by 2%. The Bidirectional Encoder Representations from Transformers (BERT) model improved the macro-AUROC by 15% over the traditional machine learning baseline. In experiment 2, the proposed BERT–convolutional neural network (CNN) model for longer texts improved the macro-AUROC by 4% over the BERT baseline, and the multitask models were more stable for minority classes. In experiment 3, domain adaptation of BERT-CNN using masked language modeling improved the macro-AUROC by 2%.
Conclusions: Primary care clinical notes are likely to require manipulation to perform meaningful free-text analysis. The application of BERT models for multi-label classification on downsampled annotated clinical notes is useful in detecting risk factors suggesting an indication for imaging for patients with lower back pain.
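The abstract mentions two steps that a small example may make concrete: downsampling the training set so that notes with and without risk factors are equally represented, and scoring a multi-label classifier by macro-AUROC. The sketch below is illustrative only and is not the authors' code. The note structure (a dict with a `labels` list), the function names, and the toy data are all assumptions made for this example, and the AUROC is computed with a simple pairwise rank comparison rather than any particular library.

```python
import random
from typing import Dict, List, Sequence


def downsample(notes: List[Dict], seed: int = 0) -> List[Dict]:
    """Keep every note with at least one risk factor and an equal-sized
    random sample of notes with none (hypothetical note structure)."""
    rng = random.Random(seed)
    positives = [n for n in notes if any(n["labels"])]
    negatives = [n for n in notes if not any(n["labels"])]
    kept = rng.sample(negatives, min(len(positives), len(negatives)))
    balanced = positives + kept
    rng.shuffle(balanced)
    return balanced


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC for one label: the fraction of (positive, negative) pairs in
    which the positive note receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(Y_true: Sequence[Sequence[int]],
                Y_score: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of per-label AUROC (rows are notes, columns are labels)."""
    n_labels = len(Y_true[0])
    per_label = [
        auroc([row[j] for row in Y_true], [row[j] for row in Y_score])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


if __name__ == "__main__":
    # Toy data: 3 of 20 notes carry a risk factor (two hypothetical labels).
    notes = [{"id": i, "labels": [1, 0] if i < 3 else [0, 0]} for i in range(20)]
    print(len(downsample(notes)))
    print(macro_auroc([[1, 0], [0, 1], [1, 1], [0, 0]],
                      [[0.9, 0.2], [0.1, 0.8], [0.8, 0.7], [0.3, 0.1]]))
```

Because the macro average weights every label equally, a rare risk factor influences the score as much as a common one, which is one reason the metric suits imbalanced multi-label data of the kind described above.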
first_indexed 2024-03-12T14:39:10Z
format Article
id doaj.art-6f30519d46154b4f86dac04e5e930680
institution Directory Open Access Journal
issn 2291-9694
language English
last_indexed 2024-03-12T14:39:10Z
publishDate 2023-08-01
publisher JMIR Publications
record_format Article
series JMIR Medical Informatics
spelling doaj.art-6f30519d46154b4f86dac04e5e930680; 2023-08-16T12:54:27Z; eng; JMIR Publications; JMIR Medical Informatics; 2291-9694; 2023-08-01; 11; e45105; 10.2196/45105
Aman Jaiswal, http://orcid.org/0000-0002-4125-3691, Dalhousie University
Alan Katz, http://orcid.org/0000-0001-8280-7024, Department of Community Health Sciences, University of Manitoba
Marcello Nesca, http://orcid.org/0000-0003-4938-6939, Department of Community Health Sciences, University of Manitoba
Evangelos Milios, http://orcid.org/0000-0001-5549-4675, Dalhousie University
url https://doi.org/10.2196/45105