Exploring the performance and explainability of fine-tuned BERT models for neuroradiology protocol assignment

Abstract

Background: Deep learning has demonstrated significant advances across many domains, yet its adoption in specialized, high-stakes settings such as medicine remains cautious because understanding a model's decision-making process is critical. This study assesses the performance of several pretrained Bidirectional Encoder Representations from Transformers (BERT) models and examines their decision-making in the context of medical image protocol assignment.

Methods: Four pre-trained BERT models (BERT, BioBERT, ClinicalBERT, RoBERTa) were fine-tuned for the medical image protocol classification task. Word importance was measured by attributing the classification output to each word using a gradient-based method. A trained radiologist then reviewed the resulting word importance scores to compare the model's decision-making process with human reasoning.

Results: The BERT model came close to human performance on the test set and successfully identified relevant words indicative of the target protocol. Analysis of important words in misclassified cases revealed potential systematic errors in the model.

Conclusions: The BERT model shows promise for medical image protocol assignment, reaching near-human performance and identifying key words effectively. The detection of systematic errors paves the way for further refinements to enhance its safety and utility in clinical settings.
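The gradient-based attribution step described in Methods can be illustrated with a minimal sketch. This is not the authors' code: the checkpoint name, the number of protocol classes, and the example clinical indication are placeholder assumptions. It scores each token by the gradient norm of the predicted-class logit with respect to that token's embedding, for a BERT sequence classifier loaded via Hugging Face Transformers.

    # Minimal, hypothetical sketch (not the authors' code) of gradient-based
    # word importance: attribute a fine-tuned BERT classifier's output to each
    # input token via the gradient of the predicted-class logit with respect
    # to the token embeddings.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_NAME = "bert-base-uncased"  # placeholder; BioBERT, ClinicalBERT, or RoBERTa also fit
    NUM_PROTOCOLS = 5                 # hypothetical number of imaging protocols

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_PROTOCOLS)
    model.eval()

    text = "MR brain with contrast, rule out pituitary lesion"  # hypothetical indication
    enc = tokenizer(text, return_tensors="pt")

    # Embed the tokens ourselves, as a detached leaf tensor, so that gradients
    # accumulate on the embeddings rather than on the embedding weights.
    embeds = model.get_input_embeddings()(enc["input_ids"]).detach()
    embeds.requires_grad_(True)

    logits = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"]).logits
    pred = logits.argmax(dim=-1).item()

    # Backpropagate the predicted-class logit; the gradient norm at each token
    # embedding serves as that token's importance score.
    logits[0, pred].backward()
    scores = embeds.grad.norm(dim=-1).squeeze(0)

    for tok, s in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]), scores.tolist()):
        print(f"{tok:>15s}  {s:.4f}")

Raw gradient saliency is the simplest variant of this family; refinements such as integrated gradients follow the same idea of attributing the classification output to individual input words, which is what the radiologist review in the study evaluates.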

Bibliographic Details

Main Authors: Salmonn Talebi (University of California), Elizabeth Tong (Stanford University), Anna Li (Stanford University), Ghiam Yamin (Stanford University), Greg Zaharchuk (Stanford University), Mohammad R. K. Mofrad (University of California)
Format: Article
Language: English
Published: BMC, 2024-02-01
Series: BMC Medical Informatics and Decision Making
ISSN: 1472-6947
Subjects: Healthcare; Machine learning; Interpretability; Explanations; BERT
Online Access: https://doi.org/10.1186/s12911-024-02444-z