Exploring the performance and explainability of fine-tuned BERT models for neuroradiology protocol assignment
Main Authors: | Salmonn Talebi, Elizabeth Tong, Anna Li, Ghiam Yamin, Greg Zaharchuk, Mohammad R. K. Mofrad |
Format: | Article |
Language: | English |
Published: | BMC, 2024-02-01 |
Series: | BMC Medical Informatics and Decision Making |
Subjects: | Healthcare; Machine learning; Interpretability; Explanations; BERT |
Online Access: | https://doi.org/10.1186/s12911-024-02444-z |
author | Salmonn Talebi; Elizabeth Tong; Anna Li; Ghiam Yamin; Greg Zaharchuk; Mohammad R. K. Mofrad |
collection | DOAJ |
description | Abstract Background Deep learning has demonstrated significant advancements across various domains, yet its implementation in specialized areas such as medical settings is still approached with caution. In these high-stakes environments, understanding a model's decision-making process is critical. This study assesses the performance of different pre-trained Bidirectional Encoder Representations from Transformers (BERT) models and examines their decision-making within the context of medical image protocol assignment. Methods Four pre-trained BERT models (BERT, BioBERT, ClinicalBERT, RoBERTa) were fine-tuned for the medical image protocol classification task. Word importance was measured by attributing the classification output to every word using a gradient-based method. A trained radiologist then reviewed the resulting word importance scores to assess the model's decision-making process relative to human reasoning. Results The BERT model approached human performance on our test set and successfully identified relevant words indicative of the target protocol. Analysis of important words in misclassifications revealed potential systematic errors in the model. Conclusions The BERT model shows promise for medical image protocol assignment, reaching near-human performance and identifying key words effectively. The detection of systematic errors paves the way for further refinements to enhance its safety and utility in clinical settings. |
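The Methods sentence above describes two technical steps: fine-tuning a pre-trained BERT model for protocol classification, and attributing the classification output to every word with a gradient-based method. The sketch below, using the HuggingFace Transformers library, shows one common way to implement such an attribution (plain gradient-norm saliency on the input embeddings). It is illustrative only: the checkpoint name, the eight protocol classes, and the example requisition text are assumptions, and the paper's exact attribution method and fine-tuned weights are not reproduced here.

```python
# Hedged sketch: gradient-based word importance for a BERT protocol
# classifier. Checkpoint, label count, and example text are hypothetical.
from transformers import BertForSequenceClassification, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# In practice, load the fine-tuned checkpoint instead of the base model.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=8)  # e.g., 8 neuroradiology protocols
model.eval()

text = "MRI brain without contrast, new onset seizures, rule out mass"
enc = tokenizer(text, return_tensors="pt")

# Embed the tokens manually so gradients can flow to the embeddings
# (token ids are discrete and cannot carry gradients themselves).
embeds = model.bert.embeddings.word_embeddings(enc["input_ids"])
embeds = embeds.detach().requires_grad_(True)

logits = model(inputs_embeds=embeds,
               attention_mask=enc["attention_mask"]).logits
pred = logits.argmax(dim=-1).item()

# Backpropagate the predicted class score; the L2 norm of each token's
# embedding gradient serves as a simple word-importance (saliency) score.
logits[0, pred].backward()
scores = embeds.grad[0].norm(dim=-1)

tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
for token, score in sorted(zip(tokens, scores.tolist()),
                           key=lambda pair: -pair[1]):
    print(f"{token:>12}  {score:.4f}")
```

In the study's workflow, per-word scores of this kind were then reviewed by a trained radiologist to judge whether the model relies on clinically relevant words, which is also how systematic errors in misclassified cases were surfaced.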
format | Article |
id | doaj.art-d70101377721430091fe589808137498 |
institution | Directory Open Access Journal |
issn | 1472-6947 |
language | English |
publishDate | 2024-02-01 |
publisher | BMC |
series | BMC Medical Informatics and Decision Making |
citation | BMC Medical Informatics and Decision Making, vol. 24, no. 1, pp. 1-12, 2024-02-01, https://doi.org/10.1186/s12911-024-02444-z |
affiliations | Salmonn Talebi (University of California); Elizabeth Tong (Stanford University); Anna Li (Stanford University); Ghiam Yamin (Stanford University); Greg Zaharchuk (Stanford University); Mohammad R. K. Mofrad (University of California) |
title | Exploring the performance and explainability of fine-tuned BERT models for neuroradiology protocol assignment |
topic | Healthcare; Machine learning; Interpretability; Explanations; BERT |
url | https://doi.org/10.1186/s12911-024-02444-z |