Performance of ChatGPT, human radiologists, and context-aware ChatGPT in identifying AO codes from radiology reports

Abstract While radiologists can describe a fracture’s morphology and complexity with ease, the translation into classification systems such as the Arbeitsgemeinschaft Osteosynthesefragen (AO) Fracture and Dislocation Classification Compendium is more challenging. We tested the performance of generic...

Bibliographic Details
Main Authors: Maximilian F. Russe, Anna Fink, Helen Ngo, Hien Tran, Fabian Bamberg, Marco Reisert, Alexander Rau
Format: Article
Language: English
Published: Nature Portfolio 2023-08-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-41512-8
author Maximilian F. Russe
Anna Fink
Helen Ngo
Hien Tran
Fabian Bamberg
Marco Reisert
Alexander Rau
collection DOAJ
description Abstract While radiologists can describe a fracture’s morphology and complexity with ease, the translation into classification systems such as the Arbeitsgemeinschaft Osteosynthesefragen (AO) Fracture and Dislocation Classification Compendium is more challenging. We tested the performance of generic chatbots and chatbots aware of specific knowledge of the AO classification provided by a vector-index and compared it to human readers. In the 100 radiological reports we created based on random AO codes, chatbots provided AO codes significantly faster than humans (mean 3.2 s per case vs. 50 s per case, p < .001) though not reaching human performance (max. chatbot performance of 86% correct full AO codes vs. 95% in human readers). In general, chatbots based on GPT 4 outperformed the ones based on GPT 3.5-Turbo. Further, we found that providing specific knowledge substantially enhances the chatbot’s performance and consistency, as the context-aware chatbot based on GPT 4 provided 71% consistent correct full AO codes compared to the 2% consistent correct full AO codes for the generic ChatGPT 4. This provides evidence that refining and providing specific context to ChatGPT will be the next essential step in harnessing its power.
first_indexed 2024-03-10T22:01:11Z
format Article
id doaj.art-2ef6b78e346448b7bd337087baf6d878
institution Directory of Open Access Journals
issn 2045-2322
language English
last_indexed 2024-03-10T22:01:11Z
publishDate 2023-08-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-2ef6b78e346448b7bd337087baf6d878 2023-11-19T12:55:55Z eng Nature Portfolio Scientific Reports 2045-2322 2023-08-01 13116 10.1038/s41598-023-41512-8
Performance of ChatGPT, human radiologists, and context-aware ChatGPT in identifying AO codes from radiology reports
Maximilian F. Russe, Department of Diagnostic and Interventional Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg
Anna Fink, Department of Diagnostic and Interventional Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg
Helen Ngo, Department of Diagnostic and Interventional Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg
Hien Tran, Department of Diagnostic and Interventional Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg
Fabian Bamberg, Department of Diagnostic and Interventional Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg
Marco Reisert, Department of Stereotactic and Functional Neurosurgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg
Alexander Rau, Department of Diagnostic and Interventional Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg
Abstract While radiologists can describe a fracture’s morphology and complexity with ease, the translation into classification systems such as the Arbeitsgemeinschaft Osteosynthesefragen (AO) Fracture and Dislocation Classification Compendium is more challenging. We tested the performance of generic chatbots and chatbots aware of specific knowledge of the AO classification provided by a vector-index and compared it to human readers. In the 100 radiological reports we created based on random AO codes, chatbots provided AO codes significantly faster than humans (mean 3.2 s per case vs. 50 s per case, p < .001) though not reaching human performance (max. chatbot performance of 86% correct full AO codes vs. 95% in human readers). In general, chatbots based on GPT 4 outperformed the ones based on GPT 3.5-Turbo. Further, we found that providing specific knowledge substantially enhances the chatbot’s performance and consistency, as the context-aware chatbot based on GPT 4 provided 71% consistent correct full AO codes compared to the 2% consistent correct full AO codes for the generic ChatGPT 4. This provides evidence that refining and providing specific context to ChatGPT will be the next essential step in harnessing its power.
https://doi.org/10.1038/s41598-023-41512-8
title Performance of ChatGPT, human radiologists, and context-aware ChatGPT in identifying AO codes from radiology reports
url https://doi.org/10.1038/s41598-023-41512-8