ChatGPT in Iranian medical licensing examination: evaluating the diagnostic accuracy and decision-making capabilities of an AI-based model


Bibliographic Details
Main Authors: Manoochehr Ebrahimian, Behdad Behnam, Negin Ghayebi, Elham Sobhrakhshankhah
Format: Article
Language: English
Published: BMJ Publishing Group, 2023-06-01
Series: BMJ Health & Care Informatics
Online Access: https://informatics.bmj.com/content/30/1/e100815.full
Abstract

Introduction: Large language models such as ChatGPT have gained popularity for their ability to generate comprehensive responses to human queries. In the field of medicine, ChatGPT has shown promise in applications ranging from diagnostics to decision-making. However, its performance in medical examinations, and how it compares with random guessing, has not been extensively studied.

Methods: This study evaluated the performance of ChatGPT in the preinternship examination, a comprehensive medical assessment for students in Iran. The examination consisted of 200 multiple-choice questions categorised into basic science evaluation, diagnosis and decision-making. GPT-4 was used, and the questions were translated into English. A statistical analysis was conducted to assess the performance of ChatGPT and to compare it with a random test group.

Results: ChatGPT performed exceptionally well, answering 68.5% of the questions correctly and significantly surpassing the pass mark of 45%. It exhibited superior performance in decision-making and passed all specialties. ChatGPT's performance was significantly higher than that of the random test group, demonstrating its ability to provide more accurate responses and reasoning.

Conclusion: This study highlights the potential of ChatGPT in medical licensing examinations and its advantage over random guessing. However, ChatGPT still falls short of human physicians in diagnostic accuracy and decision-making. Caution should be exercised when using ChatGPT, and its results should be verified by human experts to ensure patient safety and avoid potential errors in the medical field.
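As an illustration only (not the authors' actual analysis, whose test and parameters are not given in the abstract), the kind of comparison the Methods describe can be sketched as an exact binomial test of the reported 68.5% score on 200 questions against random guessing, assuming four-option questions with a 25% chance level:

```python
from math import comb

def binom_sf(k: int, n: int, p: float) -> float:
    """Exact upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n = 200                        # number of exam questions (from the abstract)
correct = round(0.685 * n)     # 68.5% reported accuracy -> 137 correct
p_chance = 0.25                # assumed chance level for 4-option questions

# One-sided test: could random guessing plausibly produce 137/200 correct?
p_value = binom_sf(correct, n, p_chance)
print(f"{correct}/{n} correct; one-sided p vs chance = {p_value:.3g}")
# The p-value is far below any conventional significance threshold.
```

The same helper can test the score against the 45% pass mark by calling `binom_sf(correct, n, 0.45)`; the chance level of 0.25 and the choice of a one-sided exact test are assumptions for this sketch.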
DOI: 10.1136/bmjhci-2023-100815
ISSN: 2632-1009
Volume: 30, Issue: 1

Author Affiliations:
Manoochehr Ebrahimian: Pediatric Surgery Research Center, Research Institute for Children's Health, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Behdad Behnam: Gastrointestinal and Liver Disease Research Center, Iran University of Medical Sciences, Tehran, Iran
Negin Ghayebi: School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Elham Sobhrakhshankhah: Gastrointestinal and Liver Disease Research Center, Iran University of Medical Sciences, Tehran, Iran