Evaluating the limits of AI in medical specialisation: ChatGPT’s performance on the UK Neurology Specialty Certificate Examination

Background: Large language models such as ChatGPT have demonstrated potential as innovative tools for medical education and practice, with studies showing their ability to perform at or near the passing threshold in general medical examinations and standardised admission tests. However, no studies have assessed their performance in the UK medical education context, particularly at a specialty level, and specifically in the field of neurology and neuroscience.

Methods: We evaluated the performance of ChatGPT in higher specialty training for neurology and neuroscience using 69 questions from the Pool—Specialty Certificate Examination (SCE) Neurology Web Questions bank. The dataset primarily focused on neurology (80%). The questions spanned subtopics such as symptoms and signs, diagnosis, interpretation and management, with some questions addressing specific patient populations. The performance of the ChatGPT 3.5 Legacy, ChatGPT 3.5 Default and ChatGPT-4 models was evaluated and compared.

Results: ChatGPT 3.5 Legacy and ChatGPT 3.5 Default displayed overall accuracies of 42% and 57%, respectively, falling short of the 58% passing threshold for the 2022 SCE neurology examination. ChatGPT-4 achieved the highest accuracy of 64%, surpassing the passing threshold and outperforming its predecessors across disciplines and subtopics.

Conclusions: The advancement in ChatGPT-4's performance compared with its predecessors demonstrates the potential for artificial intelligence (AI) models in specialised medical education and practice. However, our findings also highlight the need for ongoing development and collaboration between AI developers and medical experts to ensure the models' relevance and reliability in the rapidly evolving field of medicine.
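For orientation, the following is a minimal, illustrative sketch in Python (not code from the study) of the comparison the abstract describes: scoring multiple-choice answers against a key and checking each model's overall accuracy against the 58% pass mark. The `accuracy` helper and all variable names are assumptions for illustration; only the reported percentages and the pass mark come from the abstract.

```python
# Minimal sketch of the evaluation described in the abstract: score a model's
# multiple-choice answers against a key and compare overall accuracy with the
# exam pass mark. The helper and its inputs are illustrative, not study code.

PASS_MARK = 0.58  # 2022 SCE neurology passing threshold reported in the abstract


def accuracy(model_answers: list[str], answer_key: list[str]) -> float:
    """Fraction of multiple-choice responses that match the answer key."""
    correct = sum(given == key for given, key in zip(model_answers, answer_key))
    return correct / len(answer_key)


# Overall accuracies reported in the study (the per-question responses are
# not public, so we check the reported figures against the pass mark).
reported = {"ChatGPT 3.5 Legacy": 0.42, "ChatGPT 3.5 Default": 0.57, "ChatGPT-4": 0.64}

for model, acc in reported.items():
    verdict = "pass" if acc >= PASS_MARK else "fail"
    print(f"{model}: {acc:.0%} -> {verdict} (pass mark {PASS_MARK:.0%})")
```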


Bibliographic Details
Main Author: Panagiotis Giannos (Department of Life Sciences, Imperial College London, London, UK)
Format: Article
Language: English
Published: BMJ Publishing Group, 2023-06-01
Series: BMJ Neurology Open
ISSN: 2632-6140
DOI: 10.1136/bmjno-2023-000451
Online Access: https://neurologyopen.bmj.com/content/5/1/e000451.full