ChatGPT Performs at the Level of a Third-Year Orthopaedic Surgery Resident on the Orthopaedic In-Training Examination


Bibliographic Details
Main Authors: Diane Ghanem, MD; Oscar Covarrubias, BS; Micheal Raad, MD; Dawn LaPorte, MD, FAOA; Babar Shafiq, MD, FAOA
Format: Article
Language: English
Published: Wolters Kluwer, 2023-12-01
Series: JBJS Open Access
Online Access: http://journals.lww.com/jbjsoa/fulltext/10.2106/JBJS.OA.23.00103
Description:
Introduction: Publicly available AI language models such as ChatGPT have demonstrated utility in text generation and even problem-solving when given clear instructions. Amid this transformative shift, the aim of this study was to assess ChatGPT's performance on the Orthopaedic Surgery In-Training Examination (OITE).
Methods: All 213 questions from the 2021 web-based OITE were retrieved from the AAOS ResStudy website (https://www.aaos.org/education/examinations/ResStudy). Two independent reviewers copied and pasted the questions and response options into ChatGPT Plus (version 4.0) and recorded the generated answers. All media-containing questions were flagged and carefully examined. Twelve media-containing questions that relied purely on images (clinical photographs, radiographs, MRIs, CT scans) and could not be answered from the clinical presentation alone were excluded. Cohen's kappa coefficient was used to examine agreement between reviewers on the ChatGPT-generated responses. Descriptive statistics were used to summarize the performance (% correct) of ChatGPT Plus. The 2021 norm table was used to compare ChatGPT Plus's performance on the OITE with that of national orthopaedic surgery residents in the same year.
Results: A total of 201 questions were evaluated by ChatGPT Plus. Excellent agreement was observed between raters for the 201 ChatGPT-generated responses, with a Cohen's kappa coefficient of 0.947. Media-containing questions made up 45.8% (92/201) of the set. ChatGPT Plus scored 61.2% (123/201) overall and 64.2% (70/109) on non-media questions. Compared with the performance of all national orthopaedic surgery residents in 2021, ChatGPT Plus performed at the level of an average PGY-3.
Discussion: ChatGPT Plus was able to pass the OITE with an overall score of 61.2%, ranking at the level of a third-year orthopaedic surgery resident. It provided logical reasoning and justifications that may help residents improve their understanding of OITE cases and general orthopaedic principles. Further studies are needed to examine the efficacy of such tools and their impact on long-term learning and OITE/ABOS performance.
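The inter-rater agreement statistic reported in the Methods (Cohen's kappa) can be sketched as follows. This is a minimal illustration, not the study's analysis code; the two answer lists are hypothetical stand-ins for the reviewers' transcriptions of ChatGPT's multiple-choice selections.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    labeling the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items where the raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's marginals.
    p_e = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (p_o - p_e) / (1 - p_e)

# Hypothetical recorded answers (A-D) from two independent reviewers:
reviewer_1 = ["A", "B", "C", "A", "D", "B", "C", "A"]
reviewer_2 = ["A", "B", "C", "A", "D", "B", "B", "A"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # → 0.826
```

A kappa near 1 (such as the study's 0.947) indicates that ChatGPT's answers were transcribed almost identically by both reviewers, i.e. the responses were reproducible rather than rater-dependent.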
ISSN: 2472-7245
Collection: Directory of Open Access Journals (DOAJ)
DOI: 10.2106/JBJS.OA.23.00103
Volume/Issue: 8(4)
Author Affiliations: Department of Orthopaedic Surgery, The Johns Hopkins Hospital, Baltimore, Maryland (Ghanem, Raad, LaPorte, Shafiq); School of Medicine, The Johns Hopkins University, Baltimore, Maryland (Covarrubias)