Reliability of ChatGPT for performing triage task in the emergency department using the Korean Triage and Acuity Scale
Background Artificial intelligence (AI) technology can enable more efficient decision-making in healthcare settings. There is a growing interest in improving the speed and accuracy of AI systems in providing responses for given tasks in healthcare settings. Objective This study aimed to assess the reliability of ChatGPT in determining emergency department (ED) triage accuracy using the Korean Triage and Acuity Scale (KTAS).
Main Authors: | Jae Hyuk Kim, Sun Kyung Kim, Jongmyung Choi, Youngho Lee |
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2024-01-01 |
Series: | Digital Health |
Online Access: | https://doi.org/10.1177/20552076241227132 |
_version_ | 1827378322992005120 |
author | Jae Hyuk Kim, Sun Kyung Kim, Jongmyung Choi, Youngho Lee |
collection | DOAJ |
description | Background Artificial intelligence (AI) technology can enable more efficient decision-making in healthcare settings. There is a growing interest in improving the speed and accuracy of AI systems in providing responses for given tasks in healthcare settings. Objective This study aimed to assess the reliability of ChatGPT in determining emergency department (ED) triage accuracy using the Korean Triage and Acuity Scale (KTAS). Methods Two hundred and two virtual patient cases were built. The gold standard triage classification for each case was established by an experienced ED physician. Three other human raters (ED paramedics) were involved and rated the virtual cases individually. The virtual cases were also rated by two different versions of the chat generative pre-trained transformer (ChatGPT, 3.5 and 4.0). Inter-rater reliability was examined using Fleiss’ kappa and intra-class correlation coefficient (ICC). Results The kappa values for the agreement between the four human raters and ChatGPTs were .523 (version 4.0) and .320 (version 3.5). Of the five levels, the performance was poor when rating patients at levels 1 and 5, as well as case scenarios with additional text descriptions. There were differences in the accuracy of the different versions of GPTs. The ICC between version 3.5 and the gold standard was .520, and that between version 4.0 and the gold standard was .802. Conclusions A substantial level of inter-rater reliability was revealed when GPTs were used as KTAS raters. The current study showed the potential of using GPT in emergency healthcare settings. Considering the shortage of experienced manpower, this AI method may help improve triaging accuracy. |
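The article itself does not provide analysis code. As a purely illustrative aid, the sketch below shows one way the reliability statistics named in the abstract (Fleiss' kappa across multiple raters, and the intra-class correlation coefficient between a single rater and the gold standard) can be computed in Python. It assumes the `statsmodels` and `pingouin` packages; the rating arrays are simulated and hypothetical, not the study's data, and the variable names are not from the article.

```python
# Illustrative sketch only: simulated ratings, not the study's data or code.
import numpy as np
import pandas as pd
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa
import pingouin as pg

rng = np.random.default_rng(0)

# Hypothetical KTAS levels (1-5) assigned to 202 virtual cases by 4 raters.
n_cases, n_raters = 202, 4
ratings = rng.integers(1, 6, size=(n_cases, n_raters))

# Fleiss' kappa expects a cases x categories count table, so aggregate first.
counts, _categories = aggregate_raters(ratings)
kappa = fleiss_kappa(counts, method="fleiss")
print(f"Fleiss' kappa across raters: {kappa:.3f}")

# ICC between one rater (e.g., a GPT version) and the gold standard,
# using the long-format table required by pingouin.intraclass_corr.
gold = rng.integers(1, 6, size=n_cases)
gpt = np.clip(gold + rng.integers(-1, 2, size=n_cases), 1, 5)  # hypothetical GPT ratings
long_df = pd.DataFrame({
    "case": np.tile(np.arange(n_cases), 2),
    "rater": ["gold"] * n_cases + ["gpt"] * n_cases,
    "ktas": np.concatenate([gold, gpt]),
})
icc = pg.intraclass_corr(data=long_df, targets="case", raters="rater", ratings="ktas")
print(icc[["Type", "ICC"]])
```

With real data, the simulated arrays would be replaced by the human raters' and ChatGPT's actual KTAS assignments; which ICC variant (e.g., absolute agreement vs. consistency) matches the study's analysis is not stated in this record.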
first_indexed | 2024-03-08T12:53:43Z |
format | Article |
id | doaj.art-cdf977b282664244b1393622b46d32d1 |
institution | Directory Open Access Journal |
issn | 2055-2076 |
language | English |
last_indexed | 2024-03-08T12:53:43Z |
publishDate | 2024-01-01 |
publisher | SAGE Publishing |
record_format | Article |
series | Digital Health |
spelling | Jae Hyuk Kim: Department of Emergency Medicine, Mokpo Hankook Hospital, Jeonnam, South Korea; Sun Kyung Kim: Department of Biomedicine, Health & Life Convergence Sciences, Biomedical and Healthcare Research Institute, Jeonnam, South Korea; Jongmyung Choi: Department of Computer Engineering, Jeonnam, South Korea; Youngho Lee: Department of Computer Engineering, Jeonnam, South Korea |
title | Reliability of ChatGPT for performing triage task in the emergency department using the Korean Triage and Acuity Scale |
url | https://doi.org/10.1177/20552076241227132 |