ChatGPT’s Response Consistency: A Study on Repeated Queries of Medical Examination Questions

(1) Background: As the field of artificial intelligence (AI) evolves, tools like ChatGPT are increasingly integrated into various domains of medicine, including medical education and research. Given the critical nature of medicine, it is of paramount importance that AI tools offer a high degree of reliability in the information they provide. (2) Methods: A total of n = 450 medical examination questions were manually entered into ChatGPT three times each, for both ChatGPT 3.5 and ChatGPT 4. The responses were collected, and their accuracy and consistency were statistically analyzed across the series of entries. (3) Results: ChatGPT 4 achieved a significantly higher accuracy of 85.7%, compared to 57.7% for ChatGPT 3.5 (p < 0.001). ChatGPT 4 was also more consistent, answering 77.8% of questions correctly across all rounds, a significant increase over the 44.9% observed for ChatGPT 3.5 (p < 0.001). (4) Conclusions: The findings underscore the increased accuracy and dependability of ChatGPT 4 in the context of medical education and potential clinical decision making. Nonetheless, the research emphasizes the indispensable nature of human-delivered healthcare and the vital role of continuous assessment in leveraging AI in medicine.

Bibliographic Details

Main Authors: Paul F. Funk, Cosima C. Hoch, Samuel Knoedler, Leonard Knoedler, Sebastian Cotofana, Giuseppe Sofo, Ali Bashiri Dezfouli, Barbara Wollenberg, Orlando Guntinas-Lichius, Michael Alfertshofer
Format: Article
Language: English
Published: MDPI AG, 2024-03-01
Series: European Journal of Investigation in Health, Psychology and Education, Vol. 14, No. 3, pp. 657-668
DOI: 10.3390/ejihpe14030043
ISSN: 2174-8144; 2254-9625
Subjects: ChatGPT; artificial intelligence; medical state examination questions; indecisiveness; response consistency
Collection: Directory of Open Access Journals (DOAJ)
Online Access: https://www.mdpi.com/2254-9625/14/3/43

Author Affiliations

Paul F. Funk: Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Jena, Friedrich Schiller University Jena, Am Klinikum 1, 07747 Jena, Germany
Cosima C. Hoch: Department of Otolaryngology, Head and Neck Surgery, School of Medicine and Health, Technical University of Munich (TUM), Ismaningerstrasse 22, 81675 Munich, Germany
Samuel Knoedler: Department of Plastic Surgery and Hand Surgery, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaningerstrasse 22, 81675 Munich, Germany
Leonard Knoedler: Division of Plastic and Reconstructive Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114, USA
Sebastian Cotofana: Department of Dermatology, Erasmus Medical Centre, Dr. Molewaterplein 40, 3015 GD Rotterdam, The Netherlands
Giuseppe Sofo: Instituto Ivo Pitanguy, Hospital Santa Casa de Misericórdia Rio de Janeiro, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro 20020-022, Brazil
Ali Bashiri Dezfouli: Department of Otolaryngology, Head and Neck Surgery, School of Medicine and Health, Technical University of Munich (TUM), Ismaningerstrasse 22, 81675 Munich, Germany
Barbara Wollenberg: Department of Otolaryngology, Head and Neck Surgery, School of Medicine and Health, Technical University of Munich (TUM), Ismaningerstrasse 22, 81675 Munich, Germany
Orlando Guntinas-Lichius: Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Jena, Friedrich Schiller University Jena, Am Klinikum 1, 07747 Jena, Germany
Michael Alfertshofer: Department of Plastic Surgery and Hand Surgery, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaningerstrasse 22, 81675 Munich, Germany
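The analysis described in the methods can be illustrated with a short sketch: each question is entered three times, "accuracy" is the share of all individual responses that are correct, "consistency" is the share of questions answered correctly in every round, and the two accuracy proportions are compared for significance. The data below are simulated to mimic the reported proportions, and the two-proportion z-test stands in for whatever statistical test the authors actually used; neither the dataset nor the test choice is part of this record.

```python
import math
import random

def accuracy(responses):
    """Fraction of all individual responses that are correct.

    `responses` maps question id -> list of booleans (one per repeated entry).
    """
    flat = [r for answers in responses.values() for r in answers]
    return sum(flat) / len(flat)

def consistency(responses):
    """Fraction of questions answered correctly in *every* repeated entry."""
    return sum(all(answers) for answers in responses.values()) / len(responses)

def two_proportion_z_test(k1, n1, k2, n2):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Simulated example: 450 questions, three entries each, with per-response
# correctness probabilities set to the reported accuracies (85.7% vs. 57.7%).
random.seed(0)
gpt4 = {q: [random.random() < 0.857 for _ in range(3)] for q in range(450)}
gpt35 = {q: [random.random() < 0.577 for _ in range(3)] for q in range(450)}

n = 450 * 3  # total individual responses per model
z, p = two_proportion_z_test(
    round(accuracy(gpt4) * n), n,
    round(accuracy(gpt35) * n), n,
)
print(f"GPT-4:   accuracy {accuracy(gpt4):.1%}, consistency {consistency(gpt4):.1%}")
print(f"GPT-3.5: accuracy {accuracy(gpt35):.1%}, consistency {consistency(gpt35):.1%}")
print(f"z = {z:.2f}, p = {p:.3g}")
```

Note that consistency is necessarily no higher than accuracy: a single wrong answer in any of the three rounds removes a question from the "consistently correct" count, which is why the study reports both figures separately.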