Answering patterns in SBA items: students, GPT3.5, and Gemini

While large language models (LLMs) are often used to generate and answer exam questions, little work has compared their performance across multiple iterations using item statistics. This study fills that gap by investigating how LLMs answer single-best answer (SBA) questions and comparing their answering patterns with those of students. Forty-one SBA questions written for first-year medical students were administered to two of the most easily accessible, free-to-use LLMs, GPT3.5 and Gemini, across 100 iterations each. Both LLMs showed more repetitive and clustered answering patterns than students, which is problematic because repeatedly selecting the same incorrect option compounds errors. Distractor analysis showed that students were better at weighing multiple options in the SBA format. We found that these free-to-use LLMs are inferior to well-trained students or specialists in handling technical questions. We also highlight concerns about LLMs' contextual interpretation of these items and the need for human oversight in the medical education assessment process.
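The abstract describes comparing answering patterns across 100 iterations using item statistics. As a minimal illustration (not the authors' code), the Python sketch below shows one way such repetitiveness and clustering could be quantified: the share of iterations that pick the modal option and the entropy of the choice distribution, alongside simple accuracy. All data, thresholds, and function names here are hypothetical.

```python
from collections import Counter
import math

def choice_entropy(choices):
    """Shannon entropy (bits) of the answer-choice distribution for one item.
    0.0 means every iteration picked the same option (fully repetitive);
    higher values mean answers are spread across more options."""
    counts = Counter(choices)
    total = len(choices)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def summarize_item(choices, correct):
    """Per-item statistics for one responder (an LLM across iterations,
    or a student cohort): modal option, its share, accuracy, entropy."""
    counts = Counter(choices)
    modal, modal_n = counts.most_common(1)[0]
    return {
        "modal_option": modal,
        "modal_share": modal_n / len(choices),  # 1.0 = same answer every time
        "accuracy": counts.get(correct, 0) / len(choices),
        "entropy_bits": round(choice_entropy(choices), 3),
    }

# Hypothetical data for one SBA item with answer key "C"
# (10 responses shown in place of the study's 100 iterations).
llm_runs = ["B"] * 8 + ["C"] * 2                    # clustered on one distractor
student_runs = ["C"] * 5 + ["B"] * 3 + ["A", "D"]   # more spread, mostly correct

print(summarize_item(llm_runs, correct="C"))
print(summarize_item(student_runs, correct="C"))
```

Under this toy data, the LLM-like responder shows a high modal share and low entropy on a wrong option (the clustered-error pattern the study flags), while the student-like distribution is more dispersed across distractors.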

Bibliographic Details
Main Authors: Ng, Olivia, Phua, Dong Haur, Chu, Jowe, Wilding, Lucy V. E., Mogali, Sreenivasulu Reddy, Cleland, Jennifer
Other Authors: Lee Kong Chian School of Medicine (LKCMedicine)
Format: Journal Article
Language: English
Published: 2024
Subjects: Medicine, Health and Life Sciences; Assessments; ChatGPT
Online Access: https://hdl.handle.net/10356/181959
DOI: 10.1007/s40670-024-02232-4
ISSN: 2156-8650
Citation: Ng, O., Phua, D. H., Chu, J., Wilding, L. V. E., Mogali, S. R. & Cleland, J. (2024). Answering patterns in SBA items: students, GPT3.5, and Gemini. Medical Science Educator. https://dx.doi.org/10.1007/s40670-024-02232-4