Performance of ChatGPT on UK Standardized Admission Tests: Insights From the BMAT, TMUA, LNAT, and TSA Examinations
Background: Large language models, such as ChatGPT by OpenAI, have demonstrated potential in various applications, including medical education. Previous studies have assessed ChatGPT’s performance in university or professional settings. However, the model’s potential in the context of standardized admission tests remains unexplored.
Main Authors: | Panagiotis Giannos, Orestis Delardas |
---|---|
Format: | Article |
Language: | English |
Published: |
JMIR Publications
2023-04-01
|
Series: | JMIR Medical Education |
Online Access: | https://mededu.jmir.org/2023/1/e47737 |
_version_ | 1797734090885038080 |
---|---|
author | Panagiotis Giannos Orestis Delardas |
author_facet | Panagiotis Giannos Orestis Delardas |
author_sort | Panagiotis Giannos |
collection | DOAJ |
description |
Background: Large language models, such as ChatGPT by OpenAI, have demonstrated potential in various applications, including medical education. Previous studies have assessed ChatGPT’s performance in university or professional settings. However, the model’s potential in the context of standardized admission tests remains unexplored.
Objective: This study evaluated ChatGPT’s performance on standardized admission tests in the United Kingdom, including the BioMedical Admissions Test (BMAT), Test of Mathematics for University Admission (TMUA), Law National Aptitude Test (LNAT), and Thinking Skills Assessment (TSA), to understand its potential as an innovative tool for education and test preparation.
Methods: Recent public resources (2019-2022) were used to compile a data set of 509 questions from the BMAT, TMUA, LNAT, and TSA covering diverse topics in aptitude, scientific knowledge and applications, mathematical thinking and reasoning, critical thinking, problem-solving, reading comprehension, and logical reasoning. This evaluation assessed ChatGPT’s performance using the legacy GPT-3.5 model, focusing on multiple-choice questions for consistency. The model’s performance was analyzed based on question difficulty, the proportion of correct responses when aggregating exams from all years, and a comparison of test scores between papers of the same exam using binomial distribution and paired-sample (2-tailed) t tests.
Results: The proportion of correct responses was significantly lower than incorrect ones in BMAT section 2 (P<.001) and TMUA paper 1 (P<.001) and paper 2 (P<.001). No significant differences were observed in BMAT section 1 (P=.2), TSA section 1 (P=.7), or LNAT papers 1 and 2, section A (P=.3). ChatGPT performed better in BMAT section 1 than section 2 (P=.047), with a maximum candidate ranking of 73% compared to a minimum of 1%. In the TMUA, it engaged with questions but had limited accuracy and no performance difference between papers (P=.6), with candidate rankings below 10%. In the LNAT, it demonstrated moderate success, especially in paper 2’s questions; however, student performance data were unavailable. TSA performance varied across years with generally moderate results and fluctuating candidate rankings. Similar trends were observed for easy to moderate difficulty questions (BMAT section 1, P=.3; BMAT section 2, P=.04; TMUA paper 1, P<.001; TMUA paper 2, P=.003; TSA section 1, P=.8; and LNAT papers 1 and 2, section A, P>.99) and hard to challenging ones (BMAT section 1, P=.7; BMAT section 2, P<.001; TMUA paper 1, P=.007; TMUA paper 2, P<.001; TSA section 1, P=.3; and LNAT papers 1 and 2, section A, P=.2).
Conclusions: ChatGPT shows promise as a supplementary tool for subject areas and test formats that assess aptitude, problem-solving and critical thinking, and reading comprehension. However, its limitations in areas such as scientific and mathematical knowledge and applications highlight the need for continuous development and integration with conventional learning strategies in order to fully harness its potential. |
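The Methods field describes testing whether the proportion of correct responses differs from chance using the binomial distribution. A minimal stdlib-only sketch of that step is below; the function name `binom_two_sided_p` and the counts are illustrative assumptions, not figures taken from the study.

```python
from math import comb

def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of every
    outcome that is no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small multiplicative tolerance guards against float rounding when
    # comparing equally likely outcomes.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))

# Hypothetical counts (not from the study): 24 correct answers out of 81
# multiple-choice questions, tested against a chance level of p = 0.5.
print(f"two-sided binomial p = {binom_two_sided_p(24, 81):.4f}")
```

The study additionally compared scores between papers of the same exam with paired-sample two-tailed t tests; in practice that step would typically use a statistics library (for example, `scipy.stats.ttest_rel`) rather than a hand-rolled implementation.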
first_indexed | 2024-03-12T12:39:15Z |
format | Article |
id | doaj.art-4ccc00c10f70418085ef07eb150c8e6f |
institution | Directory Open Access Journal |
issn | 2369-3762 |
language | English |
last_indexed | 2024-03-12T12:39:15Z |
publishDate | 2023-04-01 |
publisher | JMIR Publications |
record_format | Article |
series | JMIR Medical Education |
spelling | doaj.art-4ccc00c10f70418085ef07eb150c8e6f2023-08-28T23:57:47ZengJMIR PublicationsJMIR Medical Education2369-37622023-04-019e4773710.2196/47737Performance of ChatGPT on UK Standardized Admission Tests: Insights From the BMAT, TMUA, LNAT, and TSA ExaminationsPanagiotis Giannoshttps://orcid.org/0000-0003-1037-1983Orestis Delardashttps://orcid.org/0000-0003-2835-0400https://mededu.jmir.org/2023/1/e47737 |
spellingShingle | Panagiotis Giannos Orestis Delardas Performance of ChatGPT on UK Standardized Admission Tests: Insights From the BMAT, TMUA, LNAT, and TSA Examinations JMIR Medical Education |
title | Performance of ChatGPT on UK Standardized Admission Tests: Insights From the BMAT, TMUA, LNAT, and TSA Examinations |
title_full | Performance of ChatGPT on UK Standardized Admission Tests: Insights From the BMAT, TMUA, LNAT, and TSA Examinations |
title_fullStr | Performance of ChatGPT on UK Standardized Admission Tests: Insights From the BMAT, TMUA, LNAT, and TSA Examinations |
title_full_unstemmed | Performance of ChatGPT on UK Standardized Admission Tests: Insights From the BMAT, TMUA, LNAT, and TSA Examinations |
title_short | Performance of ChatGPT on UK Standardized Admission Tests: Insights From the BMAT, TMUA, LNAT, and TSA Examinations |
title_sort | performance of chatgpt on uk standardized admission tests insights from the bmat tmua lnat and tsa examinations |
url | https://mededu.jmir.org/2023/1/e47737 |
work_keys_str_mv | AT panagiotisgiannos performanceofchatgptonukstandardizedadmissiontestsinsightsfromthebmattmualnatandtsaexaminations AT orestisdelardas performanceofchatgptonukstandardizedadmissiontestsinsightsfromthebmattmualnatandtsaexaminations |