Performance of ChatGPT on UK Standardized Admission Tests: Insights From the BMAT, TMUA, LNAT, and TSA Examinations


Bibliographic Details
Main Authors: Panagiotis Giannos, Orestis Delardas
Format: Article
Language: English
Published: JMIR Publications, 2023-04-01
Series: JMIR Medical Education
Online Access: https://mededu.jmir.org/2023/1/e47737
Collection: DOAJ (Directory of Open Access Journals)
Description:
Background: Large language models, such as ChatGPT by OpenAI, have demonstrated potential in various applications, including medical education. Previous studies have assessed ChatGPT’s performance in university or professional settings. However, the model’s potential in the context of standardized admission tests remains unexplored.
Objective: This study evaluated ChatGPT’s performance on standardized admission tests in the United Kingdom, including the BioMedical Admissions Test (BMAT), Test of Mathematics for University Admission (TMUA), Law National Aptitude Test (LNAT), and Thinking Skills Assessment (TSA), to understand its potential as an innovative tool for education and test preparation.
Methods: Recent public resources (2019-2022) were used to compile a data set of 509 questions from the BMAT, TMUA, LNAT, and TSA covering diverse topics in aptitude, scientific knowledge and applications, mathematical thinking and reasoning, critical thinking, problem-solving, reading comprehension, and logical reasoning. This evaluation assessed ChatGPT’s performance using the legacy GPT-3.5 model, focusing on multiple-choice questions for consistency. The model’s performance was analyzed based on question difficulty, the proportion of correct responses when aggregating exams from all years, and a comparison of test scores between papers of the same exam using binomial distribution and paired-sample (2-tailed) t tests.
Results: The proportion of correct responses was significantly lower than incorrect ones in BMAT section 2 (P<.001) and TMUA paper 1 (P<.001) and paper 2 (P<.001). No significant differences were observed in BMAT section 1 (P=.2), TSA section 1 (P=.7), or LNAT papers 1 and 2, section A (P=.3). ChatGPT performed better in BMAT section 1 than section 2 (P=.047), with a maximum candidate ranking of 73% compared to a minimum of 1%. In the TMUA, it engaged with questions but had limited accuracy and no performance difference between papers (P=.6), with candidate rankings below 10%. In the LNAT, it demonstrated moderate success, especially in paper 2’s questions; however, student performance data were unavailable. TSA performance varied across years with generally moderate results and fluctuating candidate rankings. Similar trends were observed for easy to moderate difficulty questions (BMAT section 1, P=.3; BMAT section 2, P=.04; TMUA paper 1, P<.001; TMUA paper 2, P=.003; TSA section 1, P=.8; and LNAT papers 1 and 2, section A, P>.99) and hard to challenging ones (BMAT section 1, P=.7; BMAT section 2, P<.001; TMUA paper 1, P=.007; TMUA paper 2, P<.001; TSA section 1, P=.3; and LNAT papers 1 and 2, section A, P=.2).
Conclusions: ChatGPT shows promise as a supplementary tool for subject areas and test formats that assess aptitude, problem-solving and critical thinking, and reading comprehension. However, its limitations in areas such as scientific and mathematical knowledge and applications highlight the need for continuous development and integration with conventional learning strategies to fully harness its potential.
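
Note: The Methods above name two statistical procedures: a binomial test of the proportion of correct versus incorrect responses, and paired-sample (2-tailed) t tests comparing scores between papers of the same exam across years. The sketch below illustrates how such comparisons could be run in Python with SciPy. The counts and per-year scores are invented placeholders, not the study's data, and the choice of scipy.stats.binomtest and scipy.stats.ttest_rel is an assumption about tooling, not the authors' actual code.

    # Minimal sketch of the two analyses described in the Methods.
    # All numbers below are hypothetical placeholders, not study data.
    from scipy.stats import binomtest, ttest_rel

    # Binomial test: does the proportion of correct responses differ
    # from the 0.5 expected if correct and incorrect answers were
    # equally likely? (One plausible reading of "binomial distribution"
    # in the Methods.)
    n_correct, n_total = 12, 27  # hypothetical tally for one exam section
    binom_result = binomtest(n_correct, n_total, p=0.5, alternative="two-sided")
    print(f"binomial test P = {binom_result.pvalue:.3f}")

    # Paired-sample (2-tailed) t test: compare scores between two papers
    # of the same exam, pairing by exam year (2019-2022).
    paper1_scores = [14, 16, 13, 15]  # hypothetical score per year, paper 1
    paper2_scores = [11, 12, 10, 13]  # same years, paper 2
    t_stat, p_value = ttest_rel(paper1_scores, paper2_scores)
    print(f"paired t = {t_stat:.2f}, P = {p_value:.3f}")
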
DOAJ ID: doaj.art-4ccc00c10f70418085ef07eb150c8e6f
ISSN: 2369-3762
DOI: 10.2196/47737
Citation: JMIR Medical Education 2023;9:e47737
ORCID (Panagiotis Giannos): https://orcid.org/0000-0003-1037-1983
ORCID (Orestis Delardas): https://orcid.org/0000-0003-2835-0400