Automated Item Generation: impact of item variants on performance and standard setting

Abstract

Background: Automated Item Generation (AIG) uses computer software to create multiple items from a single question model. There are currently few data on whether item variants generated from a single question model lead to differences in student performance or in human-derived standard setting. The purpose of this study was to use 50 multiple choice questions (MCQs) as models to create four distinct tests, which were standard set and delivered to final-year UK medical students, and then to compare the performance and standard-setting data for each.

Methods: Pre-existing questions from the UK Medical Schools Council (MSC) Assessment Alliance item bank, created using traditional item-writing techniques, were used to generate four 'isomorphic' 50-item MCQ tests using AIG software. Isomorphic questions use the same question template with minor alterations to test the same learning outcome. All UK medical schools were invited to deliver one of the four papers as an online formative assessment for their final-year students. Each test was standard set using a modified Angoff method. Thematic analysis was conducted on item variants with high and low variance in facility (for student performance) and in average scores (for standard setting).

Results: A total of 2,218 students from 12 UK medical schools participated, with each school using one of the four papers. The average facility of the four papers ranged from 0.55 to 0.61, and the cut score ranged from 0.58 to 0.61. Twenty item models had a facility difference > 0.15, and ten item models had a difference in standard setting of > 0.1. Variation in parameters that could alter clinical reasoning strategies had the greatest impact on item facility.

Conclusions: Item facility varied to a greater extent than the standard set. This difference may reflect variants disrupting the clinical reasoning strategies of novice learners more than those of experts, but it is confounded by the possibility that the performance differences are explained at school level, and therefore warrants further study.
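The Methods above rest on two simple quantities: item facility (the proportion of students answering an item correctly) and a modified Angoff cut score (the mean, across items and judges, of each judge's estimate of the probability that a borderline candidate answers the item correctly). A minimal sketch of both calculations is shown below; the function names and toy data are illustrative assumptions, not taken from the study or its software.

```python
from statistics import mean

def item_facility(responses):
    """Facility of one item: proportion of correct responses (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)

def modified_angoff_cut_score(judge_ratings):
    """Cut score as the mean of the per-item means of judges' estimated
    probabilities that a 'just-passing' candidate answers each item correctly."""
    return mean(mean(item_estimates) for item_estimates in judge_ratings)

# Hypothetical responses to two isomorphic variants of the same item model.
variant_a = [1, 1, 0, 1, 1, 0, 1, 1]
variant_b = [1, 0, 0, 1, 0, 0, 1, 0]
facility_gap = abs(item_facility(variant_a) - item_facility(variant_b))
print(f"facility difference: {facility_gap:.2f}")  # the study flagged models with a gap > 0.15

# Hypothetical ratings: three judges, two items (probabilities for a borderline candidate).
ratings = [[0.60, 0.70, 0.50], [0.55, 0.65, 0.60]]
print(f"cut score: {modified_angoff_cut_score(ratings):.2f}")
```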

Bibliographic Details
Main Authors: R. Westacott (Birmingham Medical School, University of Birmingham), K. Badger (Imperial College School of Medicine, Imperial College London), D. Kluth (Edinburgh Medical School, The University of Edinburgh), M. Gurnell (Wellcome–MRC Institute of Metabolic Science, University of Cambridge and NIHR Cambridge Biomedical Research Centre, Cambridge University Hospitals), M. W. R. Reed (Brighton and Sussex Medical School, University of Sussex), A. H. Sam (Imperial College School of Medicine, Imperial College London)
Format: Article
Language: English
Published: BMC, 2023-09-01
Series: BMC Medical Education
ISSN: 1472-6920
Subjects: Assessment; Automated item generation; Multiple choice questions; Standard setting
Online Access: https://doi.org/10.1186/s12909-023-04457-0