Performance of Polytomous IRT Models With Rating Scale Data: An Investigation Over Sample Size, Instrument Length, and Missing Data

Bibliographic Details
Main Authors: Shenghai Dai, Thao Thu Vo, Olasunkanmi James Kehinde, Haixia He, Yu Xue, Cihan Demir, Xiaolin Wang
Format: Article
Language: English
Published: Frontiers Media S.A., 2021-09-01
Series: Frontiers in Education
Subjects: IRT; GRM; GPCM; sample size; instrument length; missing data
Online Access: https://www.frontiersin.org/articles/10.3389/feduc.2021.721963/full
ISSN: 2504-284X
Volume: 6 (article 721963)
DOI: 10.3389/feduc.2021.721963
Affiliations: Department of Kinesiology and Educational Psychology, Washington State University, Pullman, WA, United States (Shenghai Dai, Thao Thu Vo, Olasunkanmi James Kehinde, Yu Xue, Cihan Demir); Department of Teaching and Learning, Washington State University, Pullman, WA, United States (Haixia He); Pearson VUE, Bloomington, MN, United States (Xiaolin Wang)

Full description

The implementation of polytomous item response theory (IRT) models, such as the graded response model (GRM) and the generalized partial credit model (GPCM), to inform instrument design and validation has been increasing across social and educational contexts where rating scales are commonly used. The performance of these models has not been fully investigated and compared under conditions typical of surveys, such as short instruments, small samples, and missing data. The purpose of the current simulation study is to inform the literature and guide the implementation of GRM and GPCM under these conditions. For item parameter estimation, results suggest a sample size of at least 300 and/or an instrument length of at least five items for both models. The performance of GPCM is stable across instrument lengths, while that of GRM improves notably as instrument length increases. For person parameters, GRM yields more accurate estimates when the proportion of missing data is small, whereas GPCM is favored when a large proportion of the data is missing. Further, comparing GRM and GPCM on the basis of test information is not recommended. Relative model fit indices (AIC, BIC, and log-likelihood [LL]) may lack power when the sample size is below 300 and the instrument has fewer than five items. The patterns of results are synthesized, and recommendations for implementing polytomous IRT models are presented and discussed.
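The abstract contrasts GRM and GPCM and compares them with AIC, BIC, and LL. As a minimal sketch (not the authors' simulation code), the Python below computes the category probabilities of a single item under each model, assuming the common logistic parameterizations: Samejima's GRM with ordered thresholds and Muraki's GPCM with step parameters, both without the D = 1.7 scaling constant. It also computes AIC and BIC from a log-likelihood. The function names and the example item parameters are hypothetical.

```python
import math
from typing import List, Sequence

# Illustrative sketch only: logistic metric without D = 1.7; names are hypothetical.


def logistic(x: float) -> float:
    """Standard logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def grm_probs(theta: float, a: float, b: Sequence[float]) -> List[float]:
    """P(X = k | theta), k = 0..m-1, under the graded response model.

    `b` holds the m-1 thresholds; they must be increasing, otherwise some
    category probabilities would come out negative.
    """
    # Cumulative probabilities P(X >= k): 1 for k = 0, logistic for k = 1..m-1, 0 for k = m.
    cum = [1.0] + [logistic(a * (theta - bk)) for bk in b] + [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]


def gpcm_probs(theta: float, a: float, b: Sequence[float]) -> List[float]:
    """P(X = k | theta), k = 0..m-1, under the generalized partial credit model.

    `b` holds the m-1 step parameters, which need not be ordered.
    """
    # Exponent for category k is the running sum of a * (theta - b_v) for v <= k;
    # category 0 gets an exponent of 0 by convention.
    z = [0.0]
    for bv in b:
        z.append(z[-1] + a * (theta - bv))
    # Subtract the largest exponent before exponentiating for numerical stability.
    zmax = max(z)
    expz = [math.exp(v - zmax) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: -2 LL + 2p."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 LL + p ln(N)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


if __name__ == "__main__":
    # Hypothetical four-category item: slope 1.2, locations -1, 0, 1.
    item_a, item_b = 1.2, [-1.0, 0.0, 1.0]
    print("GRM :", [round(p, 4) for p in grm_probs(0.0, item_a, item_b)])
    print("GPCM:", [round(p, 4) for p in gpcm_probs(0.0, item_a, item_b)])
```

Both functions return a list of m probabilities that sum to 1. Under these parameterizations, an item with m categories contributes one slope and m-1 location parameters in either model. When both models are fitted to the same data, AIC and BIC therefore apply the same penalty to each, and their differences reduce to the difference in log-likelihood.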