The standard setting process: validating interpretations of stakeholders


Bibliographic Details
Main Authors: Nele Kampa, Helene Wagner, Olaf Köller
Format: Article
Language: English
Published: SpringerOpen, 2019-02-01
Series: Large-scale Assessments in Education
Subjects:
Online Access: http://link.springer.com/article/10.1186/s40536-019-0071-8
author Nele Kampa
Helene Wagner
Olaf Köller
collection DOAJ
description Abstract
Background: Stakeholders’ interpretations of the findings of large-scale educational assessments can influence important decisions. In the context of educational assessment, standard setting remains an especially critical element because it is complex and largely unstandardized. Instruments established by means of standard-setting procedures, such as proficiency levels (PL), therefore appear to be arbitrary to some degree. Owing to the significance such results take on when they are communicated to stakeholders or the public, a thorough validation of this process seems crucial. In our study, ministry stakeholders intended to use PL established in an assessment of science abilities to obtain information about students’ strengths and weaknesses in science in general, and specifically about the extent to which students were prepared for future science studies. The aim of our study was to investigate the validity arguments regarding these two intended interpretations.
Methods: Based on a university science test administered to 3641 upper secondary students (Grade 13), a panel of nine experts set four cut scores using two variations of the Angoff method: the Yes/No Angoff method (multiple-choice items) and the extended Angoff method (complex multiple-choice items). We carried out t-tests, repeated-measures ANOVA, G-studies, and regression analyses to support the procedural, internal, external, and consequential validity elements regarding the aforementioned interpretations of the cut scores.
Results: Our t-tests and G-studies showed that the intended use of the cut scores was valid regarding the procedural and internal aspects of validity. These findings were called into question by the experts’ lack of confidence in the established cut scores. Regression analyses including the number of lessons taught and intended and pursued science-related studies showed good external but poor consequential validity.
Conclusion: The cut scores can be used as an indicator of 13th graders’ strengths and weaknesses in science. They should not be used as an indicator of preparedness for science studies at university. Since assessment formats are continually evolving, leading to more complex designs, further research needs to be conducted on the application of new standard-setting methods to meet the challenges arising from this development.
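For readers unfamiliar with the two Angoff variants named in the abstract, the core aggregation logic can be sketched as follows. This is a minimal illustration with hypothetical data and function names, not the authors' actual rating sheets or aggregation choices (real panels often use multiple rounds, medians, or weighting):

```python
from statistics import mean

def yes_no_angoff(judgments):
    """Yes/No Angoff for multiple-choice items: each panelist marks,
    per item, whether a borderline (minimally proficient) student
    would answer it correctly. A panelist's cut score is their count
    of 'yes' marks; the panel cut score is the mean across panelists."""
    return mean(sum(panelist) for panelist in judgments)

def extended_angoff(point_estimates):
    """Extended Angoff for complex (polytomous) items: each panelist
    estimates how many points a borderline student would earn per
    item. The panel cut score is the mean of the summed estimates."""
    return mean(sum(panelist) for panelist in point_estimates)

# Hypothetical judgments: three panelists, five multiple-choice items
mc_judgments = [
    [True, True, False, True, False],
    [True, False, False, True, True],
    [True, True, True, False, False],
]
print(yes_no_angoff(mc_judgments))  # 3: each panelist said 'yes' to 3 of 5 items
```

A study like the one described would repeat this once per proficiency-level boundary (here, four cut scores) and then examine rater agreement, e.g. via the G-studies mentioned above.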
format Article
id doaj.art-a2ccc842797945b7b154e696bf91e541
institution Directory Open Access Journal
issn 2196-0739
language English
publishDate 2019-02-01
publisher SpringerOpen
series Large-scale Assessments in Education
spelling The standard setting process: validating interpretations of stakeholders / Nele Kampa, Helene Wagner, Olaf Köller (all: Leibniz Institute for Science and Mathematics Education at the Christian-Albrechts-University of Kiel). Large-scale Assessments in Education, SpringerOpen, 2019-02-01. DOI: 10.1186/s40536-019-0071-8
title The standard setting process: validating interpretations of stakeholders
topic Standard setting
Validity
Science education
Extended Angoff method
Yes/No Angoff method
Large-scale assessment
url http://link.springer.com/article/10.1186/s40536-019-0071-8