Comparability of science assessment across languages: the case of PISA science 2006


Bibliographic Details
Main Author: Masri, Y
Other Authors: Baird, J; Hopfenbeck, T; McNicholl, J; Béguin, A
Format: Thesis
Language: English
Published: 2015
Subjects: Educational Assessment
Institution: University of Oxford
Description

In this research, I investigated the extent to which language versions (English, French and Arabic) of the same science test were comparable in terms of item difficulty and demands. I used PISA science 2006 data from three countries (respectively, the UK, France and Jordan). I argued that language was an intrinsic part of the scientific literacy construct, whether intended by the examiner or not. The tight relationship between the language element and the scientific knowledge makes the language variable inextricable from the construct. This argument has considerable implications for the methodologies used to address this question. I also argued that none of the available statistical or qualitative techniques was capable of teasing out the language variable and answering the research question.

In this thesis, I combined critical evaluation with empirical methods, using literature from various fields (cognitive linguistics, psychology, measurement and science education) to analyse the test development and design procedures. In addition, I illustrated my claims with evidence from the technical reports and with examples of released items. To address my question empirically, I adopted the same class of models employed in PISA, the Rasch model, together with differential item functioning (DIF) techniques.

General tests of fit suggested an overall good fit of the data to the model, with eleven items out of 103 showing strong evidence of misfit. Various violations of the requirements of the Rasch model were highlighted. The DIF analysis indicated that 22% of the items showed bias in the selected countries, but the bias balanced out at test level. Limitations of DIF analysis in identifying the source of bias were discussed. Qualitative approaches to investigating question demands were examined, and issues with their usefulness in international settings were discussed. A way forward incorporating cognitive load theory and computational linguistics is proposed.
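As a point of reference for the methods named in the description, the dichotomous Rasch model that underlies PISA scaling expresses the probability that person n answers item i correctly in terms of a person ability parameter θ_n and an item difficulty parameter δ_i. The statement below is the standard textbook form of the model, included for orientation rather than taken from the thesis:

P(X_{ni} = 1 \mid \theta_n, \delta_i) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}

Under this model, differential item functioning (DIF) corresponds to the estimated difficulty δ_i of an item differing across language or country groups for respondents of equal ability θ_n; this is the item-level notion of bias that such an analysis flags before asking whether it cancels out at test level.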