Combining machine translation and automated scoring in international large-scale assessments

Bibliographic Details
Main Authors: Ji Yoon Jung, Lillian Tyack, Matthias von Davier
Format: Article
Language: English
Published: SpringerOpen, 2024-04-01
Series: Large-scale Assessments in Education
Online Access: https://doi.org/10.1186/s40536-024-00199-7

Abstract
Background: Artificial intelligence (AI) is rapidly changing communication and technology-driven content creation, and it is increasingly used in education. Despite these advances, AI-powered automated scoring in international large-scale assessments (ILSAs) remains largely unexplored because of the challenges of scoring large volumes of multilingual responses. However, because ILSAs are low-stakes, they are an ideal testing ground for innovation and new methodologies.
Methods: This study proposes combining state-of-the-art machine translation (Google Translate and ChatGPT) with artificial neural networks (ANNs) to mitigate two key concerns of human scoring: inconsistency and high cost. We applied AI-based automated scoring to multilingual student responses from eight countries, covering six languages, using six constructed-response items from TIMSS 2019.
Results: Automated scoring performed comparably to human scoring, especially when the ANNs were trained and tested on ChatGPT-translated responses. Furthermore, the psychometric characteristics derived from machine scores were generally similar to those obtained from human scores. These results can be considered supporting evidence for the validity of automated scoring in survey assessments.
Conclusions: This study highlights that automated scoring integrated with recent machine translation holds great promise for consistent, resource-efficient scoring in ILSAs.

Keywords: Automated scoring; Artificial intelligence; Artificial neural networks; Machine translation; Google Translate; ChatGPT
Author affiliation (all authors): Boston College, TIMSS & PIRLS International Study Center
ISSN: 2196-0739
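
The abstract reports that machine scores were compared with human scores, both for scoring agreement and for the psychometric characteristics they produce, but it does not name the statistics used. As an illustration only, the following sketch computes three common ones for a single item: exact agreement, unweighted Cohen's kappa (agreement corrected for chance), and the classical proportion-correct item difficulty. The choice of metrics is an assumption, the function names are hypothetical, and the score vectors are toy data, not results from the study.

```python
from collections import Counter
from typing import Sequence


def exact_agreement(human: Sequence[int], machine: Sequence[int]) -> float:
    """Proportion of responses where the machine score equals the human score."""
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, machine)) / len(human)


def cohens_kappa(human: Sequence[int], machine: Sequence[int]) -> float:
    """Unweighted Cohen's kappa between two raters (here: human vs. machine)."""
    n = len(human)
    p_o = exact_agreement(human, machine)  # observed agreement
    h_counts = Counter(human)
    m_counts = Counter(machine)
    categories = set(h_counts) | set(m_counts)
    # Chance agreement from each rater's marginal score distribution.
    p_e = sum(h_counts[c] * m_counts[c] for c in categories) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category: kappa is undefined,
        # treated here as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def proportion_correct(scores: Sequence[int], max_score: int) -> float:
    """Classical item difficulty (p-value): mean score divided by max score."""
    return sum(scores) / (len(scores) * max_score)


if __name__ == "__main__":
    # Hypothetical scores for one dichotomous (0/1) item, ten responses.
    human = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    machine = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
    print(f"exact agreement: {exact_agreement(human, machine):.2f}")  # 0.80
    print(f"Cohen's kappa:   {cohens_kappa(human, machine):.2f}")     # 0.58
    print(f"p-value human:   {proportion_correct(human, 1):.2f}")     # 0.60
    print(f"p-value machine: {proportion_correct(machine, 1):.2f}")   # 0.60
```

Kappa is shown alongside raw agreement because two scorers who both award mostly full credit can agree often by chance alone. Comparing item statistics such as the p-value checks whether machine scoring shifts what an item appears to measure, not just whether individual scores match.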