Investigating the affordances of OpenAI's large language model in developing listening assessments

To address the complexity and high costs of developing listening tests for test-takers of varying proficiency levels, this study investigates the capabilities of OpenAI's large language model, ChatGPT 4, in developing listening assessments. Employing prompt engineering and fine-tuning of pro...

Bibliographic Details
Main Authors: Vahid Aryadoust, Azrifah Zakaria, Yichen Jia
Format: Article
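Language: English
Published: Elsevier 2024-06-01
Series: Computers and Education: Artificial Intelligence
Subjects: Artificial intelligence (AI); ChatGPT 4; Fine-tuning of prompts; Generative AI (GenAI); Large language model; Listening assessment
Online Access: http://www.sciencedirect.com/science/article/pii/S2666920X24000055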
author Vahid Aryadoust
Azrifah Zakaria
Yichen Jia
collection DOAJ
description To address the complexity and high costs of developing listening tests for test-takers of varying proficiency levels, this study investigates the capabilities of OpenAI's large language model, ChatGPT 4, in developing listening assessments. Employing prompt engineering and fine-tuning of prompts, the study specifically focuses on creating listening scripts and test items using ChatGPT 4 for test-takers across a spectrum of proficiency levels (academic, low, intermediate, and advanced). For comparability, the 24 topics of these scripts were selected from topics found in academic listening tests. We conducted two types of analyses to evaluate the quality of the output. First, we performed linguistic analyses of the scripts using Coh-Metrix and Text Inspector to determine whether the scripts varied linguistically as required by the prompts. Second, we analyzed topic variation and the degree of overlap in the test items. Results indicated that while ChatGPT 4 reliably produced scripts with significant textual variations, the test items generated were often long and exhibited semantic overlaps among options. This effect was also influenced by the topic. We discuss the ethical complexities that arise from the use of generative artificial intelligence (AI), and how generative AI (GenAI) can potentially benefit practitioners and researchers in language assessment, while recognizing its limitations.
first_indexed 2024-03-08T12:51:17Z
format Article
id doaj.art-3cbb9e940ffd415392bd8e97b08c8f8e
institution Directory Open Access Journal
issn 2666-920X
language English
last_indexed 2024-03-08T12:51:17Z
publishDate 2024-06-01
publisher Elsevier
record_format Article
series Computers and Education: Artificial Intelligence
spelling doaj.art-3cbb9e940ffd415392bd8e97b08c8f8e
2024-01-20T04:46:47Z
eng
Elsevier
Computers and Education: Artificial Intelligence
2666-920X
2024-06-01
6
100204
Investigating the affordances of OpenAI's large language model in developing listening assessments
Vahid Aryadoust (corresponding author), National Institute of Education, Nanyang Technological University, Singapore
Azrifah Zakaria, National Institute of Education, Nanyang Technological University, Singapore
Yichen Jia, National Institute of Education, Nanyang Technological University, Singapore
http://www.sciencedirect.com/science/article/pii/S2666920X24000055
Artificial intelligence (AI)
ChatGPT 4
Fine-tuning of prompts
Generative AI (GenAI)
Large language model
Listening assessment
title Investigating the affordances of OpenAI's large language model in developing listening assessments
topic Artificial intelligence (AI)
ChatGPT 4
Fine-tuning of prompts
Generative AI (GenAI)
Large language model
Listening assessment
url http://www.sciencedirect.com/science/article/pii/S2666920X24000055