Investigating the affordances of OpenAI's large language model in developing listening assessments
To address the complexity and high costs of developing listening tests for test-takers of varying proficiency levels, this study investigates the capabilities of OpenAI's large language model, ChatGPT 4, in developing listening assessments. Employing prompt engineering and fine-tuning of prompts...
Main Authors: | Vahid Aryadoust, Azrifah Zakaria, Yichen Jia |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2024-06-01 |
Series: | Computers and Education: Artificial Intelligence |
Subjects: | Artificial intelligence (AI); ChatGPT 4; Fine-tuning of prompts; Generative AI (GenAI); Large language model; Listening assessment |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666920X24000055 |
author | Vahid Aryadoust; Azrifah Zakaria; Yichen Jia
collection | DOAJ |
description | To address the complexity and high costs of developing listening tests for test-takers of varying proficiency levels, this study investigates the capabilities of OpenAI's large language model, ChatGPT 4, in developing listening assessments. Employing prompt engineering and fine-tuning of prompts, the study specifically focuses on creating listening scripts and test items using ChatGPT 4 for test-takers across a spectrum of proficiency levels (academic, low, intermediate, and advanced). For comparability, the 24 topics of these scripts were selected from topics found in academic listening tests. We conducted two types of analyses to evaluate the quality of the output. First, we performed linguistic analyses of the scripts using Coh-Metrix and Text Inspector to determine whether the scripts varied linguistically as required by the prompts. Second, we analyzed topic variation and the degree of overlap in the test items. Results indicated that while ChatGPT 4 reliably produced scripts with significant textual variations, the test items generated were often long and exhibited semantic overlaps among options. This effect was also influenced by the topic. We discuss the ethical complexities that arise from the use of generative artificial intelligence (AI), and how generative AI (GenAI) can potentially benefit practitioners and researchers in language assessment, while recognizing its limitations. |
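The abstract describes two steps that can be sketched in code: generating level-targeted listening scripts and items with ChatGPT 4, and then measuring semantic overlap among an item's answer options. The sketch below is a hypothetical illustration, not the authors' method: the model name (gpt-4), the prompt wording, and the use of a sentence-transformers embedding model with mean pairwise cosine similarity as the overlap measure are all assumptions of this sketch.

```python
# Hypothetical sketch of the study's two steps; prompts, model choice, and
# the overlap metric are assumptions, not taken from the paper.
from itertools import combinations

from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def generate_listening_script(topic: str, level: str) -> str:
    """Ask the model for a listening script and items pitched at one level."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumption: the study used ChatGPT 4 interactively
        messages=[
            {"role": "system",
             "content": "You are an experienced language-test developer."},
            {"role": "user",
             "content": (
                 f"Write a 300-word listening script on '{topic}' suitable "
                 f"for {level}-level test-takers, followed by four "
                 "multiple-choice comprehension items, each with one key "
                 "and three distractors."
             )},
        ],
    )
    return response.choices[0].message.content


def mean_option_overlap(options: list[str]) -> float:
    """Mean pairwise cosine similarity among answer options; higher values
    indicate more semantic overlap, one of the item weaknesses reported."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    embeddings = model.encode(options, convert_to_tensor=True)
    sims = [util.cos_sim(embeddings[i], embeddings[j]).item()
            for i, j in combinations(range(len(options)), 2)]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    print(generate_listening_script("urban transport planning", "intermediate"))
    print(mean_option_overlap([
        "The lecture focuses on traffic congestion in cities.",
        "The talk is mainly about urban road congestion.",
        "The speaker discusses funding for public parks.",
        "The lecture covers national housing policy.",
    ]))
```

A high mean similarity among an item's options would flag the kind of semantic overlap the study reports as a weakness of the generated items; the first two options above, for instance, are near-paraphrases.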
format | Article |
id | doaj.art-3cbb9e940ffd415392bd8e97b08c8f8e |
institution | Directory Open Access Journal |
issn | 2666-920X |
language | English |
publishDate | 2024-06-01 |
publisher | Elsevier |
record_format | Article |
series | Computers and Education: Artificial Intelligence |
title | Investigating the affordances of OpenAI's large language model in developing listening assessments |
topic | Artificial intelligence (AI); ChatGPT 4; Fine-tuning of prompts; Generative AI (GenAI); Large language model; Listening assessment
url | http://www.sciencedirect.com/science/article/pii/S2666920X24000055 |