Augmenting assessment with AI coding of online student discourse: A question of reliability

Currently, many generative Artificial Intelligence (AI) tools are being integrated into the educational technology landscape for instructors. Our paper examines the potential and challenges of using Large Language Models (LLMs) to code student-generated content in online discussions based on intende...

Bibliographic Details
Main Authors:	Kamila Misiejuk, Rogers Kaliisa, Jennifer Scianna
Format:	Article
Language:	English
Published:	Elsevier 2024-06-01
Series:	Computers and Education: Artificial Intelligence
Subjects:	Artificial intelligence Data coding ChatGPT Large language models Learning analytics AI-driven assessment
Online Access:	http://www.sciencedirect.com/science/article/pii/S2666920X24000171

_version_	1797235450690142208
author	Kamila Misiejuk Rogers Kaliisa Jennifer Scianna
author_sort	Kamila Misiejuk
collection	DOAJ
description	Currently, many generative Artificial Intelligence (AI) tools are being integrated into the educational technology landscape for instructors. Our paper examines the potential and challenges of using Large Language Models (LLMs) to code student-generated content in online discussions based on intended learning outcomes and how instructors could use this to assess the intended and enacted learning design. If instructors were to rely on LLMs as a means of assessment, the reliability of these models to code the data accurately is crucial. Employing a diverse set of LLMs from the GPT family and prompting techniques on an asynchronous online discussion dataset from a blended-learning bachelor-level course, our research examines the reliability of AI-supported coding in educational research. Findings reveal that while AI-supported coding demonstrates efficiency, achieving substantial, moderate agreement with human coding for specific nuanced and context-dependent codes is challenging. Moreover, the high cost, token limits, and the advanced skills needed to write API scripts might limit the usability of AI-driven coding. Finally, implementation would require specific parameterization techniques based on the class and may not be feasible for widespread implementation. Our study underscores the importance of transparency in AI coding methodologies and the need for a hybrid approach that integrates human judgement to ensure data accuracy and interpretability. In addition, it contributes to the knowledge base about the reliability of LLMs to code real, small datasets using complex codes that are common in the instructor's practice and explores the potential and challenges of using these models for assessment purposes.
first_indexed	2024-04-24T16:48:09Z
format	Article
id	doaj.art-267b63c90f69426a9e24547b0109e20a
institution	Directory of Open Access Journals
issn	2666-920X
language	English
last_indexed	2024-04-24T16:48:09Z
publishDate	2024-06-01
publisher	Elsevier
record_format	Article
series	Computers and Education: Artificial Intelligence
spelling	doaj.art-267b63c90f69426a9e24547b0109e20a; 2024-03-29T05:51:18Z; eng; Elsevier; Computers and Education: Artificial Intelligence; 2666-920X; 2024-06-01; volume 6; article 100216; Augmenting assessment with AI coding of online student discourse: A question of reliability; Kamila Misiejuk (Centre for the Science of Learning and Technology (SLATE), University of Bergen, Bergen, Norway; corresponding author); Rogers Kaliisa (Department of Education, University of Oslo, Oslo, Norway); Jennifer Scianna (Department of Curriculum and Instruction, University of Wisconsin-Madison, WI, USA); http://www.sciencedirect.com/science/article/pii/S2666920X24000171; Artificial intelligence; Data coding; ChatGPT; Large language models; Learning analytics; AI-driven assessment
spellingShingle	Kamila Misiejuk Rogers Kaliisa Jennifer Scianna Augmenting assessment with AI coding of online student discourse: A question of reliability Computers and Education: Artificial Intelligence Artificial intelligence Data coding ChatGPT Large language models Learning analytics AI-driven assessment
title	Augmenting assessment with AI coding of online student discourse: A question of reliability
title_sort	augmenting assessment with ai coding of online student discourse a question of reliability
topic	Artificial intelligence Data coding ChatGPT Large language models Learning analytics AI-driven assessment
url	http://www.sciencedirect.com/science/article/pii/S2666920X24000171
work_keys_str_mv	AT kamilamisiejuk augmentingassessmentwithaicodingofonlinestudentdiscourseaquestionofreliability AT rogerskaliisa augmentingassessmentwithaicodingofonlinestudentdiscourseaquestionofreliability AT jenniferscianna augmentingassessmentwithaicodingofonlinestudentdiscourseaquestionofreliability

Similar Items