Automated Assessment of Initial Answers to Questions in Conversational Intelligent Tutoring Systems: Are Contextual Embedding Models Really Better?

This paper assesses how well semantic text models can evaluate student responses to electronics questions, compared with expert human judges. Recent interest in text similarity has led to a proliferation of models that could potentially be used for assessing student responses; however, it is unclear whether these models perform as well as earlier models of distributional semantics. We assessed 5166 response pairings from 219 participants across 118 electronics questions and scored each with 13 computational text models, including models that use Regular Expressions (RegEx), distributional semantics, embeddings, contextual embeddings, and combinations of these features. Regular Expressions performed best among the stand-alone models. Other semantic text models performed comparably to the Latent Semantic Analysis (LSA) model originally used for this task and, in a small number of cases, outperformed it. Models trained on a domain-specific electronics corpus performed better than models trained on general language or Newtonian physics. Furthermore, semantic text models combined with RegEx outperformed stand-alone models in agreement with human judges. Tuning recent models for Automatic Short Answer Grading tasks in conversational intelligent tutoring systems (ITSs) requires empirical analysis, especially in domain-specific areas such as electronics, which raises the question of how well recent contextual embedding models compare with earlier distributional semantic language models on this question-answering task. These results shed light on the selection of appropriate computational techniques for text modeling to improve the accuracy, recall, weighted agreement, and ultimately the effectiveness of automatic scoring in conversational ITSs.
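
The abstract's central comparison, combining a Regular Expression match with a semantic similarity score, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embedding model (all-MiniLM-L6-v2), the max-of-two combination rule, and the example pattern are all assumptions introduced here for concreteness.

```python
# Hedged sketch of scoring a student's short answer against an ideal answer
# with (a) a Regular Expression match and (b) cosine similarity between
# sentence embeddings, then combining the two signals.
import re

import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder contextual-embedding model; the paper evaluated 13 models,
# not necessarily this one.
model = SentenceTransformer("all-MiniLM-L6-v2")

def regex_score(pattern: str, response: str) -> float:
    """Return 1.0 if the expected-answer pattern matches, else 0.0."""
    return 1.0 if re.search(pattern, response, re.IGNORECASE) else 0.0

def embedding_score(ideal: str, response: str) -> float:
    """Cosine similarity between embeddings of the ideal answer and response."""
    a, b = model.encode([ideal, response])
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def combined_score(pattern: str, ideal: str, response: str) -> float:
    """One simple combination rule: take the stronger of the two signals."""
    return max(regex_score(pattern, response), embedding_score(ideal, response))

ideal = "Current through a resistor is voltage divided by resistance."
pattern = r"voltage\s+(divided\s+by|over)\s+resistance"
print(combined_score(pattern, ideal, "I = V / R: current equals voltage over resistance"))
```
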
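The paper's evaluation criterion, weighted agreement between model scores and expert human judges, is commonly computed as a weighted Cohen's kappa; the sketch below assumes that metric and uses invented placeholder ratings, not the paper's data or rating scale.

```python
# Sketch of the agreement step implied by the abstract: compare a model's
# discretized scores against expert human ratings with weighted kappa.
from sklearn.metrics import cohen_kappa_score

human = [2, 1, 0, 2, 1, 2, 0, 1]          # hypothetical expert ratings on a 0-2 scale
model_scores = [2, 1, 1, 2, 0, 2, 0, 1]   # hypothetical model ratings after thresholding

# Quadratic weighting penalizes large disagreements more than near-misses.
print(cohen_kappa_score(human, model_scores, weights="quadratic"))
```
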

Bibliographic Details
Main Authors: Colin M. Carmon, Brent Morgan, Xiangen Hu, Arthur C. Graesser
Affiliations: Institute for Intelligent Systems, University of Memphis, Memphis, TN 38152, USA (Carmon, Hu, Graesser); Department of Psychology, Rhodes College, Memphis, TN 38112, USA (Morgan)
Format: Article
Language: English
Published: MDPI AG, 2023-08-01
Series: Electronics, Vol. 12, Issue 17, Article 3654
ISSN: 2079-9292
DOI: 10.3390/electronics12173654
Subjects: computational linguistics; conversational systems; electronics training; intelligent tutoring systems; natural language processing; naval training
Online Access: https://www.mdpi.com/2079-9292/12/17/3654