Automated Assessment of Initial Answers to Questions in Conversational Intelligent Tutoring Systems: Are Contextual Embedding Models Really Better?

This paper assesses how well semantic text models can evaluate student responses to electronics questions, compared with expert human judges. Recent interest in text similarity has led to a proliferation of models that could potentially be used for assessing student responses; however, it is unclear whether these models perform as well as earlier models of distributional semantics. We assessed 5166 response pairings from 219 participants across 118 electronics questions and scored each with 13 computational text models, including models that use Regular Expressions (RegEx), distributional semantics, embeddings, contextual embeddings, and combinations of these features. Regular Expressions performed best among the stand-alone models. Other semantic text models performed comparably to the Latent Semantic Analysis (LSA) model originally used for this task and, in a small number of cases, outperformed it. Models trained on a domain-specific electronics corpus performed better than models trained on general language or Newtonian physics. Furthermore, semantic text models combined with RegEx outperformed stand-alone models in agreement with human judges. Tuning recent models for Automatic Short Answer Grading tasks in conversational intelligent tutoring systems (ITSs) requires empirical analysis, especially in domain-specific areas such as electronics, which raises the question of how well recent contextual embedding models compare with earlier distributional semantic language models on this question-answering task. These results shed light on the selection of appropriate computational techniques for text modeling to improve the accuracy, recall, weighted agreement, and ultimately the effectiveness of automatic scoring in conversational ITSs.
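
The abstract's central comparison, combining a Regular Expression match with a semantic similarity score, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embedding model (all-MiniLM-L6-v2), the max-of-two combination rule, and the example pattern are all assumptions introduced here for concreteness.

```python
# Hedged sketch of scoring a student's short answer against an ideal answer
# with (a) a Regular Expression match and (b) cosine similarity between
# sentence embeddings, then combining the two signals.
import re

import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder contextual-embedding model; the paper evaluated 13 models,
# not necessarily this one.
model = SentenceTransformer("all-MiniLM-L6-v2")

def regex_score(pattern: str, response: str) -> float:
    """Return 1.0 if the expected-answer pattern matches, else 0.0."""
    return 1.0 if re.search(pattern, response, re.IGNORECASE) else 0.0

def embedding_score(ideal: str, response: str) -> float:
    """Cosine similarity between embeddings of the ideal answer and response."""
    a, b = model.encode([ideal, response])
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def combined_score(pattern: str, ideal: str, response: str) -> float:
    """One simple combination rule: take the stronger of the two signals."""
    return max(regex_score(pattern, response), embedding_score(ideal, response))

ideal = "Current through a resistor is voltage divided by resistance."
pattern = r"voltage\s+(divided\s+by|over)\s+resistance"
print(combined_score(pattern, ideal, "I = V / R: current equals voltage over resistance"))
```
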
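The paper's evaluation criterion, weighted agreement between model scores and expert human judges, is commonly computed as a weighted Cohen's kappa; the sketch below assumes that metric and uses invented placeholder ratings, not the paper's data or rating scale.

```python
# Sketch of the agreement step implied by the abstract: compare a model's
# discretized scores against expert human ratings with weighted kappa.
from sklearn.metrics import cohen_kappa_score

human = [2, 1, 0, 2, 1, 2, 0, 1]          # hypothetical expert ratings on a 0-2 scale
model_scores = [2, 1, 1, 2, 0, 2, 0, 1]   # hypothetical model ratings after thresholding

# Quadratic weighting penalizes large disagreements more than near-misses.
print(cohen_kappa_score(human, model_scores, weights="quadratic"))
```
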

Bibliographic Details
Main Authors: Colin M. Carmon, Brent Morgan, Xiangen Hu, Arthur C. Graesser
Affiliations: Institute for Intelligent Systems, University of Memphis, Memphis, TN 38152, USA (Carmon, Hu, Graesser); Department of Psychology, Rhodes College, Memphis, TN 38112, USA (Morgan)
Format: Article
Language: English
Published: MDPI AG, 2023-08-01
Series: Electronics, Vol. 12, Issue 17, Article 3654
ISSN: 2079-9292
DOI: 10.3390/electronics12173654
Subjects: computational linguistics; conversational systems; electronics training; intelligent tutoring systems; natural language processing; naval training
Online Access: https://www.mdpi.com/2079-9292/12/17/3654