Automated Assessment of Initial Answers to Questions in Conversational Intelligent Tutoring Systems: Are Contextual Embedding Models Really Better?
This paper assesses the ability of semantic text models to assess student responses to electronics questions compared with that of expert human judges. Recent interest in text similarity has led to a proliferation of models that can potentially be used for assessing student responses. However, it is...
Main Authors: | Colin M. Carmon, Brent Morgan, Xiangen Hu, Arthur C. Graesser |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-08-01 |
Series: | Electronics |
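Subjects: | computational linguistics; conversational systems; electronics training; intelligent tutoring systems; natural language processing; naval training |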
Online Access: | https://www.mdpi.com/2079-9292/12/17/3654 |
_version_ | 1797582676721401856 |
---|---|
author | Colin M. Carmon; Brent Morgan; Xiangen Hu; Arthur C. Graesser |
author_facet | Colin M. Carmon; Brent Morgan; Xiangen Hu; Arthur C. Graesser |
author_sort | Colin M. Carmon |
collection | DOAJ |
description | This paper assesses the ability of semantic text models to assess student responses to electronics questions compared with that of expert human judges. Recent interest in text similarity has led to a proliferation of models that can potentially be used for assessing student responses. However, it is unclear whether these models perform as well as early models of distributional semantics. We assessed 5166 response pairings of 219 participants across 118 electronics questions and scored each with 13 different computational text models, including models that use Regular Expressions, distributional semantics, embeddings, contextual embeddings, and combinations of these features. Regular Expressions performed the best out of the stand-alone models. We show other semantic text models performing comparably to the Latent Semantic Analysis model that was originally used for the current task, and in a small number of cases outperforming the model. Models trained on a domain-specific electronics corpus for the task performed better than models trained on general language or Newtonian physics. Furthermore, semantic text models combined with RegEx outperformed stand-alone models in agreement with human judges. Tuning the performance of these recent models in Automatic Short Answer Grading tasks for conversational intelligent tutoring systems requires empirical analysis, especially in domain-specific areas such as electronics. Therefore, the question arises as to how well recent contextual embedding models compare with earlier distributional semantic language models on this task of answering questions about electronics. These results shed light on the selection of appropriate computational techniques for text modeling to improve the accuracy, recall, weighted agreement, and ultimately the effectiveness of automatic scoring in conversational ITSs. |
first_indexed | 2024-03-10T23:24:49Z |
format | Article |
id | doaj.art-9091af20a9ea454ca0174b5b3505915f |
institution | Directory Open Access Journal |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-10T23:24:49Z |
publishDate | 2023-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-9091af20a9ea454ca0174b5b3505915f; 2023-11-19T08:02:11Z; eng; MDPI AG; Electronics; 2079-9292; 2023-08-01; 12(17), 3654; 10.3390/electronics12173654; Automated Assessment of Initial Answers to Questions in Conversational Intelligent Tutoring Systems: Are Contextual Embedding Models Really Better?; Colin M. Carmon (Institute for Intelligent Systems, University of Memphis, Memphis, TN 38152, USA); Brent Morgan (Department of Psychology, Rhodes College, Memphis, TN 38112, USA); Xiangen Hu (Institute for Intelligent Systems, University of Memphis, Memphis, TN 38152, USA); Arthur C. Graesser (Institute for Intelligent Systems, University of Memphis, Memphis, TN 38152, USA); This paper assesses the ability of semantic text models to assess student responses to electronics questions compared with that of expert human judges. Recent interest in text similarity has led to a proliferation of models that can potentially be used for assessing student responses. However, it is unclear whether these models perform as well as early models of distributional semantics. We assessed 5166 response pairings of 219 participants across 118 electronics questions and scored each with 13 different computational text models, including models that use Regular Expressions, distributional semantics, embeddings, contextual embeddings, and combinations of these features. Regular Expressions performed the best out of the stand-alone models. We show other semantic text models performing comparably to the Latent Semantic Analysis model that was originally used for the current task, and in a small number of cases outperforming the model. Models trained on a domain-specific electronics corpus for the task performed better than models trained on general language or Newtonian physics. Furthermore, semantic text models combined with RegEx outperformed stand-alone models in agreement with human judges. Tuning the performance of these recent models in Automatic Short Answer Grading tasks for conversational intelligent tutoring systems requires empirical analysis, especially in domain-specific areas such as electronics. Therefore, the question arises as to how well recent contextual embedding models compare with earlier distributional semantic language models on this task of answering questions about electronics. These results shed light on the selection of appropriate computational techniques for text modeling to improve the accuracy, recall, weighted agreement, and ultimately the effectiveness of automatic scoring in conversational ITSs.; https://www.mdpi.com/2079-9292/12/17/3654; computational linguistics; conversational systems; electronics training; intelligent tutoring systems; natural language processing; naval training |
spellingShingle | Colin M. Carmon; Brent Morgan; Xiangen Hu; Arthur C. Graesser; Automated Assessment of Initial Answers to Questions in Conversational Intelligent Tutoring Systems: Are Contextual Embedding Models Really Better?; Electronics; computational linguistics; conversational systems; electronics training; intelligent tutoring systems; natural language processing; naval training |
title | Automated Assessment of Initial Answers to Questions in Conversational Intelligent Tutoring Systems: Are Contextual Embedding Models Really Better? |
title_full | Automated Assessment of Initial Answers to Questions in Conversational Intelligent Tutoring Systems: Are Contextual Embedding Models Really Better? |
title_fullStr | Automated Assessment of Initial Answers to Questions in Conversational Intelligent Tutoring Systems: Are Contextual Embedding Models Really Better? |
title_full_unstemmed | Automated Assessment of Initial Answers to Questions in Conversational Intelligent Tutoring Systems: Are Contextual Embedding Models Really Better? |
title_short | Automated Assessment of Initial Answers to Questions in Conversational Intelligent Tutoring Systems: Are Contextual Embedding Models Really Better? |
title_sort | automated assessment of initial answers to questions in conversational intelligent tutoring systems are contextual embedding models really better |
topic | computational linguistics; conversational systems; electronics training; intelligent tutoring systems; natural language processing; naval training |
url | https://www.mdpi.com/2079-9292/12/17/3654 |
work_keys_str_mv | AT colinmcarmon automatedassessmentofinitialanswerstoquestionsinconversationalintelligenttutoringsystemsarecontextualembeddingmodelsreallybetter AT brentmorgan automatedassessmentofinitialanswerstoquestionsinconversationalintelligenttutoringsystemsarecontextualembeddingmodelsreallybetter AT xiangenhu automatedassessmentofinitialanswerstoquestionsinconversationalintelligenttutoringsystemsarecontextualembeddingmodelsreallybetter AT arthurcgraesser automatedassessmentofinitialanswerstoquestionsinconversationalintelligenttutoringsystemsarecontextualembeddingmodelsreallybetter |