Explainable Automated Essay Scoring: Deep Learning Really Has Pedagogical Value


Bibliographic Details
Main Authors: Vivekanandan Kumar, David Boulanger
Format: Article
Language: English
Published: Frontiers Media S.A. 2020-10-01
Series: Frontiers in Education
Subjects: explainable artificial intelligence; SHAP; automated essay scoring; deep learning; trust; learning analytics
Online Access: https://www.frontiersin.org/article/10.3389/feduc.2020.572367/full
collection DOAJ
description Automated essay scoring (AES) is a compelling topic in Learning Analytics, primarily because recent advances in AI make it a good testbed for exploring artificial supplementation of human creativity. However, a vast swath of research tackles AES only holistically; few have even developed AES models at the rubric level, the very first layer of explanation underlying the prediction of holistic scores. Consequently, the AES black box has remained impenetrable. Although several algorithms from Explainable Artificial Intelligence have recently been published, no research has yet investigated the role these explanation models can play in: (a) discovering the decision-making process that drives AES, (b) fine-tuning predictive models to improve generalizability and interpretability, and (c) providing personalized, formative, and fine-grained feedback to students during the writing process. Building on previous studies in which models were trained to predict both the holistic and rubric scores of essays, using the Automated Student Assessment Prize's essay datasets, this study focuses on predicting the quality of the writing style of Grade-7 essays and exposes the decision processes that lead to these predictions. In doing so, it evaluates the impact of deep learning (multi-layer perceptron neural networks) on AES performance. The effect of deep learning proved most visible when assessing the trustworthiness of explanation models: as more hidden layers were added to the neural network, descriptive accuracy increased by about 10%. The study also shows that faster (up to three orders of magnitude) SHAP implementations are as accurate as the slower model-agnostic one. It leverages the state of the art in natural language processing, applying feature selection to a pool of 1592 linguistic indices that measure aspects of text cohesion, lexical diversity, lexical sophistication, and syntactic sophistication and complexity.
In addition to the list of most globally important features, the study reports (a) a list of features that are important for a specific essay (locally), (b) a range of values for each feature that contributes to higher or lower rubric scores, and (c) a model that quantifies the impact of implementing formative feedback.
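The SHAP explanations the abstract describes assign each linguistic feature a signed contribution to the predicted rubric score, both globally and for a specific essay (locally). As a stdlib-only illustration of that idea (not the paper's implementation), the sketch below computes exact Shapley values by subset enumeration for a hypothetical three-feature writing-style scorer; `toy_score`, the feature names, and the baseline values are all invented for illustration.

```python
import itertools
import math

def shapley_values(f, x, baseline):
    """Exact Shapley values for prediction f(x) relative to a baseline input.

    Features absent from a coalition take their baseline value; each feature's
    value is its weighted average marginal contribution over all coalitions.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(len(others) + 1):
            for s in itertools.combinations(others, r):
                # Classic Shapley coalition weight: |S|! (n - |S| - 1)! / n!
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                with_i = [x[j] if (j in s or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in s else baseline[j] for j in range(n)]
                phi[i] += w * (f(with_i) - f(without_i))
    return phi

def toy_score(v):
    """Hypothetical writing-style score from three made-up linguistic indices."""
    lexical_diversity, cohesion, syntactic_complexity = v
    return 2.0 * lexical_diversity + 1.0 * cohesion + 0.5 * syntactic_complexity

x = [0.8, 0.6, 0.4]          # one essay's (invented) feature values
baseline = [0.5, 0.5, 0.5]   # (invented) dataset-average feature values
phi = shapley_values(toy_score, x, baseline)
# Efficiency property: the attributions sum to f(x) - f(baseline)
print(phi, sum(phi), toy_score(x) - toy_score(baseline))
```

Exhaustive enumeration is only feasible for a handful of features; with the study's pool of 1592 linguistic indices, approximate SHAP implementations (the faster variants the abstract compares against the model-agnostic one) become necessary.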
id doaj.art-1270628a5be84971bd6d268354c73918
issn 2504-284X
topic explainable artificial intelligence
SHAP
automated essay scoring
deep learning
trust
learning analytics