Explainable Automated Essay Scoring: Deep Learning Really Has Pedagogical Value


Bibliographic Details
Main Authors: Vivekanandan Kumar, David Boulanger
Format: Article
Language: English
Published: Frontiers Media S.A. 2020-10-01
Series: Frontiers in Education
Subjects: explainable artificial intelligence; SHAP; automated essay scoring; deep learning; trust; learning analytics
Online Access: https://www.frontiersin.org/article/10.3389/feduc.2020.572367/full
collection DOAJ
description Automated essay scoring (AES) is a compelling topic in Learning Analytics, primarily because recent advances in AI make it a good testbed for exploring artificial supplementation of human creativity. However, a vast swath of research tackles AES only holistically; few have even developed AES models at the rubric level, the very first layer of explanation underlying the prediction of holistic scores. Consequently, the AES black box has remained impenetrable. Although several algorithms from Explainable Artificial Intelligence have recently been published, no research has yet investigated the role these explanation models can play in: (a) discovering the decision-making process that drives AES, (b) fine-tuning predictive models to improve generalizability and interpretability, and (c) providing personalized, formative, and fine-grained feedback to students during the writing process. Building on previous studies in which models were trained to predict both the holistic and rubric scores of essays, using the Automated Student Assessment Prize's essay datasets, this study focuses on predicting the quality of the writing style of Grade-7 essays and exposes the decision processes that lead to these predictions. In doing so, it evaluates the impact of deep learning (multi-layer perceptron neural networks) on AES performance. The effect of deep learning proved most visible when assessing the trustworthiness of explanation models: as more hidden layers were added to the neural network, descriptive accuracy increased by about 10%. The study also shows that faster (up to three orders of magnitude) SHAP implementations are as accurate as the slower model-agnostic one. It leverages the state of the art in natural language processing, applying feature selection to a pool of 1592 linguistic indices that measure aspects of text cohesion, lexical diversity, lexical sophistication, and syntactic sophistication and complexity.
In addition to the list of most globally important features, the study reports (a) a list of features that are important for a specific essay (locally), (b) a range of values for each feature that contributes to higher or lower rubric scores, and (c) a model that quantifies the impact of implementing formative feedback.
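The SHAP explanations the abstract describes assign each linguistic feature a signed contribution to the predicted rubric score, both globally and for a specific essay (locally). As a stdlib-only illustration of that idea (not the paper's implementation), the sketch below computes exact Shapley values by subset enumeration for a hypothetical three-feature writing-style scorer; `toy_score`, the feature names, and the baseline values are all invented for illustration.

```python
import itertools
import math

def shapley_values(f, x, baseline):
    """Exact Shapley values for prediction f(x) relative to a baseline input.

    Features absent from a coalition take their baseline value; each feature's
    value is its weighted average marginal contribution over all coalitions.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(len(others) + 1):
            for s in itertools.combinations(others, r):
                # Classic Shapley coalition weight: |S|! (n - |S| - 1)! / n!
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                with_i = [x[j] if (j in s or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in s else baseline[j] for j in range(n)]
                phi[i] += w * (f(with_i) - f(without_i))
    return phi

def toy_score(v):
    """Hypothetical writing-style score from three made-up linguistic indices."""
    lexical_diversity, cohesion, syntactic_complexity = v
    return 2.0 * lexical_diversity + 1.0 * cohesion + 0.5 * syntactic_complexity

x = [0.8, 0.6, 0.4]          # one essay's (invented) feature values
baseline = [0.5, 0.5, 0.5]   # (invented) dataset-average feature values
phi = shapley_values(toy_score, x, baseline)
# Efficiency property: the attributions sum to f(x) - f(baseline)
print(phi, sum(phi), toy_score(x) - toy_score(baseline))
```

Exhaustive enumeration is only feasible for a handful of features; with the study's pool of 1592 linguistic indices, approximate SHAP implementations (the faster variants the abstract compares against the model-agnostic one) become necessary.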
id doaj.art-1270628a5be84971bd6d268354c73918
issn 2504-284X
topic explainable artificial intelligence
SHAP
automated essay scoring
deep learning
trust
learning analytics