Quantifying variability in predictions of student performance: Examining the impact of bootstrap resampling in data pipelines
Educators seek to develop accurate and timely prediction models to forecast student retention and attrition. Although prior studies have generated single point estimates to quantify predictive efficacy, much less education research has examined variability in student performance predictions using nonparametric bootstrap algorithms in data pipelines. In this study, bootstrapping was applied to examine performance variability among five data mining methods (DMMs) and four filter preprocessing feature selection techniques for forecasting course grades for 3225 students enrolled in an undergraduate biology class. While the median area under the curve (AUC) values obtained from bootstrapping were significantly lower than the AUC point estimates obtained without resampling, DMMs and feature selection techniques impacted variability in different ways. The ensemble technique elastic net regression (GLMNET) significantly outperformed all other DMMs and exhibited the least amount of variability in the AUC. However, all filter feature selection techniques significantly increased variability in student success predictions, compared to when this step was omitted from the data pipeline. We discuss the potential benefits and drawbacks of incorporating bootstrapping into prediction pipelines to track, monitor, and forecast classroom performance, as well as highlight the risks of only examining point estimates.
Main Authors: | Roberto Bertolini; Stephen J. Finch; Ross H. Nehm |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2022-01-01 |
Series: | Computers and Education: Artificial Intelligence |
Subjects: | Data science applications in education; Evaluation methodologies; Architectures for educational technology system; Applications in subject areas; Post-secondary education |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666920X22000224 |
_version_ | 1811207549565796352 |
author | Roberto Bertolini; Stephen J. Finch; Ross H. Nehm |
author_sort | Roberto Bertolini |
collection | DOAJ |
description | Educators seek to develop accurate and timely prediction models to forecast student retention and attrition. Although prior studies have generated single point estimates to quantify predictive efficacy, much less education research has examined variability in student performance predictions using nonparametric bootstrap algorithms in data pipelines. In this study, bootstrapping was applied to examine performance variability among five data mining methods (DMMs) and four filter preprocessing feature selection techniques for forecasting course grades for 3225 students enrolled in an undergraduate biology class. While the median area under the curve (AUC) values obtained from bootstrapping were significantly lower than the AUC point estimates obtained without resampling, DMMs and feature selection techniques impacted variability in different ways. The ensemble technique elastic net regression (GLMNET) significantly outperformed all other DMMs and exhibited the least amount of variability in the AUC. However, all filter feature selection techniques significantly increased variability in student success predictions, compared to when this step was omitted from the data pipeline. We discuss the potential benefits and drawbacks of incorporating bootstrapping into prediction pipelines to track, monitor, and forecast classroom performance, as well as highlight the risks of only examining point estimates. |
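The study's core idea, nonparametric bootstrap resampling of AUC estimates, can be sketched minimally as follows. This is an illustrative snippet on synthetic data using a rank-based (Mann-Whitney) AUC, not the authors' actual pipeline: it omits the five DMMs, GLMNET, and the filter feature selection steps, and every function and variable name here is hypothetical.

```python
import numpy as np

def auc(scores, labels):
    # Mann-Whitney U formulation of the area under the ROC curve:
    # the fraction of (positive, negative) pairs ranked correctly,
    # with ties counting one half.
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def bootstrap_auc(scores, labels, n_boot=1000, seed=0):
    """Nonparametric bootstrap: resample (score, label) pairs with
    replacement and recompute the AUC on each replicate, yielding a
    distribution of AUC values rather than a single point estimate."""
    rng = np.random.default_rng(seed)
    n = len(scores)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # sample indices with replacement
        s, y = scores[idx], labels[idx]
        if y.min() == y.max():             # degenerate replicate: one class only
            continue
        reps.append(auc(s, y))
    reps = np.asarray(reps)
    return np.median(reps), reps.std(ddof=1)

# Synthetic data: positive-class scores are shifted upward, so the
# classifier-like score carries real (but noisy) signal.
rng = np.random.default_rng(42)
labels = rng.integers(0, 2, size=500)
scores = labels + rng.normal(0, 1.5, size=500)

point_auc = auc(scores, labels)            # single point estimate
med_auc, sd_auc = bootstrap_auc(scores, labels)
```

Comparing `point_auc` against the bootstrap median and standard deviation is the kind of contrast the abstract describes: the resampled distribution exposes variability that a lone point estimate hides.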
first_indexed | 2024-04-12T04:04:55Z |
format | Article |
id | doaj.art-591d15107003480cb065c673a26c85cd |
institution | Directory Open Access Journal |
issn | 2666-920X |
language | English |
last_indexed | 2024-04-12T04:04:55Z |
publishDate | 2022-01-01 |
publisher | Elsevier |
record_format | Article |
series | Computers and Education: Artificial Intelligence |
title | Quantifying variability in predictions of student performance: Examining the impact of bootstrap resampling in data pipelines |
topic | Data science applications in education Evaluation methodologies Architectures for educational technology system Applications in subject areas Post-secondary education |
url | http://www.sciencedirect.com/science/article/pii/S2666920X22000224 |