On Developing Generic Models for Predicting Student Outcomes in Educational Data Mining


Bibliographic Details
Main Authors: Gomathy Ramaswami, Teo Susnjak, Anuradha Mathrani
Format: Article
Language: English
Published: MDPI AG 2022-01-01
Series: Big Data and Cognitive Computing
Subjects:
Online Access: https://www.mdpi.com/2504-2289/6/1/6
collection DOAJ
description Poor academic performance is a concern in the educational sector, especially when it leaves students unable to meet minimum course requirements. With timely prediction of students’ performance, educators can detect at-risk students and intervene early to help them overcome their learning difficulties. However, most studies develop individual prediction models that each target a single course. These models are tailored to the specific attributes of each course amongst a very diverse set of possibilities. While this approach can yield accurate models in some instances, it has limitations. In many cases, overfitting can occur when course data are scarce or when new courses are devised. Additionally, maintaining a large suite of per-course models is a significant overhead. These issues can be tackled by developing a generic, course-agnostic predictive model that captures more abstract patterns and can operate across all courses, irrespective of their differences. This study demonstrates how such a generic predictive model can be developed to identify at-risk students across a wide variety of courses. Experiments were conducted using a range of algorithms, with the generic model achieving effective accuracy. The findings showed that the CatBoost algorithm performed best on our dataset across the F-measure, ROC (receiver operating characteristic) curve and AUC scores; it is therefore an excellent candidate algorithm for this domain, given its ability to seamlessly handle categorical and missing data, which are frequent features of educational datasets.
id doaj.art-a1245f6d664d4576b59228faac45fc07
institution Directory Open Access Journal
issn 2504-2289
doi 10.3390/bdcc6010006
volume 6
issue 1
article_number 6
affiliation School of Mathematical and Computational Sciences, Massey University, Auckland 0632, New Zealand (all three authors)
topic machine learning
early prediction
CatBoost
at-risk students
educational data mining