On Developing Generic Models for Predicting Student Outcomes in Educational Data Mining
Poor academic performance of students is a concern in the educational sector, especially if it leads to students being unable to meet minimum course requirements. However, with timely prediction of students’ performance, educators can detect at-risk students, thereby enabling early interventions for...
Main Authors: | Gomathy Ramaswami, Teo Susnjak, Anuradha Mathrani |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-01-01 |
Series: | Big Data and Cognitive Computing |
Subjects: | machine learning; early prediction; CatBoost; at-risk students; educational data mining |
Online Access: | https://www.mdpi.com/2504-2289/6/1/6 |
_version_ | 1797447126711533568 |
---|---|
author | Gomathy Ramaswami; Teo Susnjak; Anuradha Mathrani |
collection | DOAJ |
description | Poor academic performance of students is a concern in the educational sector, especially if it leads to students being unable to meet minimum course requirements. With timely prediction of students’ performance, however, educators can detect at-risk students, thereby enabling early interventions that support these students in overcoming their learning difficulties. Most studies have developed individual prediction models that each target a single course. These models are tailored to the specific attributes of each course amongst a very diverse set of possibilities. While this approach can yield accurate models in some instances, it has limitations. In many cases, overfitting can take place when course data is small or when new courses are devised. Additionally, maintaining a large suite of per-course models is a significant overhead. These issues can be tackled by developing a generic, course-agnostic predictive model that captures more abstract patterns and is able to operate across all courses, irrespective of their differences. This study demonstrates how a generic predictive model can be developed that identifies at-risk students across a wide variety of courses. Experiments were conducted using a range of algorithms, with the generic model achieving effective accuracy. The findings showed that the CatBoost algorithm performed the best on our dataset across the F-measure, ROC (receiver operating characteristic) curve and AUC scores; therefore, it is an excellent candidate algorithm for providing solutions in this domain given its ability to seamlessly handle categorical and missing data, which are frequent features of educational datasets. |
first_indexed | 2024-03-09T13:51:20Z |
format | Article |
id | doaj.art-a1245f6d664d4576b59228faac45fc07 |
institution | Directory Open Access Journal |
issn | 2504-2289 |
language | English |
last_indexed | 2024-03-09T13:51:20Z |
publishDate | 2022-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Big Data and Cognitive Computing |
spelling | doaj.art-a1245f6d664d4576b59228faac45fc07; 2023-11-30T20:50:43Z; eng; MDPI AG; Big Data and Cognitive Computing; 2504-2289; 2022-01-01; 6; 1; 6; 10.3390/bdcc6010006; On Developing Generic Models for Predicting Student Outcomes in Educational Data Mining; Gomathy Ramaswami, Teo Susnjak, Anuradha Mathrani (all: School of Mathematical and Computational Sciences, Massey University, Auckland 0632, New Zealand); https://www.mdpi.com/2504-2289/6/1/6; machine learning; early prediction; CatBoost; at-risk students; educational data mining |
title | On Developing Generic Models for Predicting Student Outcomes in Educational Data Mining |
topic | machine learning; early prediction; CatBoost; at-risk students; educational data mining |
url | https://www.mdpi.com/2504-2289/6/1/6 |
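
The description above reports that candidate algorithms were compared on the F-measure and AUC scores. As a rough illustration of what those two scores measure for an at-risk classifier, here is a minimal sketch in plain Python. It does not reproduce the study's pipeline or data: the function names `f_measure` and `roc_auc`, the 0.5 decision threshold, and the six example students are all hypothetical, and labels are assumed to be binary with 1 meaning "at-risk".

```python
def f_measure(y_true, y_pred):
    """F1 score: harmonic mean of precision and recall for the positive (at-risk) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # at-risk, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not at-risk, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # at-risk, missed
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: six students, 1 = at-risk, with made-up model scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print(f"F-measure: {f_measure(y_true, y_pred):.3f}")
print(f"AUC:       {roc_auc(y_true, scores):.3f}")
```

The F-measure depends on the chosen threshold, whereas AUC summarises ranking quality over all thresholds, which is one reason the two are commonly reported together.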