Extracting topological features to identify at-risk students using machine learning and graph convolutional network models

Bibliographic Details
Main Authors: Balqis Albreiki, Tetiana Habuza, Nazar Zaki
Format: Article
Language: English
Published: SpringerOpen 2023-04-01
Series: International Journal of Educational Technology in Higher Education
Online Access: https://doi.org/10.1186/s41239-023-00389-3
collection DOAJ
description Abstract Technological advances have significantly affected education, leading to the creation of online learning platforms such as virtual learning environments and massive open online courses. While these platforms offer a variety of features, none of them incorporates a module that accurately predicts students’ academic performance and commitment. Consequently, it is crucial to design machine learning (ML) methods that predict student performance and identify at-risk students as early as possible. Graph representations of student data provide new insights into this area. This paper describes a simple but highly accurate technique for converting tabulated data into graphs. We employ distance measures (Euclidean and cosine) to calculate the similarities between students’ data and construct a graph. We extract graph topological features (GF) to enhance our data. This allows us to capture structural correlations among the data and gain deeper insights than isolated data analysis. The initial dataset (DS) and GF can be used alone or jointly to improve the predictive power of the ML method. The proposed method is tested on an educational dataset and returns superior results. The use of DS alone is compared with the use of $DS + GF$ in the classification of students into three classes: “failed”, “at risk”, and “good”. The area under the receiver operating characteristic curve (AUC) reaches 0.948 using DS, compared with 0.964 for $DS + GF$. The accuracy in the case of $DS + GF$ varies from 84.5% to 87.3%. Adding GF improves the performance by 2.019% in terms of AUC and 3.261% in terms of accuracy. Moreover, by incorporating graph topological features through a graph convolutional network (GCN), the prediction performance can be enhanced by 0.5% in terms of accuracy and 0.9% in terms of AUC under the cosine distance matrix. With the Euclidean distance matrix, adding the GCN improves the prediction accuracy by 3.7% and the AUC by 2.4%. By adding graph embedding features to ML models, at-risk students can be identified with 87.4% accuracy and 0.97 AUC. The proposed solution provides a tool for the early detection of at-risk students. This will benefit universities and enhance their prediction performance, improving both effectiveness and reputation.
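The tabular-to-graph step summarized in the abstract (pairwise distances between student rows, a graph built from them, and topological features appended to the original data) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The k-nearest-neighbour rule for turning distances into edges, the value of `k`, and the use of node degree and local clustering coefficient as the GF columns are assumptions made for illustration, and the helper names (`knn_graph`, `graph_features`, `augment`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Set

Row = Sequence[float]


def euclidean(a: Row, b: Row) -> float:
    """Euclidean (L2) distance between two student feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Row, b: Row) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally distant (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_graph(rows: List[Row], k: int, dist: Callable[[Row, Row], float]) -> List[Set[int]]:
    """Hypothetical rule: link each student to its k nearest students (ties broken by index)."""
    adj: List[Set[int]] = [set() for _ in rows]
    for i, row in enumerate(rows):
        ranked = sorted((dist(row, other), j) for j, other in enumerate(rows) if j != i)
        for _, j in ranked[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrise so the graph is undirected
    return adj


def clustering_coefficient(adj: List[Set[int]], i: int) -> float:
    """Fraction of pairs of node i's neighbours that are themselves connected."""
    nbrs = list(adj[i])
    d = len(nbrs)
    if d < 2:
        return 0.0
    links = sum(1 for a in range(d) for b in range(a + 1, d) if nbrs[b] in adj[nbrs[a]])
    return 2.0 * links / (d * (d - 1))


def graph_features(adj: List[Set[int]]) -> List[List[float]]:
    """Per-student topological features (GF): degree and local clustering coefficient."""
    return [[float(len(adj[i])), clustering_coefficient(adj, i)] for i in range(len(adj))]


def augment(rows: List[Row], k: int = 2,
            dist: Callable[[Row, Row], float] = euclidean) -> List[List[float]]:
    """Return DS + GF: each original row concatenated with its graph features."""
    gf = graph_features(knn_graph(rows, k, dist))
    return [list(row) + feats for row, feats in zip(rows, gf)]


if __name__ == "__main__":
    # Toy data: two well-separated groups of students with two features each.
    students = [[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [10.0, 10.0], [10.0, 13.0]]
    for augmented_row in augment(students, k=2):
        print(augmented_row)
```

Passing `dist=cosine_distance` gives the cosine variant that the abstract compares against Euclidean distance. Because Euclidean distance is sensitive to feature ranges, real student features would normally be scaled before building the graph. The augmented rows could then be fed to any standard classifier for the three classes “failed”, “at risk”, and “good”.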
id doaj.art-75149ce9745d4b8dae95cfb10cc72810
institution Directory of Open Access Journals
issn 2365-9440
topic Student performance
Graph representation
Students at risk
Graph topological feature
Graph embedding
Graph convolutional network