Can We Predict Student Performance Based on Tabular and Textual Data?

With the emergence of more new teaching systems, such as Massive Open Online Courses (MOOCs), massive amounts of data are constantly being collected. There is a huge value in these massive teaching data. However, the data, including both student behavior data and student comment data about the cours...

Bibliographic Details
Main Authors:	Yubin Qu, Fang Li, Long Li, Xianzhen Dou, Hongmei Wang
Format:	Article
Language:	English
Published:	IEEE 2022-01-01
Series:	IEEE Access
Subjects:	Educational data mining; deep learning; multimodal; data fusion; random forest
Online Access:	https://ieeexplore.ieee.org/document/9856660/

collection	DOAJ
description	With the emergence of new teaching systems such as Massive Open Online Courses (MOOCs), massive amounts of data are constantly being collected. These teaching data hold great value. However, the data, which include both student behavior data and student comments about courses, are not yet processed to discover models and paradigms that can be useful for school management. There is not yet a multimodal dataset with tabular and textual data for educational data mining. We first collect a dataset that includes student behavior data and course comment text. We then fuse the student behavior data with the course comment text to predict student performance, using a Transformer-based framework with a uniform vector representation. The empirical results on the collected dataset show the effectiveness of our proposed method: in terms of F1 and AUC, its performance improves by up to 3.33% and 4.37%, respectively. We find that the uniform feature vector representation learned by our proposed method can indeed improve the classifier’s performance compared with existing works. Further, we validate our approach on an open dataset, and the results show that our proposed method has a strong generalization capability. Moreover, we perform interpretability analysis using the SHapley Additive exPlanations (SHAP) method and find that text features have a greater influence on the classification model. This further illustrates that fusing text features can improve the performance of classification models.
first_indexed	2024-04-11T21:22:14Z
id	doaj.art-6cd818b05c284a0089bc2d312010f225
institution	Directory of Open Access Journals
issn	2169-3536
spelling	doaj.art-6cd818b05c284a0089bc2d312010f225; 2022-12-22T04:02:34Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2022-01-01; vol. 10, pp. 86008-86019; doi:10.1109/ACCESS.2022.3198682; 9856660; Can We Predict Student Performance Based on Tabular and Textual Data?; Yubin Qu (https://orcid.org/0000-0001-5222-4020; Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, China); Fang Li (School of Marxism, Jiangsu College of Engineering and Technology, Nantong, China); Long Li (https://orcid.org/0000-0002-7693-9722; School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China); Xianzhen Dou (School of Information Engineering, Jiangsu College of Engineering and Technology, Nantong, China); Hongmei Wang (School of Computer, Jiangsu University of Science and Technology, Zhenjiang, China); https://ieeexplore.ieee.org/document/9856660/; Educational data mining; deep learning; multimodal; data fusion; random forest

Similar Items