Summary: One of the critical factors in completing studies in a virtual learning environment (VLE) is how students behave when interacting with the system. In practice, however, most student behavior data have an imbalanced label distribution, and this imbalance significantly affects the performance of machine learning models. This study examines several resampling methods to address the imbalance, including random undersampling (RUS), oversampling with the synthetic minority oversampling technique (SMOTE), and hybrid sampling (SMOTEENN). The effectiveness of the resampling methods is evaluated with several machine learning (ML) classifiers, including Naïve Bayes (NB), Logistic Regression (LR), and Random Forest (RF). The experimental results indicate that classifier performance improves when a more balanced dataset is used. Furthermore, the Random Forest classifier achieved the best results among all models when SMOTEENN was used as the resampling approach. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
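For readers unfamiliar with the three resampling strategies, the following is a minimal plain-Python sketch of what each one does to a labeled dataset. It is an illustration under stated assumptions, not the study's implementation: the function names (`random_undersample`, `smote`, `enn`, `smoteenn`), the toy data, the neighbour counts (k=5 for SMOTE, k=3 for ENN), the Euclidean distance, and the majority-vote cleaning rule for ENN are all hypothetical choices. In practice, the imbalanced-learn library provides `RandomUnderSampler`, `SMOTE`, and `SMOTEENN` for the same purposes.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, rng):
    """RUS: randomly keep only as many samples per class as the smallest class has."""
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        for i in rng.sample(idx, target):
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def smote(X, y, minority, rng, k=5):
    """SMOTE: add synthetic minority points on the segment between a minority
    sample and one of its k nearest minority neighbours until classes balance."""
    minority_pts = [x for x, lab in zip(X, y) if lab == minority]
    n_needed = max(Counter(y).values()) - len(minority_pts)
    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority_pts)
        neighbours = sorted(
            (p for p in minority_pts if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        X_out.append([b + gap * (n - b) for b, n in zip(base, nb)])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """ENN (assumed majority-vote variant): drop any sample whose label
    disagrees with the most common label among its k nearest neighbours."""
    keep_X, keep_y = [], []
    for i, (x, lab) in enumerate(zip(X, y)):
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(x, X[j]),
        )[:k]
        vote = Counter(y[j] for j in others).most_common(1)[0][0]
        if vote == lab:
            keep_X.append(x)
            keep_y.append(lab)
    return keep_X, keep_y


def smoteenn(X, y, minority, rng, k_smote=5, k_enn=3):
    """SMOTEENN: oversample with SMOTE, then clean the result with ENN."""
    X_s, y_s = smote(X, y, minority, rng, k=k_smote)
    return enn(X_s, y_s, k=k_enn)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: 20 majority (label 0) vs 5 minority (label 1) points
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
    X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(5)]
    y = [0] * 20 + [1] * 5
    print("original ", Counter(y))
    print("RUS      ", Counter(random_undersample(X, y, rng)[1]))
    print("SMOTE    ", Counter(smote(X, y, 1, rng)[1]))
    print("SMOTEENN ", Counter(smoteenn(X, y, 1, rng)[1]))
```

The sketch shows why the approaches differ: RUS discards majority samples (losing information), SMOTE creates new minority samples by interpolation, and SMOTEENN follows SMOTE with a cleaning pass that removes samples lying in the "wrong" neighbourhood, which can sharpen the class boundary seen by a classifier such as Random Forest.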