Early prediction of medical students' performance in high-stakes examinations using machine learning approaches

Introduction: Since the advent of medical education systems, managing high-stakes exams has been a top priority and challenge for all policymakers. However, considering machine learning (ML) techniques as a replacement for medical licensing examinations, particularly during crises such as the COVID-...

Bibliographic Details
Main Authors:	Haniye Mastour, Toktam Dehghani, Ehsan Moradi, Saeid Eslami
Format:	Article
Language:	English
Published:	Elsevier 2023-07-01
Series:	Heliyon
Subjects:	Data science applications in education; Data mining; Artificial intelligence; Medical education; Ensemble models
Online Access:	http://www.sciencedirect.com/science/article/pii/S2405844023054567

_version_	1797771455884165120
author	Haniye Mastour, Toktam Dehghani, Ehsan Moradi, Saeid Eslami
author_facet	Haniye Mastour, Toktam Dehghani, Ehsan Moradi, Saeid Eslami
author_sort	Haniye Mastour
collection	DOAJ
description	Introduction: Since the advent of medical education systems, managing high-stakes exams has been a top priority and challenge for all policymakers. However, considering machine learning (ML) techniques as a replacement for medical licensing examinations, particularly during crises such as the COVID-19 outbreak, could be an effective solution. This study uses ML models to develop a framework for predicting medical students' performance on high-stakes exams, such as the Comprehensive Medical Basic Sciences Examination (CMBSE). Material and methods: Prediction of students' status and score on high-stakes examinations faces several challenges, including an imbalanced number of failing and passing students, a large number of heterogeneous and complex features, and the need to identify at-risk and top-performing students. In this study, two major categories of ML approaches are compared: first, classic models (logistic regression (LR), support vector machine (SVM), and k-nearest neighbors (KNN)), and second, ensemble models (voting, bagging (BG), random forests (RF), adaptive boosting (ADA), extreme gradient boosting (XGB), and stacking). Results: To evaluate the models' discrimination ability, they are assessed using a real dataset containing information on medical students over a five-year period (n = 1005). The findings indicate that ensemble ML models, specifically RF and stacking, demonstrate the best performance in predicting CMBSE status. Similarly, among the classic regressors, LR performed best, with a root-mean-square deviation (RMSD) of 0.134 and a coefficient of determination (R2) of 0.62, whereas the RF model performed best overall, with the lowest RMSD (0.077) and the highest R2 (0.80). Furthermore, Anatomical Sciences, Biochemistry, Parasitology, and Entomology grade point average (GPA) and grades demonstrated the strongest positive correlation with the outcomes. Conclusion: Comparing classic and ensemble ML models revealed that ensemble models are superior to classic models. Therefore, the presented framework could be considered a suitable alternative for the CMBSE and other comparable medical licensing examinations.
first_indexed	2024-03-12T21:36:51Z
format	Article
id	doaj.art-cad065979c3d4b31a73120880ee04eba
institution	Directory Open Access Journal
issn	2405-8440
language	English
last_indexed	2024-03-12T21:36:51Z
publishDate	2023-07-01
publisher	Elsevier
record_format	Article
series	Heliyon
spelling	doaj.art-cad065979c3d4b31a73120880ee04eba | 2023-07-27T05:59:03Z | eng | Elsevier | Heliyon | 2405-8440 | 2023-07-01 | 9 | 7 | e18248 | Early prediction of medical students' performance in high-stakes examinations using machine learning approaches | Haniye Mastour (0), Toktam Dehghani (1), Ehsan Moradi (2), Saeid Eslami (3) | (0) Department of Medical Education, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran | (1) Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran; Corresponding author. | (2) Mashhad University of Medical Sciences, Mashhad, Iran | (3) Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran; Pharmaceutical Sciences Research Center, Institute of Pharmaceutical Technology, Mashhad University of Medical Sciences, Mashhad, Iran | Introduction: Since the advent of medical education systems, managing high-stakes exams has been a top priority and challenge for all policymakers. However, considering machine learning (ML) techniques as a replacement for medical licensing examinations, particularly during crises such as the COVID-19 outbreak, could be an effective solution. This study uses ML models to develop a framework for predicting medical students' performance on high-stakes exams, such as the Comprehensive Medical Basic Sciences Examination (CMBSE). Material and methods: Prediction of students' status and score on high-stakes examinations faces several challenges, including an imbalanced number of failing and passing students, a large number of heterogeneous and complex features, and the need to identify at-risk and top-performing students. In this study, two major categories of ML approaches are compared: first, classic models (logistic regression (LR), support vector machine (SVM), and k-nearest neighbors (KNN)), and second, ensemble models (voting, bagging (BG), random forests (RF), adaptive boosting (ADA), extreme gradient boosting (XGB), and stacking). Results: To evaluate the models' discrimination ability, they are assessed using a real dataset containing information on medical students over a five-year period (n = 1005). The findings indicate that ensemble ML models, specifically RF and stacking, demonstrate the best performance in predicting CMBSE status. Similarly, among the classic regressors, LR performed best, with a root-mean-square deviation (RMSD) of 0.134 and a coefficient of determination (R2) of 0.62, whereas the RF model performed best overall, with the lowest RMSD (0.077) and the highest R2 (0.80). Furthermore, Anatomical Sciences, Biochemistry, Parasitology, and Entomology grade point average (GPA) and grades demonstrated the strongest positive correlation with the outcomes. Conclusion: Comparing classic and ensemble ML models revealed that ensemble models are superior to classic models. Therefore, the presented framework could be considered a suitable alternative for the CMBSE and other comparable medical licensing examinations. | http://www.sciencedirect.com/science/article/pii/S2405844023054567 | Data science applications in education; Data mining; Artificial intelligence; Medical education; Ensemble models
spellingShingle	Haniye Mastour Toktam Dehghani Ehsan Moradi Saeid Eslami Early prediction of medical students' performance in high-stakes examinations using machine learning approaches Heliyon Data science applications in education Data mining Artificial intelligence Medical education Ensemble models
title	Early prediction of medical students' performance in high-stakes examinations using machine learning approaches
title_full	Early prediction of medical students' performance in high-stakes examinations using machine learning approaches
title_fullStr	Early prediction of medical students' performance in high-stakes examinations using machine learning approaches
title_full_unstemmed	Early prediction of medical students' performance in high-stakes examinations using machine learning approaches
title_short	Early prediction of medical students' performance in high-stakes examinations using machine learning approaches
title_sort	early prediction of medical students performance in high stakes examinations using machine learning approaches
topic	Data science applications in education; Data mining; Artificial intelligence; Medical education; Ensemble models
url	http://www.sciencedirect.com/science/article/pii/S2405844023054567
work_keys_str_mv	AT haniyemastour earlypredictionofmedicalstudentsperformanceinhighstakesexaminationsusingmachinelearningapproaches AT toktamdehghani earlypredictionofmedicalstudentsperformanceinhighstakesexaminationsusingmachinelearningapproaches AT ehsanmoradi earlypredictionofmedicalstudentsperformanceinhighstakesexaminationsusingmachinelearningapproaches AT saeideslami earlypredictionofmedicalstudentsperformanceinhighstakesexaminationsusingmachinelearningapproaches

Similar Items