Monte Carlo cross-validation for a study with binary outcome and limited sample size

Abstract: Cross-validation (CV) is a resampling approach for evaluating machine learning models when sample size is limited. The number of all possible combinations of folds for the training data, known as CV rounds, is often very small in leave-one-out CV. Alternatively, Monte Carlo cross-validation (...

Bibliographic Details
Main Author: Guogen Shan
Format: Article
Language: English
Published: BMC 2022-10-01
Series: BMC Medical Informatics and Decision Making
Online Access: https://doi.org/10.1186/s12911-022-02016-z
Collection: DOAJ
Description: Cross-validation (CV) is a resampling approach for evaluating machine learning models when sample size is limited. The number of all possible combinations of folds for the training data, known as CV rounds, is often very small in leave-one-out CV. Alternatively, Monte Carlo cross-validation (MCCV) can be performed with a flexible number of simulations, computational resources permitting, for a study with limited sample size. We conduct extensive simulation studies to compare accuracy between MCCV and CV with the same number of simulations for a study with a binary outcome (e.g., disease progression or not). The accuracy of MCCV is generally higher than that of CV, although the gain is small. The two methods perform similarly when sample size is large. Meanwhile, MCCV provides increasingly reliable performance metrics as the number of simulations increases. Two real examples are used to illustrate the comparison between MCCV and CV.
Indexed: 2024-04-12T15:55:09Z
Record ID: doaj.art-81d8d5c8fb5f40d5872cdfc037280136
Institution: Directory of Open Access Journals
ISSN: 1472-6947
Author Affiliation: Department of Biostatistics, University of Florida
Citation: BMC Medical Informatics and Decision Making 22(1):1-15 (2022)
Topics: Alzheimer’s disease; Binary outcome; Cross-validation; Machine learning; Monte Carlo cross-validation
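
Note: The abstract contrasts fold-based CV with Monte Carlo cross-validation, in which the number of random train/test splits can be chosen freely. The following is a minimal Python sketch of that difference, not the author's implementation. The synthetic data, the one-feature threshold classifier, the 20% test fraction, and all function names (kfold_splits, mccv_splits, fit_threshold, mean_accuracy) are assumptions made purely for illustration.

```python
import random
from statistics import mean


def kfold_splits(n, k, rng):
    """Shuffle indices 0..n-1 and partition them into k disjoint test folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    # Each split trains on everything outside the current fold.
    return [([i for i in idx if i not in set(f)], f) for f in folds]


def mccv_splits(n, test_frac, n_sim, rng):
    """Draw n_sim independent random train/test splits; test sets may overlap."""
    n_test = max(1, round(n * test_frac))
    splits = []
    for _ in range(n_sim):
        idx = list(range(n))
        rng.shuffle(idx)
        splits.append((idx[n_test:], idx[:n_test]))
    return splits


def fit_threshold(x, y, train):
    """Toy classifier (assumption): midpoint between the two class means."""
    x0 = [x[i] for i in train if y[i] == 0]
    x1 = [x[i] for i in train if y[i] == 1]
    if not x0 or not x1:
        # Fall back to the overall mean if one class is missing from training.
        return mean(x[i] for i in train)
    return (mean(x0) + mean(x1)) / 2


def mean_accuracy(x, y, splits):
    """Average test accuracy over all train/test splits."""
    accs = []
    for train, test in splits:
        t = fit_threshold(x, y, train)
        hits = [int((x[i] > t) == (y[i] == 1)) for i in test]
        accs.append(mean(hits))
    return mean(accs)


# Synthetic small-sample study with a binary outcome (illustrative only).
rng = random.Random(0)
n = 40
y = [i % 2 for i in range(n)]
x = [rng.gauss(float(label), 1.0) for label in y]

cv = kfold_splits(n, k=5, rng=rng)                       # 5 rounds, fixed by k
mc = mccv_splits(n, test_frac=0.2, n_sim=200, rng=rng)   # any number of rounds

print("5-fold CV accuracy:", mean_accuracy(x, y, cv))
print("MCCV accuracy (200 splits):", mean_accuracy(x, y, mc))
```

In this sketch the number of k-fold rounds is tied to k, whereas n_sim in mccv_splits can be raised as far as computing time allows, which is the flexibility the abstract refers to.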