Validation of machine learning ridge regression models using Monte Carlo, bootstrap, and variations in cross-validation

In recent years, there have been several calls by practitioners of machine learning to provide more guidelines on how to use its methods and techniques. For example, the current literature on resampling methods is confusing and sometimes contradictory; worse, there are sometimes no practical guidelines offered at all. To address this shortcoming, a simulation study was conducted that evaluated ridge regression models fitted on five real-world datasets. The study compared the performance of four resampling methods, namely, Monte Carlo resampling, bootstrap, k-fold cross-validation, and repeated k-fold cross-validation. The goal was to find the best-fitting λ (regularization) parameter that would minimize mean squared error, by using nine variations of these resampling methods. For each of the nine resampling variations, 1,000 runs were performed to see how often a good fit, average fit, and poor fit λ value would be chosen. The resampling method that chose good fit values the greatest number of times was deemed the best method. Based on the results of the investigation, three general recommendations are made: (1) repeated k-fold cross-validation is the best method to select as a general-purpose resampling method; (2) k = 10 folds is a good choice in k-fold cross-validation; (3) Monte Carlo and bootstrap are underperformers, so they are not recommended as general-purpose resampling methods. At the same time, no resampling method was found to be uniformly better than the others.
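The λ-selection procedure the abstract describes (repeated k-fold cross-validation over a grid of regularization values, picking the λ that minimizes held-out mean squared error) can be sketched in a minimal pure-Python example. This is an illustration only, not the paper's actual protocol: it uses a hypothetical one-feature, no-intercept ridge model and an arbitrary λ grid, rather than the study's nine resampling variations or its five real-world datasets.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    # Closed-form ridge estimate for a single-feature, no-intercept model:
    # beta = sum(x*y) / (sum(x^2) + lambda)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def repeated_kfold_mse(xs, ys, lam, k=10, repeats=5, seed=0):
    # Repeated k-fold cross-validation: reshuffle the indices, split them
    # into k folds, fit on k-1 folds, and average the held-out squared
    # error over every fold of every repeat.
    rng = random.Random(seed)
    n = len(xs)
    total, count = 0.0, 0
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for fold in folds:
            held_out = set(fold)
            train_x = [xs[i] for i in range(n) if i not in held_out]
            train_y = [ys[i] for i in range(n) if i not in held_out]
            beta = fit_ridge_1d(train_x, train_y, lam)
            for i in fold:
                total += (ys[i] - beta * xs[i]) ** 2
                count += 1
    return total / count

def select_lambda(xs, ys, grid, k=10, repeats=5):
    # Return the lambda in the grid with the lowest cross-validated MSE.
    return min(grid, key=lambda lam: repeated_kfold_mse(xs, ys, lam, k, repeats))
```

On synthetic data such as `ys = [2 * x + noise for x in xs]`, `select_lambda(xs, ys, [0.0, 0.01, 0.1, 1.0, 10.0])` returns whichever grid value gives the smallest averaged held-out error; the paper's k = 10 recommendation corresponds to the default `k=10` here.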

Bibliographic Details
Main Author: Nakatsu, Robbie T.
Format: Article
Language: English
Published: De Gruyter, 2023-07-01
Series: Journal of Intelligent Systems
Subjects: ridge regression; machine learning; model validation; cross validation; resampling methods
Online Access: https://doi.org/10.1515/jisys-2022-0224
Citation: Nakatsu, Robbie T. "Validation of machine learning ridge regression models using Monte Carlo, bootstrap, and variations in cross-validation." Journal of Intelligent Systems 32, no. 1 (2023): 55-67. De Gruyter. ISSN 2191-026X. https://doi.org/10.1515/jisys-2022-0224
Author affiliation: Department of Information Systems and Business Analytics, Loyola Marymount University, Los Angeles, CA 90045, USA