Validation of machine learning ridge regression models using Monte Carlo, bootstrap, and variations in cross-validation
In recent years, there have been several calls by practitioners of machine learning to provide more guidelines on how to use its methods and techniques. For example, the current literature on resampling methods is confusing and sometimes contradictory; worse, there are sometimes no practical guidelines offered at all. To address this shortcoming, a simulation study was conducted that evaluated ridge regression models fitted on five real-world datasets. The study compared the performance of four resampling methods, namely, Monte Carlo resampling, bootstrap, k-fold cross-validation, and repeated k-fold cross-validation. The goal was to find the best-fitting λ (regularization) parameter that would minimize mean squared error, by using nine variations of these resampling methods. For each of the nine resampling variations, 1,000 runs were performed to see how often a good fit, average fit, and poor fit λ value would be chosen. The resampling method that chose good fit values the greatest number of times was deemed the best method. Based on the results of the investigation, three general recommendations are made: (1) repeated k-fold cross-validation is the best method to select as a general-purpose resampling method; (2) k = 10 folds is a good choice in k-fold cross-validation; (3) Monte Carlo and bootstrap are underperformers, so they are not recommended as general-purpose resampling methods. At the same time, no resampling method was found to be uniformly better than the others.
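The paper's recommended procedure, repeated k-fold cross-validation over a grid of ridge penalties scored by mean squared error, can be sketched as follows. This is an illustrative sketch only: the synthetic data, the λ grid, and the function names are assumptions, not the paper's five real-world datasets or its actual code.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge estimate: (X'X + lam*I)^(-1) X'y (no intercept, for brevity)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def repeated_kfold_mse(X, y, lam, k=10, repeats=5, rng=None):
    # Repeated k-fold CV: average held-out MSE over `repeats` random k-fold splits.
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(y)
    mses = []
    for _ in range(repeats):
        idx = rng.permutation(n)
        for fold in np.array_split(idx, k):
            train = np.setdiff1d(idx, fold)
            w = ridge_fit(X[train], y[train], lam)
            mses.append(np.mean((X[fold] @ w - y[fold]) ** 2))
    return float(np.mean(mses))

# Synthetic stand-in data (assumption: the study's datasets are not reproduced here)
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=100)

# Pick the λ that minimizes mean CV error, using the paper's suggested k = 10
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = min(grid, key=lambda lam: repeated_kfold_mse(X, y, lam, k=10, repeats=5))
print(best_lam)
```

The outer `repeats` loop is what distinguishes repeated k-fold from plain k-fold: reshuffling before each split averages away the luck of any single fold assignment, which is the stability property the study credits for its strong performance.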
Main Author: | Nakatsu Robbie T. |
---|---|
Format: | Article |
Language: | English |
Published: | De Gruyter, 2023-07-01 |
Series: | Journal of Intelligent Systems |
Subjects: | ridge regression; machine learning; model validation; cross validation; resampling methods |
Online Access: | https://doi.org/10.1515/jisys-2022-0224 |
_version_ | 1797773478759235584 |
author | Nakatsu Robbie T. |
author_facet | Nakatsu Robbie T. |
author_sort | Nakatsu Robbie T. |
collection | DOAJ |
description | In recent years, there have been several calls by practitioners of machine learning to provide more guidelines on how to use its methods and techniques. For example, the current literature on resampling methods is confusing and sometimes contradictory; worse, there are sometimes no practical guidelines offered at all. To address this shortcoming, a simulation study was conducted that evaluated ridge regression models fitted on five real-world datasets. The study compared the performance of four resampling methods, namely, Monte Carlo resampling, bootstrap, k-fold cross-validation, and repeated k-fold cross-validation. The goal was to find the best-fitting λ (regularization) parameter that would minimize mean squared error, by using nine variations of these resampling methods. For each of the nine resampling variations, 1,000 runs were performed to see how often a good fit, average fit, and poor fit λ value would be chosen. The resampling method that chose good fit values the greatest number of times was deemed the best method. Based on the results of the investigation, three general recommendations are made: (1) repeated k-fold cross-validation is the best method to select as a general-purpose resampling method; (2) k = 10 folds is a good choice in k-fold cross-validation; (3) Monte Carlo and bootstrap are underperformers, so they are not recommended as general-purpose resampling methods. At the same time, no resampling method was found to be uniformly better than the others. |
first_indexed | 2024-03-12T22:07:07Z |
format | Article |
id | doaj.art-70a24210b3b643c38114ad5c38190106 |
institution | Directory Open Access Journal |
issn | 2191-026X |
language | English |
last_indexed | 2024-03-12T22:07:07Z |
publishDate | 2023-07-01 |
publisher | De Gruyter |
record_format | Article |
series | Journal of Intelligent Systems |
spelling | doaj.art-70a24210b3b643c38114ad5c38190106; 2023-07-24T11:18:52Z; eng; De Gruyter; Journal of Intelligent Systems; 2191-026X; 2023-07-01; Vol. 32, Iss. 1, pp. 55–67; 10.1515/jisys-2022-0224; Validation of machine learning ridge regression models using Monte Carlo, bootstrap, and variations in cross-validation; Nakatsu Robbie T., Department of Information Systems and Business Analytics, Loyola Marymount University, Los Angeles, CA 90045, USA; https://doi.org/10.1515/jisys-2022-0224; ridge regression; machine learning; model validation; cross validation; resampling methods |
spellingShingle | Nakatsu Robbie T.; Validation of machine learning ridge regression models using Monte Carlo, bootstrap, and variations in cross-validation; Journal of Intelligent Systems; ridge regression; machine learning; model validation; cross validation; resampling methods |
title | Validation of machine learning ridge regression models using Monte Carlo, bootstrap, and variations in cross-validation |
title_full | Validation of machine learning ridge regression models using Monte Carlo, bootstrap, and variations in cross-validation |
title_fullStr | Validation of machine learning ridge regression models using Monte Carlo, bootstrap, and variations in cross-validation |
title_full_unstemmed | Validation of machine learning ridge regression models using Monte Carlo, bootstrap, and variations in cross-validation |
title_short | Validation of machine learning ridge regression models using Monte Carlo, bootstrap, and variations in cross-validation |
title_sort | validation of machine learning ridge regression models using monte carlo bootstrap and variations in cross validation |
topic | ridge regression; machine learning; model validation; cross validation; resampling methods |
url | https://doi.org/10.1515/jisys-2022-0224 |
work_keys_str_mv | AT nakatsurobbiet validationofmachinelearningridgeregressionmodelsusingmontecarlobootstrapandvariationsincrossvalidation |
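For contrast, the two schemes the study found to underperform, Monte Carlo resampling (repeated random train/test splits) and the bootstrap (sampling with replacement, scored on out-of-bag rows), can be sketched in the same style. Again this is a hedged illustration on synthetic data; the run counts, split fraction, and helper names are assumptions, not the paper's implementation.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge estimate: (X'X + lam*I)^(-1) X'y (no intercept, for brevity)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def monte_carlo_mse(X, y, lam, runs=200, test_frac=0.2, rng=None):
    # Monte Carlo resampling: average MSE over repeated random train/test splits.
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(y)
    n_test = int(n * test_frac)
    scores = []
    for _ in range(runs):
        idx = rng.permutation(n)
        test, train = idx[:n_test], idx[n_test:]
        w = ridge_fit(X[train], y[train], lam)
        scores.append(np.mean((X[test] @ w - y[test]) ** 2))
    return float(np.mean(scores))

def bootstrap_mse(X, y, lam, runs=200, rng=None):
    # Bootstrap: fit on a with-replacement resample, score on out-of-bag rows.
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(y)
    scores = []
    for _ in range(runs):
        boot = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), boot)
        if len(oob) == 0:  # rare: every row was drawn at least once
            continue
        w = ridge_fit(X[boot], y[boot], lam)
        scores.append(np.mean((X[oob] @ w - y[oob]) ** 2))
    return float(np.mean(scores))

# Illustrative synthetic data (the study's datasets are not reproduced here)
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=80)

grid = [0.01, 0.1, 1.0, 10.0, 100.0]
mc_lam = min(grid, key=lambda lam: monte_carlo_mse(X, y, lam))
bs_lam = min(grid, key=lambda lam: bootstrap_mse(X, y, lam))
print(mc_lam, bs_lam)
```

A bootstrap resample reuses roughly 63% of the rows and leaves the rest out-of-bag, so each fit sees duplicated observations, one plausible reason the abstract reports it choosing good-fit λ values less often than repeated k-fold cross-validation.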