Examining variable selection methods for the predictive performance of regression models and the proportion of selected variables and selected random variables

The selection of a descriptor, X, is crucial for improving the interpretation and prediction accuracy of a regression model. In this study, the prediction accuracy of models constructed using the selected X was determined and the results of variable selection, according to the number of selected X a...

Bibliographic Details
Main Author: Hiromasa Kaneko
Format: Article
Language: English
Published: Elsevier 2021-06-01
Series: Heliyon
Subjects: Variable selection; Feature selection; Regression; Predictive accuracy; Interpretability; QSPR
Online Access: http://www.sciencedirect.com/science/article/pii/S2405844021014596
_version_ 1819145880072093696
author Hiromasa Kaneko
collection DOAJ
description The selection of a descriptor, X, is crucial for improving the interpretation and prediction accuracy of a regression model. In this study, the prediction accuracy of models constructed using the selected X was determined, and the results of variable selection were investigated in terms of the number of selected X and the number of selected variables that are unrelated to an objective variable (y), such as an activity or property, in order to evaluate variable (feature) selection methods. The variable selection methods examined were the least absolute shrinkage and selection operator, genetic algorithm-based partial least squares, genetic algorithm-based support vector regression, and Boruta. Several regression analysis methods were used to test the prediction accuracy of the models constructed using the selected X. The characteristics of each variable selection method were analyzed using eight datasets. The results showed that even when variables unrelated to y were selected, and even when the number of unrelated variables was the same as the number of original variables, a regression model with good accuracy that ignores the influence of such noise variables can be constructed by applying various regression analysis methods. Additionally, the variables related to y must not be deleted. These findings provide a basis for improving variable selection methods.
first_indexed 2024-12-22T13:05:03Z
format Article
id doaj.art-ad4d2edcf0b14610b1254a80da21f96e
institution Directory Open Access Journal
issn 2405-8440
language English
last_indexed 2024-12-22T13:05:03Z
publishDate 2021-06-01
publisher Elsevier
record_format Article
series Heliyon
spelling doaj.art-ad4d2edcf0b14610b1254a80da21f96e; 2022-12-21T18:24:55Z; eng; Elsevier; Heliyon; 2405-8440; 2021-06-01; volume 7, issue 6, article e07356; Examining variable selection methods for the predictive performance of regression models and the proportion of selected variables and selected random variables; Hiromasa Kaneko (corresponding author), Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan; http://www.sciencedirect.com/science/article/pii/S2405844021014596; Variable selection; Feature selection; Regression; Predictive accuracy; Interpretability; QSPR
title Examining variable selection methods for the predictive performance of regression models and the proportion of selected variables and selected random variables
topic Variable selection
Feature selection
Regression
Predictive accuracy
Interpretability
QSPR
url http://www.sciencedirect.com/science/article/pii/S2405844021014596
work_keys_str_mv AT hiromasakaneko examiningvariableselectionmethodsforthepredictiveperformanceofregressionmodelsandtheproportionofselectedvariablesandselectedrandomvariables
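
The evaluation idea summarized in the description (append random variables unrelated to y, run a variable selection method, then report the proportion of selected variables and of selected random variables) can be illustrated with a minimal sketch. This is not the author's code or data: it uses only one of the listed methods (LASSO, solved here by a simple coordinate descent in plain Python), synthetic Gaussian data, and hypothetical names and settings (`selection_summary`, `n_related`, `alpha=0.1`) chosen purely for illustration.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO update step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def standardize(column):
    """Center a column and scale it to unit variance."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / sd for v in column]


def lasso_coordinate_descent(columns, y, alpha, n_iter=200):
    """Minimize (1/(2n))*||y - Xb||^2 + alpha*||b||_1.

    Assumes every column is standardized and y is centered.
    """
    n = len(y)
    coef = [0.0] * len(columns)
    residual = list(y)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Covariance of column j with the residual that excludes column j.
            rho = sum(x * (r + coef[j] * x) for x, r in zip(col, residual)) / n
            new_coef = soft_threshold(rho, alpha)
            delta = new_coef - coef[j]
            if delta != 0.0:
                residual = [r - delta * x for x, r in zip(col, residual)]
                coef[j] = new_coef
    return coef


def selection_summary(n_samples=200, n_original=10, n_related=3,
                      alpha=0.1, seed=0):
    """Append as many random variables as original ones, then run LASSO."""
    rng = random.Random(seed)

    def gaussian_column():
        return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    original = [gaussian_column() for _ in range(n_original)]
    # Assumption: y depends only on the first n_related original variables.
    y = [sum(original[j][i] for j in range(n_related)) + rng.gauss(0.0, 0.1)
         for i in range(n_samples)]
    random_vars = [gaussian_column() for _ in range(n_original)]

    columns = [standardize(c) for c in original + random_vars]
    y_mean = sum(y) / n_samples
    coef = lasso_coordinate_descent(columns, [v - y_mean for v in y], alpha)

    selected = [j for j, c in enumerate(coef) if c != 0.0]
    n_random_selected = sum(1 for j in selected if j >= n_original)
    return {
        "selected": selected,
        "proportion_selected": len(selected) / len(columns),
        "n_random_selected": n_random_selected,
        "proportion_random_selected": n_random_selected / n_original,
    }


if __name__ == "__main__":
    print(selection_summary())
```

Indices below `n_original` refer to the original variables and the remaining indices to the appended random variables, so the two proportions mirror the quantities named in the title. Repeating the run over several seeds and values of `alpha` would give a rough picture of how often unrelated variables survive selection under these assumptions.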