LASSO and Elastic Net Tend to Over-Select Features
Machine learning methods have become a standard approach for selecting the features associated with an outcome and for building a prediction model when the number of candidate features is large. LASSO is one of the most popular approaches to this end. LASSO selects features with large regression estimates, rather than on the basis of statistical significance, by imposing an $L_1$-norm penalty to overcome the high dimensionality of the candidate features. As a result, LASSO may select insignificant features while missing significant ones. Furthermore, in our experience, LASSO tends to select too many features. Selecting features that are not associated with the outcome incurs additional cost to collect and manage them in future applications of the fitted prediction model. By combining $L_1$- and $L_2$-norm penalties, elastic net (EN) tends to select even more features than LASSO. The falsely selected features act like white noise, so the fitted prediction model may lose prediction accuracy. In this paper, we propose using standard regression methods, without any penalization, combined with a stepwise variable selection procedure to overcome these issues. Unlike LASSO and EN, this method selects features on the basis of statistical significance. Through extensive simulations, we show that this maximum likelihood estimation-based method selects a very small number of features while maintaining high prediction power, whereas LASSO and EN make a large number of false selections, resulting in a loss of prediction accuracy. In contrast to LASSO and EN, regression combined with stepwise variable selection is a standard statistical method, so any biostatistician can use it to analyze high-dimensional data, even without advanced bioinformatics knowledge. (An illustrative code sketch of this over-selection behavior appears below the access link.)
Main Authors: | Lu Liu, Junheng Gao, Georgia Beasley, Sin-Ho Jung |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-08-01 |
Series: | Mathematics |
Subjects: | logistic regression; machine learning; prediction model; ROC curve; variable selection |
Online Access: | https://www.mdpi.com/2227-7390/11/17/3738 |
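The following is a minimal sketch, not from the paper itself, illustrating the over-selection behavior described in the abstract: LASSO (an $L_1$-norm penalty) and elastic net (combined $L_1$- and $L_2$-norm penalties) are fitted to synthetic data in which only a handful of candidate features are truly informative, and the number of features each method retains is counted. The data sizes, penalty grid, and l1_ratio value are illustrative assumptions.

```python
# Sketch: count features selected by LASSO and elastic net on synthetic data
# where only 5 of 500 candidate features carry signal (sizes are assumptions).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# 200 subjects, 500 candidate features, only 5 truly informative.
X, y = make_classification(n_samples=200, n_features=500, n_informative=5,
                           n_redundant=0, random_state=0)

# LASSO: logistic regression with an L1-norm penalty, lambda chosen by CV.
lasso = LogisticRegressionCV(penalty="l1", solver="saga", Cs=10, cv=5,
                             max_iter=5000, random_state=0).fit(X, y)

# Elastic net: combined L1- and L2-norm penalties (l1_ratio=0.5 is assumed).
enet = LogisticRegressionCV(penalty="elasticnet", solver="saga", Cs=10, cv=5,
                            l1_ratios=[0.5], max_iter=5000,
                            random_state=0).fit(X, y)

# A feature counts as "selected" when its coefficient is not shrunk to zero.
print("LASSO selected:", np.sum(lasso.coef_ != 0))
print("Elastic net selected:", np.sum(enet.coef_ != 0))
```

On data like these, both penalized fits typically retain far more than the five informative features, with elastic net usually retaining the most; exact counts vary with the random seed and the cross-validated penalty.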
_version_ | 1797582226375835648 |
---|---|
author | Lu Liu, Junheng Gao, Georgia Beasley, Sin-Ho Jung |
author_facet | Lu Liu, Junheng Gao, Georgia Beasley, Sin-Ho Jung |
author_sort | Lu Liu |
collection | DOAJ |
description | Machine learning methods have become a standard approach for selecting the features associated with an outcome and for building a prediction model when the number of candidate features is large. LASSO is one of the most popular approaches to this end. LASSO selects features with large regression estimates, rather than on the basis of statistical significance, by imposing an $L_1$-norm penalty to overcome the high dimensionality of the candidate features. As a result, LASSO may select insignificant features while missing significant ones. Furthermore, in our experience, LASSO tends to select too many features. Selecting features that are not associated with the outcome incurs additional cost to collect and manage them in future applications of the fitted prediction model. By combining $L_1$- and $L_2$-norm penalties, elastic net (EN) tends to select even more features than LASSO. The falsely selected features act like white noise, so the fitted prediction model may lose prediction accuracy. In this paper, we propose using standard regression methods, without any penalization, combined with a stepwise variable selection procedure to overcome these issues. Unlike LASSO and EN, this method selects features on the basis of statistical significance. Through extensive simulations, we show that this maximum likelihood estimation-based method selects a very small number of features while maintaining high prediction power, whereas LASSO and EN make a large number of false selections, resulting in a loss of prediction accuracy. In contrast to LASSO and EN, regression combined with stepwise variable selection is a standard statistical method, so any biostatistician can use it to analyze high-dimensional data, even without advanced bioinformatics knowledge. |
first_indexed | 2024-03-10T23:18:00Z |
format | Article |
id | doaj.art-f6df4ebb2c3348b29bde0eccf43f9141 |
institution | Directory Open Access Journal |
issn | 2227-7390 |
language | English |
last_indexed | 2024-03-10T23:18:00Z |
publishDate | 2023-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj.art-f6df4ebb2c3348b29bde0eccf43f9141 | 2023-11-19T08:31:28Z | eng | MDPI AG | Mathematics | ISSN 2227-7390 | 2023-08-01 | Vol. 11, Iss. 17, Art. 3738 | doi:10.3390/math11173738 | LASSO and Elastic Net Tend to Over-Select Features | Lu Liu (Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, USA); Junheng Gao (Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, USA); Georgia Beasley (Department of Surgery, Duke University Medical Center, Durham, NC 27710, USA); Sin-Ho Jung (Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, USA) | https://www.mdpi.com/2227-7390/11/17/3738 | logistic regression; machine learning; prediction model; ROC curve; variable selection |
spellingShingle | Lu Liu; Junheng Gao; Georgia Beasley; Sin-Ho Jung | LASSO and Elastic Net Tend to Over-Select Features | Mathematics | logistic regression; machine learning; prediction model; ROC curve; variable selection |
title | LASSO and Elastic Net Tend to Over-Select Features |
title_full | LASSO and Elastic Net Tend to Over-Select Features |
title_fullStr | LASSO and Elastic Net Tend to Over-Select Features |
title_full_unstemmed | LASSO and Elastic Net Tend to Over-Select Features |
title_short | LASSO and Elastic Net Tend to Over-Select Features |
title_sort | lasso and elastic net tend to over select features |
topic | logistic regression; machine learning; prediction model; ROC curve; variable selection |
url | https://www.mdpi.com/2227-7390/11/17/3738 |
work_keys_str_mv | AT luliu lassoandelasticnettendtooverselectfeatures AT junhenggao lassoandelasticnettendtooverselectfeatures AT georgiabeasley lassoandelasticnettendtooverselectfeatures AT sinhojung lassoandelasticnettendtooverselectfeatures |
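For contrast, below is a minimal sketch of the kind of procedure the abstract advocates: ordinary maximum likelihood logistic regression combined with forward stepwise selection, admitting features by statistical significance. This is an illustration rather than the authors' exact algorithm; the entry threshold alpha=0.05 and the use of Wald p-values are assumptions.

```python
# Sketch: forward stepwise selection for a logistic regression fitted by
# maximum likelihood, admitting at each step the candidate feature with the
# smallest Wald p-value, provided it falls below an assumed threshold alpha.
import numpy as np
import statsmodels.api as sm

def forward_stepwise(X, y, alpha=0.05):
    """Return indices of features chosen by forward stepwise selection."""
    n, p = X.shape
    selected = []
    remaining = list(range(p))
    while remaining:
        best_p, best_j = None, None
        for j in remaining:
            cols = selected + [j]
            design = sm.add_constant(X[:, cols])
            try:
                fit = sm.Logit(y, design).fit(disp=0)
            except Exception:       # skip candidates causing fit failures
                continue
            pval = fit.pvalues[-1]  # Wald p-value of the newest feature
            if best_p is None or pval < best_p:
                best_p, best_j = pval, j
        if best_p is None or best_p >= alpha:
            break                   # no candidate reaches significance; stop
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

Applied to synthetic data such as that in the earlier sketch, `forward_stepwise(X, y)` typically returns only a few indices, since a feature enters only when it reaches significance; a backward elimination pass, which stepwise procedures commonly include, could be added in the same style.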