Ridge Regression and the Elastic Net: How Do They Do as Finders of True Regressors and Their Coefficients?

Bibliographic Details
Main Author: Rajaram Gana
Format: Article
Language: English
Published: MDPI AG 2022-08-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/10/17/3057
Description
Summary: For the linear model Y = Xb + error, where the number of regressors (p) exceeds the number of observations (n), the Elastic Net (EN) was proposed in 2005 to estimate b. The EN uses both the Lasso, proposed in 1996, and ordinary Ridge Regression (RR), proposed in 1970, to estimate b. However, when p > n, using only RR to estimate b has not been considered in the literature thus far. Because RR is based on the least-squares framework, using only RR to estimate b is computationally much simpler than using the EN. We propose a generalized ridge regression (GRR) algorithm, a superior alternative to the EN, for estimating b as follows: partition X from left to right so that every partition, except the last, has three observations per regressor; for each partition, estimate Y with the regressors in that partition using ordinary RR; retain, by partition, the regressors with statistically significant t-ratios and the corresponding RR tuning parameter k; use the retained regressors and k values to re-estimate Y by GRR across all partitions, which yields b. Because the algorithm is mathematically intractable, algorithmic efficacy is compared by simulation using four metrics.
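As a concrete illustration of the partition-and-screen procedure described above, the following is a minimal NumPy sketch. The Hoerl–Kennard–Baldwin rule for choosing each partition's tuning parameter k and the |t| > 2 significance cutoff are assumptions made here for illustration; the paper's exact selection rules may differ.

```python
# A minimal sketch of the partitioned RR-then-GRR procedure, assuming NumPy.
# The k-selection rule (Hoerl-Kennard-Baldwin from each partition's OLS fit)
# and the |t| > 2 cutoff are illustrative assumptions, not the paper's
# necessarily exact settings.
import numpy as np

def hkb_k(X, y):
    """Hoerl-Kennard-Baldwin choice of the ridge tuning parameter k,
    computed from the within-partition OLS fit (an assumed rule)."""
    n, p = X.shape
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b_ols
    sigma2 = (resid @ resid) / max(n - p, 1)
    return p * sigma2 / (b_ols @ b_ols)

def ridge_fit(X, y, k):
    """Ordinary ridge estimate b = (X'X + kI)^{-1} X'y and its t-ratios."""
    n, p = X.shape
    XtX = X.T @ X
    A_inv = np.linalg.inv(XtX + k * np.eye(p))
    b = A_inv @ X.T @ y
    resid = y - X @ b
    # Residual degrees of freedom via the trace of the ridge hat matrix.
    df = n - np.trace(X @ A_inv @ X.T)
    sigma2 = (resid @ resid) / max(df, 1.0)
    cov = sigma2 * A_inv @ XtX @ A_inv   # sandwich covariance of ridge b
    t = b / np.sqrt(np.diag(cov))
    return b, t

def grr_partition_fit(X, y, t_cut=2.0):
    """Partition X left to right (three observations per regressor per
    partition), screen by RR t-ratios, then refit the survivors by GRR,
    each survivor keeping its partition's k."""
    n, p = X.shape
    m = max(n // 3, 1)                      # regressors per partition
    keep_idx, keep_k = [], []
    for start in range(0, p, m):
        cols = np.arange(start, min(start + m, p))
        Xp = X[:, cols]
        k = hkb_k(Xp, y)
        _, t = ridge_fit(Xp, y, k)
        sig = np.abs(t) > t_cut             # significance screen
        keep_idx.extend(cols[sig])
        keep_k.extend([k] * int(sig.sum()))
    keep_idx = np.array(keep_idx, dtype=int)
    if keep_idx.size == 0:
        return keep_idx, np.array([])
    Xk = X[:, keep_idx]
    K = np.diag(keep_k)                     # per-regressor penalties = GRR
    b = np.linalg.inv(Xk.T @ Xk + K) @ Xk.T @ y
    return keep_idx, b
```

For example, with a simulated X of shape (60, 200) and a sparse true coefficient vector, idx, b_hat = grr_partition_fit(X, y) returns the indices of the retained regressors and their GRR coefficient estimates.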
Three metrics, with the probability of RR’s superiority over the EN given in parentheses, are: the proportion of true regressors discovered (99%); the squared distance of the significant coefficients from the true coefficients (86%); and the squared distance, from the true coefficients, of the estimated coefficients that are both significant and true (74%). The fourth metric is the probability that none of the discovered regressors are true, which is 4% for RR and 25% for the EN. This indicates the additional advantage RR has over the EN in discovering causal regressors.
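A hedged sketch of per-replication versions of these metrics follows, assuming the idx/b_hat output of grr_partition_fit above and a known true coefficient vector b_true; the exact distance definitions are one reading of the summary, not the paper's formulas.

```python
# Per-replication metric computations under the assumptions stated above.
import numpy as np

def replication_metrics(idx, b_hat, b_true):
    true_support = np.flatnonzero(b_true)
    found_true = np.intersect1d(idx, true_support)
    m1 = found_true.size / true_support.size           # share of true regressors found
    full = np.zeros_like(b_true)
    full[idx] = b_hat                                  # embed estimates in R^p
    m2 = np.sum((full[idx] - b_true[idx]) ** 2)        # significant vs. true coefficients
    m3 = np.sum((full[found_true] - b_true[found_true]) ** 2)  # significant AND true
    none_true = found_true.size == 0                   # event behind metric 4
    return m1, m2, m3, none_true
```

The fourth metric is then the relative frequency of the none_true event across simulation replications.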
ISSN: 2227-7390