Practical Treatment of the Multicollinearity: The Optimal Ridge Method and the Modified OLS

Bibliographic Details
Main Authors: Tyzhnenko Alexander G., Ryeznik Yevgen V.
Format: Article
Language: English
Published: PH "INZHEK" 2021-03-01
Series: Problemi Ekonomiki
Subjects:
Online Access: https://www.problecon.com/export_pdf/problems-of-economy-2021-1_0-pages-155_168.pdf
Collection: DOAJ
Description: The paper discusses the applicability of the two main methods for solving the linear regression (LR) problem in the presence of multicollinearity: the OLS and the ridge methods. We compare the solutions obtained by these methods with the solution calculated by the Modified OLS (MOLS) [1; 2]. Like the ridge, the MOLS provides a stable solution for any level of data collinearity. We compare the three approaches using Monte Carlo simulations on data generated by the Artificial Data Generator (ADG) [1; 2]. The ADG produces linear and nonlinear data samples of arbitrary size, which allows us to investigate the regularization of the OLS equation. Two regularization versions are possible: the COV version considered in [1; 2] and the ST version commonly used in the literature and in practice. The investigations performed reveal that the ridge method in the COV version has an approximately constant optimal regularizer (λ_opt ≈ 0.1) for any sample size and collinearity level. The MOLS method in this version also has an approximately constant optimal regularizer, but its value is significantly smaller (λ_opt ≈ 0.001). In contrast, the optimal regularizer of the ridge method in the ST version is not a constant but depends on the sample size: its value must be set to λ_opt ≈ 0.1(n − 1). With this value of the ridge parameter, the obtained solution coincides exactly with the one obtained in the COV version with the optimal regularizer λ_opt ≈ 0.1 [1; 2]. With this choice of the regularizer, one can use any implementation of the ridge method in standard statistical software by setting the regularization parameter to λ_opt ≈ 0.1(n − 1), without an extra tuning process, regardless of the sample size and the collinearity level. It is also shown that this optimal ridge(0.1) solution is close to the population solution for a sufficiently large sample size, although it has some limitations.
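The claimed equivalence between the two regularizer scalings follows from the identity Z′Z = (n − 1)R for standardized regressors Z and sample correlation matrix R. A minimal numerical sketch (not the authors' code; the data and variable names are illustrative) checks that ridge with λ = 0.1(n − 1) on the standardized normal equations matches ridge with λ = 0.1 on the correlation-matrix form:

```python
import numpy as np

# Illustrative setup: two nearly collinear regressors, arbitrary sample.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)      # nearly collinear with x1
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

X = np.column_stack([x1, x2])
Z = (X - X.mean(0)) / X.std(0, ddof=1)   # standardized regressors
yc = y - y.mean()

lam = 0.1
# "ST" version: (Z'Z + lam*(n-1)*I) b = Z'y
b_st = np.linalg.solve(Z.T @ Z + lam * (n - 1) * np.eye(2), Z.T @ yc)

# "COV" version: (R + lam*I) b = r, where R = Z'Z/(n-1), r = Z'y/(n-1);
# dividing the ST system through by (n-1) gives exactly this system.
R = Z.T @ Z / (n - 1)
r = Z.T @ yc / (n - 1)
b_cov = np.linalg.solve(R + lam * np.eye(2), r)

print(np.allclose(b_st, b_cov))  # the two scalings yield identical coefficients
```

Because the two linear systems differ only by a common factor of (n − 1) on both sides, the solutions coincide exactly, which is why a fixed λ ≈ 0.1(n − 1) in standard ridge software reproduces the COV-version solution with λ ≈ 0.1.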
It is well known that the ridge(0.1) solution is biased. However, as shown in the paper, the bias is economically insignificant. The more critical drawback revealed is the smoothing of the population solution: the ridge method significantly reduces the differences between the population regression coefficients. The ridge(0.1) method can thus yield a solution that is economically correct, i.e., whose regression coefficients have the correct signs, but that is nevertheless inadequate to some extent. The greater the differences between the regression coefficients in the population, the more inadequate the ridge(0.1) solution becomes. The MOLS does not have this disadvantage: since its regularization constant is much smaller than the corresponding ridge regularizer (0.001 versus 0.1), the MOLS suffers little from either bias or smoothing. From a practical point of view, both the ridge(0.1) and the MOLS methods yield close, stable solutions to the LR problem for any sample size and collinearity level, and as the sample size increases, both solutions approach the population solution. We also demonstrate that for small samples (fewer than 40 observations), the ridge(0.1) method is preferable because it is more stable; for medium and large samples, the MOLS is preferable because it is more accurate while having approximately the same stability.
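The smoothing effect can be seen directly at the population level. In a hypothetical example (not from the paper; the correlation and coefficients are chosen for illustration), the ridge estimand solves (R + λI)b_λ = Rb, so a larger λ pulls the coefficients toward each other, while a MOLS-sized regularizer barely moves them:

```python
import numpy as np

# Hypothetical population: two standardized regressors with correlation 0.9
# and true standardized coefficients b = (3.0, 0.5).
rho = 0.9
R = np.array([[1.0, rho], [rho, 1.0]])   # population correlation matrix
b = np.array([3.0, 0.5])                  # population coefficients
r = R @ b                                 # implied cross-correlations with y

def ridge_estimand(lam):
    """Population-level ridge solution (R + lam*I)^{-1} r."""
    return np.linalg.solve(R + lam * np.eye(2), r)

b_ridge = ridge_estimand(0.1)    # ridge(0.1): coefficient gap shrinks markedly
b_mols = ridge_estimand(0.001)   # regularizer of MOLS size: almost no shrinkage

print(b_ridge)  # gap between the two coefficients roughly halved
print(b_mols)   # close to the population values (3.0, 0.5)
```

With these numbers the ridge(0.1) estimand compresses the coefficient gap from 2.5 to about 1.25, while λ = 0.001 leaves it nearly intact, matching the abstract's point that the larger the population coefficient differences, the more the ridge(0.1) solution is distorted.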
Record ID: doaj.art-7724b35acf73470996eb4fe7a3abd2e7
ISSN: 2222-0712; 2311-1186
DOI: 10.32983/2222-0712-2021-1-155-168 (Problemi Ekonomiki, 2021, No. 1, pp. 155–168)
Author affiliations: Tyzhnenko Alexander G., Simon Kuznets Kharkiv National University of Economics (ORCID 0000-0001-8508-7341); Ryeznik Yevgen V., Uppsala University (ORCID 0000-0003-2997-8566)
Topics: multicollinearity; economic correctness; economic adequacy; modified Cramer's rule; modified OLS; optimal ridge regression