Practical Treatment of the Multicollinearity: The Optimal Ridge Method and the Modified OLS

Bibliographic Details
Main Authors: Tyzhnenko Alexander G., Ryeznik Yevgen V.
Format: Article
Language: English
Published: PH "INZHEK" 2021-03-01
Series: Problemi Ekonomiki
Subjects:
Online Access: https://www.problecon.com/export_pdf/problems-of-economy-2021-1_0-pages-155_168.pdf
Collection: DOAJ
Description: The paper discusses the applicability of the two main methods for solving the linear regression (LR) problem in the presence of multicollinearity: the OLS and the ridge methods. We compare the solutions obtained by these methods with the solution calculated by the Modified OLS (MOLS) [1; 2]. Like the ridge, the MOLS provides a stable solution for any level of data collinearity. We compare the three approaches using Monte Carlo simulations on data generated by the Artificial Data Generator (ADG) [1; 2]. The ADG produces linear and nonlinear data samples of arbitrary size, which allows us to investigate the regularization of the OLS equation. Two regularization versions are possible: the COV version considered in [1; 2] and the ST version commonly used in the literature and in practice. The investigations performed reveal that the ridge method in the COV version has an approximately constant optimal regularizer (λ_opt ≈ 0.1) for any sample size and collinearity level. The MOLS method in this version also has an approximately constant optimal regularizer, but its value is significantly smaller (λ_opt ≈ 0.001). In contrast, the optimal regularizer of the ridge method in the ST version is not a constant but depends on the sample size: its value must be set to λ_opt ≈ 0.1(n − 1). With this value of the ridge parameter, the obtained solution coincides exactly with the one obtained in the COV version with the optimal regularizer λ_opt ≈ 0.1 [1; 2]. With this choice of the regularizer, one can use any implementation of the ridge method in standard statistical software by setting the regularization parameter to λ_opt ≈ 0.1(n − 1), without an extra tuning process, regardless of the sample size and the collinearity level. It is also shown that this optimal ridge(0.1) solution is close to the population solution for a sufficiently large sample size, although it has some limitations.
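The claimed equivalence between the two regularizer scalings follows from the identity Z′Z = (n − 1)R for standardized regressors Z and sample correlation matrix R. A minimal numerical sketch (not the authors' code; the data and variable names are illustrative) checks that ridge with λ = 0.1(n − 1) on the standardized normal equations matches ridge with λ = 0.1 on the correlation-matrix form:

```python
import numpy as np

# Illustrative setup: two nearly collinear regressors, arbitrary sample.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)      # nearly collinear with x1
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

X = np.column_stack([x1, x2])
Z = (X - X.mean(0)) / X.std(0, ddof=1)   # standardized regressors
yc = y - y.mean()

lam = 0.1
# "ST" version: (Z'Z + lam*(n-1)*I) b = Z'y
b_st = np.linalg.solve(Z.T @ Z + lam * (n - 1) * np.eye(2), Z.T @ yc)

# "COV" version: (R + lam*I) b = r, where R = Z'Z/(n-1), r = Z'y/(n-1);
# dividing the ST system through by (n-1) gives exactly this system.
R = Z.T @ Z / (n - 1)
r = Z.T @ yc / (n - 1)
b_cov = np.linalg.solve(R + lam * np.eye(2), r)

print(np.allclose(b_st, b_cov))  # the two scalings yield identical coefficients
```

Because the two linear systems differ only by a common factor of (n − 1) on both sides, the solutions coincide exactly, which is why a fixed λ ≈ 0.1(n − 1) in standard ridge software reproduces the COV-version solution with λ ≈ 0.1.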
It is well known that the ridge(0.1) solution is biased. However, as shown in the paper, the bias is economically insignificant. The more critical drawback revealed is the smoothing of the population solution: the ridge method significantly reduces the differences between the population regression coefficients. The ridge(0.1) method can thus yield a solution that is economically correct, i.e., whose regression coefficients have the correct signs, but that is nevertheless inadequate to some extent. The greater the differences between the regression coefficients in the population, the more inadequate the ridge(0.1) solution becomes. The MOLS does not have this disadvantage: since its regularization constant is much smaller than the corresponding ridge regularizer (0.001 versus 0.1), the MOLS suffers little from either bias or smoothing. From a practical point of view, both the ridge(0.1) and the MOLS methods yield close, stable solutions to the LR problem for any sample size and collinearity level, and as the sample size increases, both solutions approach the population solution. We also demonstrate that for small samples (fewer than 40 observations), the ridge(0.1) method is preferable because it is more stable; for medium and large samples, the MOLS is preferable because it is more accurate while having approximately the same stability.
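The smoothing effect can be seen directly at the population level. In a hypothetical example (not from the paper; the correlation and coefficients are chosen for illustration), the ridge estimand solves (R + λI)b_λ = Rb, so a larger λ pulls the coefficients toward each other, while a MOLS-sized regularizer barely moves them:

```python
import numpy as np

# Hypothetical population: two standardized regressors with correlation 0.9
# and true standardized coefficients b = (3.0, 0.5).
rho = 0.9
R = np.array([[1.0, rho], [rho, 1.0]])   # population correlation matrix
b = np.array([3.0, 0.5])                  # population coefficients
r = R @ b                                 # implied cross-correlations with y

def ridge_estimand(lam):
    """Population-level ridge solution (R + lam*I)^{-1} r."""
    return np.linalg.solve(R + lam * np.eye(2), r)

b_ridge = ridge_estimand(0.1)    # ridge(0.1): coefficient gap shrinks markedly
b_mols = ridge_estimand(0.001)   # regularizer of MOLS size: almost no shrinkage

print(b_ridge)  # gap between the two coefficients roughly halved
print(b_mols)   # close to the population values (3.0, 0.5)
```

With these numbers the ridge(0.1) estimand compresses the coefficient gap from 2.5 to about 1.25, while λ = 0.001 leaves it nearly intact, matching the abstract's point that the larger the population coefficient differences, the more the ridge(0.1) solution is distorted.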
Record ID: doaj.art-7724b35acf73470996eb4fe7a3abd2e7
ISSN: 2222-0712; 2311-1186
DOI: 10.32983/2222-0712-2021-1-155-168 (Problemi Ekonomiki, 2021, No. 1, pp. 155–168)
Author affiliations: Tyzhnenko Alexander G., Simon Kuznets Kharkiv National University of Economics (ORCID 0000-0001-8508-7341); Ryeznik Yevgen V., Uppsala University (ORCID 0000-0003-2997-8566)
Topics: multicollinearity; economic correctness; economic adequacy; modified Cramer's rule; modified OLS; optimal ridge regression