Robust variable selection methods for large-scale data in the presence of multicollinearity, autocorrelated errors and outliers

Bibliographic Details
Main Author: Uraibi, Hassan S.
Format: Thesis
Language:English
Published: 2016
Subjects:
Online Access:http://psasir.upm.edu.my/id/eprint/69762/1/IPM%202016%205%20-%20IR.pdf
_version_ 1796979002315898880
author Uraibi, Hassan S.
author_facet Uraibi, Hassan S.
author_sort Uraibi, Hassan S.
collection UPM
description The robust correlation coefficient based on a robust multivariate location and scatter estimator, such as the Fast Minimum Covariance Determinant (Fast MCD), is not a feasible option for high-dimensional data because its procedure is too time-consuming. To overcome this problem, the robust adjusted Winsorization correlation (Adj.Winso.cor) was put forward. Unfortunately, Adj.Winso.cor yields very poor results in the presence of multivariate outliers. Hence, we propose a robust multivariate correlation matrix based on the Reweighted Fast Consistent and High-breakdown (RFCH) estimator. The findings show that RFCH.cor is more robust than Adj.Winso.cor in the presence of multivariate outliers. Forward selection (FS) is a very effective variable selection procedure for selecting a parsimonious subset of covariates from a large number of candidates. However, FS is not robust to outliers. A robust forward selection method (FS.Winso), based on partial correlations derived from Maronna's bivariate M-estimator of the scatter matrix and the adjusted Winsorization pairwise correlation, has been introduced in the literature to overcome the problem of outliers. Because FS.Winso is not robust to multivariate outliers, we develop a robust forward selection algorithm based on the RFCH correlation coefficient (RFS.RFCH). The results of our study indicate that RFS.RFCH is more efficient than FS and FS.Winso. The existing robust LARS based on Winsorization correlation (RLARS-Winsor) has the drawback that it is not robust in the presence of multivariate outliers. Hence, a robust LARS (RLARS-RFCH) based on the √n-consistent multivariate (RFCH) correlation matrix is developed. The proposed method is computationally efficient and outperforms RLARS-Winsor. The all-possible-subsets algorithm is greedy, and it is inefficient and unstable in the presence of autocorrelated errors and outliers. To overcome this instability of selection, a stability selection approach is put forward to enhance the performance of the single-split variable selection method. Unfortunately, the classical stability selection procedure is very sensitive to outliers and serially correlated errors. A stability procedure based on the RFCH estimator is therefore developed. The results of the study show that our proposed Robust Multi-Split method based on RFCH successfully and consistently selects the correct variables in the final model. Thus far, no variable selection procedure in the literature deals with a high degree of multicollinearity in the presence of outliers. Hence, a Robust Non-Grouped Variable Selection method (RNGVS.RFCH) is developed for data with severe multicollinearity and outliers. The results signify that the proposed RNGVS.RFCH method is able to correctly select the important variables in the final model. Little research has focused on the problem of large data sets in the presence of outliers and autocorrelated errors. In this situation, the existing Elastic-Net and RE-Net methods are not capable of selecting the important variables in the final model. Thus, a new method that we call before-and-after elastic-net (BAE-Net) regression is proposed. The Reweighted Multivariate Normal (RMVN) algorithm is incorporated into the BAE-Net algorithm. BAE-Net is found to do a credible job of selecting the correct important variables in the final model.
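
As an illustration of the forward selection idea summarized above, the following is a minimal Python sketch of forward selection driven by a plug-in correlation matrix of (X, y): the RFS.RFCH procedure would supply a correlation matrix computed from the RFCH estimator, while the routine itself is estimator-agnostic. The function and parameter names (partial_corr, forward_select, max_vars, tol) are illustrative assumptions, not taken from the thesis.

import numpy as np

def partial_corr(corr, j, y, given):
    # Partial correlation of variables j and y given the index set `given`,
    # computed from the inverse of the relevant correlation submatrix.
    idx = list(given) + [j, y]
    prec = np.linalg.pinv(corr[np.ix_(idx, idx)])
    a, b = len(idx) - 2, len(idx) - 1          # positions of j and y in idx
    return -prec[a, b] / np.sqrt(prec[a, a] * prec[b, b])

def forward_select(corr, y_index, max_vars=10, tol=0.05):
    # Greedy forward selection: at each step add the candidate covariate with
    # the largest absolute partial correlation with y given the current set.
    candidates = [k for k in range(corr.shape[0]) if k != y_index]
    selected = []
    for _ in range(max_vars):
        if not candidates:
            break
        scores = {j: abs(partial_corr(corr, j, y_index, selected))
                  for j in candidates}
        best = max(scores, key=scores.get)
        if scores[best] < tol:                 # no useful predictor remains
            break
        selected.append(best)
        candidates.remove(best)
    return selected

The same skeleton reproduces classical FS when corr is the ordinary sample correlation matrix (e.g. from np.corrcoef); the robust variants differ only in the plug-in correlation matrix.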
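
Likewise, the multi-split stability selection idea that the abstract robustifies with the RFCH estimator can be sketched as follows: run a base variable selector on many random half-samples and retain only the variables chosen in a sufficiently large fraction of splits. The names stability_select, base_selector, n_splits and threshold are illustrative assumptions rather than the thesis's notation.

import numpy as np

def stability_select(X, y, base_selector, n_splits=100, threshold=0.6, rng=None):
    # `base_selector` stands in for any procedure that returns a set of
    # selected column indices when given a (sub)sample, e.g. the
    # forward_select sketch above applied to a robust correlation matrix.
    rng = np.random.default_rng(rng)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_splits):
        half = rng.choice(n, size=n // 2, replace=False)   # random half-sample
        for j in base_selector(X[half], y[half]):
            counts[j] += 1
    freq = counts / n_splits                   # per-variable selection frequency
    return np.flatnonzero(freq >= threshold), freq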
first_indexed 2024-03-06T10:02:40Z
format Thesis
id upm.eprints-69762
institution Universiti Putra Malaysia
language English
last_indexed 2024-03-06T10:02:40Z
publishDate 2016
record_format dspace
spelling upm.eprints-69762 2019-10-29T06:54:25Z http://psasir.upm.edu.my/id/eprint/69762/ Robust variable selection methods for large-scale data in the presence of multicollinearity, autocorrelated errors and outliers Uraibi, Hassan S. 2016-06 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/69762/1/IPM%202016%205%20-%20IR.pdf Uraibi, Hassan S. (2016) Robust variable selection methods for large-scale data in the presence of multicollinearity, autocorrelated errors and outliers. Doctoral thesis, Universiti Putra Malaysia. Robust statistics Outliers (Statistics) Multicollinearity
spellingShingle Robust statistics
Outliers (Statistics)
Multicollinearity
Uraibi, Hassan S.
Robust variable selection methods for large-scale data in the presence of multicollinearity, autocorrelated errors and outliers
title Robust variable selection methods for large-scale data in the presence of multicollinearity, autocorrelated errors and outliers
title_full Robust variable selection methods for large-scale data in the presence of multicollinearity, autocorrelated errors and outliers
title_fullStr Robust variable selection methods for large-scale data in the presence of multicollinearity, autocorrelated errors and outliers
title_full_unstemmed Robust variable selection methods for large-scale data in the presence of multicollinearity, autocorrelated errors and outliers
title_short Robust variable selection methods for large-scale data in the presence of multicollinearity, autocorrelated errors and outliers
title_sort robust variable selection methods for large scale data in the presence of multicollinearity autocorrelated errors and outliers
topic Robust statistics
Outliers (Statistics)
Multicollinearity
url http://psasir.upm.edu.my/id/eprint/69762/1/IPM%202016%205%20-%20IR.pdf
work_keys_str_mv AT uraibihassans robustvariableselectionmethodsforlargescaledatainthepresenceofmulticollinearityautocorrelatederrorsandoutliers