An Improvised SIMPLS Estimator Based on MRCD-PCA Weighting Function and Its Application to Real Data
Multicollinearity often occurs when two or more predictor variables are correlated, especially for high dimensional data (HDD) where $p \gg n$...
Main Authors: | Siti Zahariah; Habshah Midi; Mohd Shafie Mustafa |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-11-01 |
Series: | Symmetry |
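Subjects: | high dimensional data; high leverage point; minimum regularized covariance determinant; partial least squares regression; principal component analysis; SIMPLS |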
Online Access: | https://www.mdpi.com/2073-8994/13/11/2211 |
_version_ | 1797508382835343360 |
---|---|
author | Siti Zahariah; Habshah Midi; Mohd Shafie Mustafa |
author_facet | Siti Zahariah; Habshah Midi; Mohd Shafie Mustafa |
author_sort | Siti Zahariah |
collection | DOAJ |
description | Multicollinearity often occurs when two or more predictor variables are correlated, especially in high dimensional data (HDD) where $p \gg n$. The statistically inspired modification of the partial least squares (SIMPLS) is a very popular technique for solving partial least squares regression problems because of its efficiency, speed, and ease of understanding. SIMPLS relies on the empirical covariance matrix of the explanatory and response variables. Nevertheless, it is very easily affected by outliers. A robust iteratively reweighted SIMPLS (RWSIMPLS) was introduced to rectify this problem. However, RWSIMPLS is still not very efficient, because its weighting function does not specify any method for identifying high leverage points (HLPs), i.e., outlying observations in the $X$-direction. HLPs have the most detrimental effect on the computed estimates and can lead to misleading conclusions about the fitted regression model, so their influence needs to be reduced by assigning them smaller weights. As a solution, we propose an improvised SIMPLS based on a new weight function obtained from the MRCD-PCA diagnostic method for identifying HLPs in HDD, and we name this method MRCD-PCA-RWSIMPLS. A new MRCD-PCA-RWSIMPLS diagnostic plot is also established for classifying observations into four types of data points: regular observations, vertical outliers, good leverage points, and bad leverage points. The numerical examples and Monte Carlo simulations indicate that MRCD-PCA-RWSIMPLS offers substantial improvements over SIMPLS and RWSIMPLS. The proposed diagnostic plot classifies observations into the correct groups. In contrast, the SIMPLS and RWSIMPLS plots fail to classify observations correctly and show masking and swamping effects. |
first_indexed | 2024-03-10T05:01:21Z |
format | Article |
id | doaj.art-9c469f28d9f54a4da0799efc21d4ac98 |
institution | Directory Open Access Journal |
issn | 2073-8994 |
language | English |
last_indexed | 2024-03-10T05:01:21Z |
publishDate | 2021-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Symmetry |
spelling | doaj.art-9c469f28d9f54a4da0799efc21d4ac98; 2023-11-23T01:47:03Z; eng; MDPI AG; Symmetry; 2073-8994; 2021-11-01; 13; 11; 2211; 10.3390/sym13112211; An Improvised SIMPLS Estimator Based on MRCD-PCA Weighting Function and Its Application to Real Data; Siti Zahariah (0); Habshah Midi (1); Mohd Shafie Mustafa (2); (0) Institute for Mathematical Research, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia; (1) Institute for Mathematical Research, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia; (2) Department of Mathematics and Statistics, Faculty of Science, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia; Multicollinearity often occurs when two or more predictor variables are correlated, especially in high dimensional data (HDD) where $p \gg n$. The statistically inspired modification of the partial least squares (SIMPLS) is a very popular technique for solving partial least squares regression problems because of its efficiency, speed, and ease of understanding. SIMPLS relies on the empirical covariance matrix of the explanatory and response variables. Nevertheless, it is very easily affected by outliers. A robust iteratively reweighted SIMPLS (RWSIMPLS) was introduced to rectify this problem. However, RWSIMPLS is still not very efficient, because its weighting function does not specify any method for identifying high leverage points (HLPs), i.e., outlying observations in the $X$-direction. HLPs have the most detrimental effect on the computed estimates and can lead to misleading conclusions about the fitted regression model, so their influence needs to be reduced by assigning them smaller weights. As a solution, we propose an improvised SIMPLS based on a new weight function obtained from the MRCD-PCA diagnostic method for identifying HLPs in HDD, and we name this method MRCD-PCA-RWSIMPLS. A new MRCD-PCA-RWSIMPLS diagnostic plot is also established for classifying observations into four types of data points: regular observations, vertical outliers, good leverage points, and bad leverage points. The numerical examples and Monte Carlo simulations indicate that MRCD-PCA-RWSIMPLS offers substantial improvements over SIMPLS and RWSIMPLS. The proposed diagnostic plot classifies observations into the correct groups. In contrast, the SIMPLS and RWSIMPLS plots fail to classify observations correctly and show masking and swamping effects.; https://www.mdpi.com/2073-8994/13/11/2211; high dimensional data; high leverage point; minimum regularized covariance determinant; partial least squares regression; principal component analysis; SIMPLS |
spellingShingle | Siti Zahariah; Habshah Midi; Mohd Shafie Mustafa; An Improvised SIMPLS Estimator Based on MRCD-PCA Weighting Function and Its Application to Real Data; Symmetry; high dimensional data; high leverage point; minimum regularized covariance determinant; partial least squares regression; principal component analysis; SIMPLS |
title | An Improvised SIMPLS Estimator Based on MRCD-PCA Weighting Function and Its Application to Real Data |
title_full | An Improvised SIMPLS Estimator Based on MRCD-PCA Weighting Function and Its Application to Real Data |
title_fullStr | An Improvised SIMPLS Estimator Based on MRCD-PCA Weighting Function and Its Application to Real Data |
title_full_unstemmed | An Improvised SIMPLS Estimator Based on MRCD-PCA Weighting Function and Its Application to Real Data |
title_short | An Improvised SIMPLS Estimator Based on MRCD-PCA Weighting Function and Its Application to Real Data |
title_sort | improvised simpls estimator based on mrcd pca weighting function and its application to real data |
topic | high dimensional data; high leverage point; minimum regularized covariance determinant; partial least squares regression; principal component analysis; SIMPLS |
url | https://www.mdpi.com/2073-8994/13/11/2211 |
work_keys_str_mv | AT sitizahariah animprovisedsimplsestimatorbasedonmrcdpcaweightingfunctionanditsapplicationtorealdata AT habshahmidi animprovisedsimplsestimatorbasedonmrcdpcaweightingfunctionanditsapplicationtorealdata AT mohdshafiemustafa animprovisedsimplsestimatorbasedonmrcdpcaweightingfunctionanditsapplicationtorealdata AT sitizahariah improvisedsimplsestimatorbasedonmrcdpcaweightingfunctionanditsapplicationtorealdata AT habshahmidi improvisedsimplsestimatorbasedonmrcdpcaweightingfunctionanditsapplicationtorealdata AT mohdshafiemustafa improvisedsimplsestimatorbasedonmrcdpcaweightingfunctionanditsapplicationtorealdata |
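
The description above mentions a diagnostic plot that sorts observations into regular observations, vertical outliers, good leverage points, and bad leverage points. The record does not give the article's distance measure, residual scaling, or cutoff values, so the following is only a minimal sketch of the general four-way rule used in regression outlier maps, not the authors' MRCD-PCA-RWSIMPLS procedure. The function name `classify_observation`, its parameters, and the default residual cutoff of 2.5 are all assumptions made for illustration.

```python
def classify_observation(distance, std_residual, dist_cutoff, resid_cutoff=2.5):
    """Assign one observation to one of four groups.

    distance      -- a robust distance of the observation in the X-space
                     (hypothetical input; the record does not define it)
    std_residual  -- a standardized regression residual (hypothetical input)
    dist_cutoff   -- threshold above which the point counts as high leverage
                     (assumed to be supplied by the caller)
    resid_cutoff  -- threshold on |residual|; 2.5 is an assumed default
    """
    high_leverage = distance > dist_cutoff
    large_residual = abs(std_residual) > resid_cutoff

    if high_leverage and large_residual:
        return "bad leverage point"      # outlying in X and poorly fitted
    if high_leverage:
        return "good leverage point"     # outlying in X but follows the fit
    if large_residual:
        return "vertical outlier"        # ordinary in X but poorly fitted
    return "regular observation"


# Toy values chosen by hand, not data from the article.
toy = [(0.5, 0.3), (0.8, -4.0), (6.0, 1.0), (7.5, 5.2)]
labels = [classify_observation(d, r, dist_cutoff=3.0) for d, r in toy]
print(labels)
```

In a reweighting scheme of the kind the description outlines, points flagged as high leverage would then receive smaller weights; the form of that weight function is not stated in this record and is left out here.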