Detection of Influential Observations in Spatial Regression Model Based on Outliers and Bad Leverage Classification

Bibliographic Details
Main Authors: Ali Mohammed Baba, Habshah Midi, Mohd Bakri Adam, Nur Haizum Abd Rahman
Format: Article
Language: English
Published: MDPI AG 2021-10-01
Series: Symmetry
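Subjects: spatial regression model; influential observation; outlier; leverage; prediction residual; masking and swamping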
Online Access: https://www.mdpi.com/2073-8994/13/11/2030
author Ali Mohammed Baba
Habshah Midi
Mohd Bakri Adam
Nur Haizum Abd Rahman
collection DOAJ
description Influential observations (IOs), which are outliers in the x direction, the y direction, or both, remain a problem in classical regression model fitting. Spatial regression models have a peculiar kind of outlier because spatial outliers are local in nature. Spatial regression models are also not free from the effect of influential observations. Researchers have adapted some classical regression techniques to spatial models and obtained satisfactory results. However, masking and/or swamping remains a stumbling block for such methods. In this article, we obtain a measure of spatial Studentized prediction residuals that incorporates spatial information on the dependent variable and the residuals. We propose a robust spatial diagnostic plot to classify observations into regular observations, vertical outliers, good leverage points, and bad leverage points using a classification based on spatial Studentized prediction residuals and spatial diagnostic potentials, which we refer to as $ISRs - P_{osi}$ and $ESRs - P_{osi}$. Observations that fall into the vertical outlier and bad leverage point categories are referred to as IOs. Representations of some classical regression diagnostic measures in general spatial models are presented.
The commonly used diagnostic measure in spatial diagnostics, Cook’s distance, is compared to some robust methods, $H_{i}^{2}$ (using robust and non-robust measures), and our proposed $ISRs - P_{osi}$ and $ESRs - P_{osi}$ plots. Results of our simulation study and applications to real data showed that Cook’s distance, the non-robust $H_{si1}^{2}$, and the robust $H_{si2}^{2}$ were not very successful in detecting IOs.
The $H_{si1}^{2}$ suffered from the masking effect, and the robust $H_{si2}^{2}$ suffered from swamping in general spatial models. Interestingly, the results showed that the proposed $ESRs - P_{osi}$ plot, followed by the $ISRs - P_{osi}$ plot, was very successful in classifying observations into the correct groups, hence correctly detecting the real IOs.
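The four-way classification described in the abstract above can be illustrated with a minimal Python sketch. It assumes the spatial Studentized prediction residuals and the spatial diagnostic potentials have already been computed by some other routine; the function names, the argument names, and the default cutoff values below are hypothetical placeholders for illustration and are not taken from the article.

```python
from typing import List, Sequence


def classify_observations(
    residuals: Sequence[float],
    potentials: Sequence[float],
    resid_cutoff: float = 2.5,  # assumed placeholder, not the article's cutoff
    pot_cutoff: float = 0.5,    # assumed placeholder, not the article's cutoff
) -> List[str]:
    """Label each observation from a residual-versus-potential diagnostic plot.

    residuals:  precomputed Studentized prediction residuals (assumed input)
    potentials: precomputed diagnostic potentials (assumed input)
    """
    if len(residuals) != len(potentials):
        raise ValueError("residuals and potentials must have the same length")

    labels = []
    for r, p in zip(residuals, potentials):
        large_resid = abs(r) > resid_cutoff  # outlying in the y direction
        high_pot = p > pot_cutoff            # outlying in the x direction
        if large_resid and high_pot:
            labels.append("bad leverage")
        elif large_resid:
            labels.append("vertical outlier")
        elif high_pot:
            labels.append("good leverage")
        else:
            labels.append("regular")
    return labels


def influential_indices(labels: Sequence[str]) -> List[int]:
    """Indices of IOs: vertical outliers and bad leverage points."""
    flagged = ("vertical outlier", "bad leverage")
    return [i for i, label in enumerate(labels) if label in flagged]


if __name__ == "__main__":
    labels = classify_observations([0.3, 3.1, -0.4, -4.0], [0.1, 0.2, 0.9, 0.8])
    print(labels)
    print(influential_indices(labels))
```

With the toy inputs in the `__main__` block, the four observations are labelled regular, vertical outlier, good leverage, and bad leverage in that order, so indices 1 and 3 are flagged as IOs. In practice the cutoffs would be chosen by whatever rule the analyst adopts; the fixed defaults here only keep the sketch self-contained.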
first_indexed 2024-03-10T05:00:59Z
format Article
id doaj.art-3bcd8390d89d4cc888ff60c819805569
institution Directory of Open Access Journals
issn 2073-8994
language English
last_indexed 2024-03-10T05:00:59Z
publishDate 2021-10-01
publisher MDPI AG
record_format Article
series Symmetry
spelling doaj.art-3bcd8390d89d4cc888ff60c819805569; 2023-11-23T01:43:42Z; eng; MDPI AG; Symmetry; 2073-8994; 2021-10-01; vol. 13, no. 11, art. 2030; doi:10.3390/sym13112030; Detection of Influential Observations in Spatial Regression Model Based on Outliers and Bad Leverage Classification; Ali Mohammed Baba, Habshah Midi, Mohd Bakri Adam, Nur Haizum Abd Rahman (all: Institute for Mathematical Research, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia); https://www.mdpi.com/2073-8994/13/11/2030; spatial regression model; influential observation; outlier; leverage; prediction residual; masking and swamping
title Detection of Influential Observations in Spatial Regression Model Based on Outliers and Bad Leverage Classification
topic spatial regression model
influential observation
outlier
leverage
prediction residual
masking and swamping
url https://www.mdpi.com/2073-8994/13/11/2030