Identification of High Leverage Points in Multiple Linear Regression / Noor Azima Ismail... [et al.]


Bibliographic Details
Main Authors: Ismail, Nor Azima; Midi, Prof Dr. Habshah; Mohamad Sobri, Norafefah; Zulkifli, Siti Nurani
Format: Article
Language: English
Published: Unit Penerbitan UiTM Kelantan 2016
Subjects:
Online Access: https://ir.uitm.edu.my/id/eprint/24068/1/4
Description
Summary: Outliers with respect to the predictor variables are called high leverage points. Observations that differ even slightly from the rest can lead to large differences in the results of a regression analysis. Detecting high leverage points is therefore essential, as they have a large impact on the estimated values and can also lead to multicollinearity problems. In this situation, robust regression procedures can be very useful for dealing with the problems that arise from the existence of high leverage points. The aim of this study is to compare the performance of three methods for detecting high leverage points. In the first stage, two well-known data sets are considered: the artificial data set generated by Hawkins, Bradu and Kass in 1984, and the stack loss data by Brownlee in 1965. In the second stage, a simulation study is conducted in which both clean and contaminated data are generated. The three measures considered are the leverage method (twice-the-mean rule), Generalized Potentials, and Diagnostic Robust Generalized Potentials (DRGP). The results indicate that DRGP is a more powerful method for detecting high leverage points than the other two, on both the well-known data sets and the simulated data.
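
As an illustration of the simplest of the three measures, the twice-the-mean rule can be sketched in a few lines of Python. This is a minimal sketch and not the authors' code: it assumes the usual formulation in which an observation is flagged when its hat-matrix diagonal h_ii exceeds 2p/n, with p counted as the number of regression parameters including the intercept. The function names and the toy data are hypothetical.

```python
import numpy as np

def hat_diagonal(X):
    """Leverage values h_ii: the diagonal of H = Z (Z'Z)^{-1} Z',
    where Z is X with an intercept column prepended (an assumption)."""
    Z = np.column_stack([np.ones(X.shape[0]), X])
    Q, _ = np.linalg.qr(Z)          # reduced QR, so H = Q Q'
    return np.sum(Q**2, axis=1)     # row-wise squared norms give diag(H)

def twice_the_mean_flags(X):
    """Flag observations whose leverage exceeds twice the mean leverage."""
    h = hat_diagonal(X)
    n = X.shape[0]
    p = X.shape[1] + 1              # predictors plus intercept
    cutoff = 2 * p / n              # mean leverage is p / n
    return h, cutoff, h > cutoff

# Toy data (hypothetical): 30 clean points, with the first one moved far
# away in the predictor space to act as a high leverage point.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
X[0] = [10.0, 10.0]

h, cutoff, flags = twice_the_mean_flags(X)
print("cutoff:", cutoff)
print("flagged indices:", np.flatnonzero(flags))
```

Because the hat matrix is built from all observations, including the contaminated ones, a group of high leverage points can mask one another under this rule; that weakness is the usual motivation for robust alternatives such as DRGP.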