Detection of multiple outliers in linear regression using nonparametric methods

There has been considerable interest in recent years in the detection and accommodation of multiple outliers in linear regression. Most existing methods, however, are complicated and unappealing to users without a mathematical background. The clustering algorithm of Sebert et al. (1998) is discussed and used here because it is easy to understand, takes an interesting approach, and performs well in detecting the presence of outliers. The method of Sebert et al. (1998) applies a single-linkage clustering algorithm, with Euclidean distances, to the points in a plot of standardized predicted values versus standardized residuals from a linear regression model; the predicted and residual values are obtained from an ordinary least squares fit of the data. The algorithm is described and shown to perform well on classic multiple-outlier data sets. Sebert's method is then modified by replacing the least squares fit with two robust estimators: Method 1 replaces the least squares (LS) fit with the least median of squares (LMS) fit, while Method 2 replaces the least squares (LS) fit with the least trimmed squares (LTS) fit. This research also provides a comparison of these three procedures for detecting multiple outliers; a Monte Carlo simulation study was used to evaluate their effectiveness. All simulations and calculations were done using the statistical package S-PLUS 2000.

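The procedure the abstract describes can be illustrated with a minimal Python sketch using NumPy and SciPy: fit OLS, standardize the fitted values and residuals, single-linkage cluster the resulting 2-D points with Euclidean distances, and flag everything outside the majority cluster. The simulated data, the cut height of 1.5, and the majority-cluster rule below are illustrative assumptions, not the exact tuning from Sebert et al. (1998).

```python
# Illustrative sketch of the clustering outlier-detection idea
# (single-linkage clustering of standardized fitted values vs.
# standardized residuals from an OLS fit). Data and cut height
# are assumptions for the demo, not values from the paper.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Simulated data: a linear trend with four planted gross outliers.
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 3.0 * x + rng.normal(0, 1, n)
y[:4] += 25.0  # observations 0-3 are outliers

# Ordinary least squares fit.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# Standardize both coordinates, then cluster with single linkage
# on Euclidean distances.
pts = np.column_stack([
    (fitted - fitted.mean()) / fitted.std(ddof=1),
    (resid - resid.mean()) / resid.std(ddof=1),
])
tree = linkage(pts, method="single", metric="euclidean")

# Cut the dendrogram; the majority cluster is taken as the clean
# data and everything else is flagged as outlying.
labels = fcluster(tree, t=1.5, criterion="distance")
majority = np.bincount(labels).argmax()
outliers = np.flatnonzero(labels != majority)
print("flagged:", outliers)
```

Replacing the OLS fit with an LMS or LTS fit, as in Methods 1 and 2, changes only how `fitted` and `resid` are computed; the clustering step is unchanged.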

Bibliographic Details
Main Author: Adnan, Robiah
Format: Monograph
Language: English
Published: Universiti Teknologi Malaysia 2004
Subjects: QA Mathematics
Online Access: http://eprints.utm.my/2997/1/75021.pdf
Citation: Adnan, Robiah (2004). Detection of multiple outliers in linear regression using nonparametric methods. Project Report. Universiti Teknologi Malaysia, 2004-09-30. (Unpublished, non-peer-reviewed)