The performance of clustering approach with robust mm-estimator for multiple outlier detection in linear regression

Bibliographic Details
Main Authors: Mohd. Azmi, Nurulhuda Firdaus; Midi, Habshah; Ismail, Noranita Fairus
Format: Article
Language: English
Published: Penerbit UTM Press 2006
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/7941/1/JTDIS45C%5BB%5DNHuda_Firdaus.pdf
collection ePrints
description Identifying outliers is a fundamental step in building a regression model. Outlying observations should be identified because of their potential effect on the fitted model, and numerous outlier measures, such as residuals and the diagonal elements of the hat matrix, have been developed for this purpose. However, these measures work well only when a regression data set contains a single outlying point, whereas it is well established that real regression data sets may contain multiple outlying observations that are not easy to identify individually with the same measures. In this paper, an alternative approach to multiple outlier identification is proposed: a clustering technique combined with a robust estimator, namely the MM-Estimator. The performance of the clustering approach with the proposed estimator is compared with that of the classical Least Squares (LS) estimator and of another robust estimator, Least Trimmed Squares (LTS). The estimators are evaluated on classical multiple outlier data sets from the literature and on simulated multiple outlier data sets. In addition, the Root Mean Square Error (RMSE) and the coverage probabilities of Bootstrap Bias Corrected and Accelerated (BCa) confidence intervals are analysed to identify the best estimator for identifying multiple outliers. The analysis reveals that the MM-Estimator performed best on the classical multiple outlier data sets and on a wide variety of simulated data sets, for any percentage of outliers, any number of regressor variables and any sample size, followed by LTS and then LS. The RMSE of the proposed estimator was also always smaller than that of the other two estimators.
The coverage probabilities of the BCa confidence intervals likewise indicate that the MM-Estimator confidence interval meets all the criteria of the best estimator: it has good coverage probability, good equal-tailedness and the shortest average interval length, followed by LTS and then LS.
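The abstract outlines a two-stage idea: fit the regression with a robust estimator, then use clustering to separate the observations that the robust fit does not explain. This record does not give the clustering rule or the estimator settings, so the Python sketch below only illustrates that idea for a single regressor, under clearly stated assumptions. (1) The high-breakdown start is an elemental-subset approximation to LTS, not the S-estimator normally used to start an MM-estimator, and the scale is the normalised median absolute residual of that start, so the result is an MM-type estimate rather than the exact MM-Estimator. (2) The M-step uses Tukey's bisquare with c = 4.685, solved by iteratively reweighted least squares with the scale held fixed. (3) The clustering step is single-linkage clustering of the robust standardized residuals with a hypothetical cut height `cut=2.5`; everything outside the largest cluster is flagged. All function names and the demonstration data are hypothetical.

```python
from itertools import combinations
from statistics import median

TUKEY_C = 4.685  # bisquare tuning constant (about 95% efficiency under Gaussian errors)


def ls_line(x, y, w=None):
    """Weighted least squares for y = b0 + b1 * x; unit weights give ordinary LS."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ym - b1 * xm, b1


def residuals(x, y, b0, b1):
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def lts_elemental(x, y):
    """Approximate LTS start (hypothetical helper): among lines through two
    observations, keep the one with the smallest sum of the h smallest squared residuals."""
    n = len(x)
    h = (n + 3) // 2  # (n + p + 1) // 2 with p = 2 coefficients
    best, best_obj = None, float("inf")
    for i, j in combinations(range(n), 2):
        if x[i] == x[j]:
            continue  # vertical line, no finite slope
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        obj = sum(sorted(r * r for r in residuals(x, y, b0, b1))[:h])
        if obj < best_obj:
            best, best_obj = (b0, b1), obj
    return best


def mm_type_line(x, y, max_iter=50, tol=1e-10):
    """MM-type fit: robust start, fixed robust scale, then a bisquare M-step by IRLS."""
    b0, b1 = lts_elemental(x, y)
    # Normalised median absolute residual of the start (stand-in for the S-scale).
    s = 1.4826 * median([abs(r) for r in residuals(x, y, b0, b1)])
    if s == 0:
        raise ValueError("robust scale is zero: half the points lie exactly on the start line")
    for _ in range(max_iter):
        u = [r / s for r in residuals(x, y, b0, b1)]
        # Bisquare weights: points with |u| >= c get weight 0.
        w = [(1 - (ui / TUKEY_C) ** 2) ** 2 if abs(ui) < TUKEY_C else 0.0 for ui in u]
        nb0, nb1 = ls_line(x, y, w)
        converged = abs(nb0 - b0) + abs(nb1 - b1) < tol
        b0, b1 = nb0, nb1
        if converged:
            break
    return b0, b1, s


def single_linkage_flags(z, cut=2.5):
    """Single-linkage clustering of 1-D values cut at height `cut` (equivalently,
    split the sorted values wherever the gap exceeds `cut`); return the indices
    that fall outside the largest cluster."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    clusters, current = [], [order[0]]
    for prev, nxt in zip(order, order[1:]):
        if z[nxt] - z[prev] > cut:
            clusters.append(current)
            current = []
        current.append(nxt)
    clusters.append(current)
    main = max(clusters, key=len)
    return sorted(i for c in clusters if c is not main for i in c)


# Hypothetical data: y = 2 + 0.5x plus a repeating +0.3/-0.3/0 pattern,
# with the last four observations shifted up by 8 (planted outliers).
x = [float(i + 1) for i in range(20)]
pattern = [0.3, -0.3, 0.0]
y = [2 + 0.5 * xi + pattern[i % 3] + (8.0 if i >= 16 else 0.0) for i, xi in enumerate(x)]

print("LS slope:", round(ls_line(x, y)[1], 3))
b0, b1, s = mm_type_line(x, y)
print("MM-type slope:", round(b1, 3))
print("flagged:", single_linkage_flags([r / s for r in residuals(x, y, b0, b1)]))
```

On this made-up data the least-squares slope is pulled well above 0.5 by the four shifted points, while the MM-type fit stays close to 0.5 and the clustering step separates the four shifted observations (indices 16 to 19) from the rest. Holding the scale fixed during the M-step is what lets the bisquare weights drop the shifted points entirely instead of merely down-weighting them.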
first_indexed 2024-03-05T18:12:21Z
id utm.eprints-7941
institution Universiti Teknologi Malaysia - ePrints
last_indexed 2024-03-05T18:12:21Z
record_format dspace
Citation: Mohd. Azmi, Nurulhuda Firdaus and Midi, Habshah and Ismail, Noranita Fairus (2006) The performance of clustering approach with robust mm-estimator for multiple outlier detection in linear regression. Jurnal Teknologi C (45C). pp. 15-28. ISSN 0126-9797
Status: Peer reviewed; published 2006-12; record datestamp 2010-11-09T09:51:47Z
Record page: http://eprints.utm.my/7941/
Publisher copy: http://www.penerbit.utm.my/onlinejournal/45/C/JTDIS45C2.pdf
topic QA75 Electronic computers. Computer science