The performance of clustering approach with robust mm-estimator for multiple outlier detection in linear regression
Main Authors: | Mohd. Azmi, Nurulhuda Firdaus; Midi, Habshah; Ismail, Noranita Fairus |
---|---|
Format: | Article |
Language: | English |
Published: | Penerbit UTM Press, 2006 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://eprints.utm.my/7941/1/JTDIS45C%5BB%5DNHuda_Firdaus.pdf |
author | Mohd. Azmi, Nurulhuda Firdaus; Midi, Habshah; Ismail, Noranita Fairus |
---|---|
collection | ePrints |
description | Identifying outliers is a fundamental step in the regression model-building process. Outlying observations should be identified because of their potential effect on the fitted model, and numerous outlier measures, such as residuals and the diagonal elements of the hat matrix, have been developed for this purpose. However, these measures work well only when a regression data set contains a single outlying point, and it is well established that real regression data sets may contain multiple outlying observations that are individually hard to identify with the same measures. In this paper, an alternative approach to multiple outlier identification is proposed: a clustering technique combined with a robust estimator, the MM-estimator. The performance of the clustering approach with the MM-estimator is compared with that obtained using the classical Least Squares (LS) estimator and another robust estimator, Least Trimmed Squares (LTS). The estimators are evaluated on classical multiple-outlier data sets from the literature and on simulated multiple-outlier data sets. In addition, the Root Mean Square Error (RMSE) and the coverage probabilities of Bootstrap Bias-Corrected and Accelerated (BCa) confidence intervals are analysed to identify the best estimator for identifying multiple outliers. The analysis shows that the MM-estimator performed best, followed by LTS and then LS, on the classical multiple-outlier data sets and on a wide variety of simulated data sets, across all the outlier percentages, numbers of regressors and sample sizes considered. The RMSE of the MM-estimator was also smaller than that of the other two estimators in every case examined. Finally, the BCa confidence intervals based on the MM-estimator met all the criteria for the best estimator, with good coverage probability, good equal-tailedness and the shortest average interval length, again followed by LTS and LS. |
format | Article |
id | utm.eprints-7941 |
institution | Universiti Teknologi Malaysia - ePrints |
language | English |
publishDate | 2006 |
publisher | Penerbit UTM Press |
citation | Mohd. Azmi, Nurulhuda Firdaus and Midi, Habshah and Ismail, Noranita Fairus (2006) The performance of clustering approach with robust mm-estimator for multiple outlier detection in linear regression. Jurnal Teknologi C (45C). pp. 15-28. ISSN 0126-9797. Penerbit UTM Press, December 2006. Peer reviewed. http://www.penerbit.utm.my/onlinejournal/45/C/JTDIS45C2.pdf (repository record: http://eprints.utm.my/7941/) |
title | The performance of clustering approach with robust mm-estimator for multiple outlier detection in linear regression |
topic | QA75 Electronic computers. Computer science |
url | http://eprints.utm.my/7941/1/JTDIS45C%5BB%5DNHuda_Firdaus.pdf |
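The masking problem described in the abstract, where single-case measures built from residuals and hat-matrix diagonals miss a group of outliers that a robust fit exposes, can be illustrated with the small Python sketch below. It is not the authors' implementation and covers simple linear regression only. Assumptions: the data and all function names (`ols`, `residuals`, `studentized_ols`, `lts`, `mm_type`, `flag_outliers`) are hypothetical; LTS is approximated by trying every pair of points as a start followed by concentration steps; the MM-type step is Tukey-bisquare IRLS started from that LTS fit, with a MAD-type scale of the LTS residuals standing in for the S-estimator scale a full MM-estimator would use; the bisquare constant 4.685 and the 2.5 cutoff are conventional choices, not values taken from the paper; and the paper's clustering stage, RMSE comparison and BCa bootstrap are not reproduced.

```python
import itertools
import math
import statistics


def ols(xs, ys, weights=None):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    w = weights if weights is not None else [1.0] * len(xs)
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


def residuals(xs, ys, a, b):
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def studentized_ols(xs, ys):
    """Classical single-case diagnostic: r_i / (s * sqrt(1 - h_ii)) from an OLS fit."""
    n = len(xs)
    a, b = ols(xs, ys)
    r = residuals(xs, ys, a, b)
    s = math.sqrt(sum(ri * ri for ri in r) / (n - 2))
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    h = [1 / n + (x - xbar) ** 2 / sxx for x in xs]  # hat-matrix diagonal
    return [ri / (s * math.sqrt(1 - hi)) for ri, hi in zip(r, h)]


def lts(xs, ys, c_steps=10):
    """Approximate LTS: every pair of points as a start, then concentration steps."""
    n = len(xs)
    h = (n + 3) // 2  # (n + p + 1) // 2 with p = 2 coefficients
    best, best_obj = None, math.inf
    for i, j in itertools.combinations(range(n), 2):
        if xs[i] == xs[j]:
            continue
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        for _ in range(c_steps):
            # Refit OLS on the h points with the smallest squared residuals.
            r2 = [ri * ri for ri in residuals(xs, ys, a, b)]
            keep = sorted(range(n), key=r2.__getitem__)[:h]
            a, b = ols([xs[k] for k in keep], [ys[k] for k in keep])
        obj = sum(sorted(ri * ri for ri in residuals(xs, ys, a, b))[:h])
        if obj < best_obj:
            best, best_obj = (a, b), obj
    return best


def mm_type(xs, ys, c=4.685, iters=50):
    """Tukey-bisquare IRLS from the LTS start with a fixed robust scale."""
    a, b = lts(xs, ys)
    # Simplification: a MAD-type scale of the LTS residuals replaces the S-scale.
    s = 1.4826 * statistics.median([abs(ri) for ri in residuals(xs, ys, a, b)])
    for _ in range(iters):
        u = [ri / (c * s) for ri in residuals(xs, ys, a, b)]
        w = [(1 - ui * ui) ** 2 if abs(ui) < 1 else 0.0 for ui in u]
        a, b = ols(xs, ys, w)
    return a, b


def flag_outliers(xs, ys, cutoff=2.5):
    """Indices whose MM-type residual exceeds cutoff robust standard deviations."""
    a, b = mm_type(xs, ys)
    r = residuals(xs, ys, a, b)
    s = 1.4826 * statistics.median([abs(ri) for ri in r])
    return [i for i, ri in enumerate(r) if abs(ri) / s > cutoff]


# Hypothetical data: 17 points near y = 1 + 2x, plus 3 clustered outliers at x = 18, 19, 20.
xs = [float(i) for i in range(1, 21)]
ys = [1 + 2 * x + 0.3 * math.sin(x) for x in xs[:17]] + [0.0, 0.0, 0.0]

print([i for i, t in enumerate(studentized_ols(xs, ys)) if abs(t) > 2.5])  # → []
print(flag_outliers(xs, ys))  # → [17, 18, 19]
```

On this synthetic example the three clustered high-leverage points pull the OLS line towards themselves, so none of their studentized residuals exceeds the cutoff (masking), whereas the robust refit ignores them and flags indices 17, 18 and 19.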