A new single linkage robust clustering outlier detection procedures for multivariate data

Outliers are abnormal data, and the detection of outliers in multivariate data has always been of interest. Unlike univariate data, multivariate data cannot be reliably screened for outliers by visual inspection alone. In this study, we developed a new single linkage robust clustering outlier detection p...

Bibliographic Details
Main Authors: Sharifah Sakinah, Syed Abd Mutalib; Siti Zanariah, Satari; Wan Nur Syahidah, Wan Yusoff
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2023
Subjects: Q Science (General); QA Mathematics
Online Access: http://umpir.ump.edu.my/id/eprint/40889/1/A%20new%20single%20linkage%20robust%20clustering%20outlier%20detection.pdf
http://umpir.ump.edu.my/id/eprint/40889/2/A%20new%20single%20linkage%20robust%20clustering%20outlier%20detection%20procedures%20for%20multivariate%20data_ABS.pdf
author Sharifah Sakinah, Syed Abd Mutalib
Siti Zanariah, Satari
Wan Nur Syahidah, Wan Yusoff
collection UMP
description Outliers are abnormal data, and the detection of outliers in multivariate data has always been of interest. Unlike univariate data, multivariate data cannot be reliably screened for outliers by visual inspection alone. In this study, we developed a new single linkage robust clustering outlier detection procedure for multivariate data. A robust estimator, Test on Covariance (TOC), is used to robustify the similarity distance measure, producing robust single linkage clustering. The performance of the new procedure is investigated via a simulation study using three outlier scenarios, with historical multivariate datasets as illustrative examples. Three performance measures are used: pout, pmask, and pswamp. The performance of the new procedure is also compared with that of single linkage clustering using Euclidean and Mahalanobis distances as similarity distance measures, as well as with TOC. The new procedure is found to perform well in Outlier Scenario 3, in which both the mean and the covariance matrix are shifted. It also performs well on the datasets, successfully detecting all outliers with no masking effect in two out of five datasets and showing no swamping effect in any dataset. In conclusion, the new single linkage robust clustering outlier detection procedure is a practical and promising approach for simultaneously identifying multiple outliers in multivariate data.
first_indexed 2024-09-25T03:48:32Z
format Article
id UMPir40889
institution Universiti Malaysia Pahang
language English
last_indexed 2024-09-25T03:48:32Z
publishDate 2023
publisher Penerbit Universiti Kebangsaan Malaysia
record_format dspace
spelling UMPir40889 2024-05-28T08:04:38Z http://umpir.ump.edu.my/id/eprint/40889/
Penerbit Universiti Kebangsaan Malaysia 2023-08 Article PeerReviewed
Sharifah Sakinah, Syed Abd Mutalib and Siti Zanariah, Satari and Wan Nur Syahidah, Wan Yusoff (2023) A new single linkage robust clustering outlier detection procedures for multivariate data. Sains Malaysiana, 52 (8). pp. 2431-2451. ISSN 0126-6039. (Published)
https://doi.org/10.17576/jsm-2023-5208-19
title A new single linkage robust clustering outlier detection procedures for multivariate data
topic Q Science (General)
QA Mathematics
url http://umpir.ump.edu.my/id/eprint/40889/1/A%20new%20single%20linkage%20robust%20clustering%20outlier%20detection.pdf
http://umpir.ump.edu.my/id/eprint/40889/2/A%20new%20single%20linkage%20robust%20clustering%20outlier%20detection%20procedures%20for%20multivariate%20data_ABS.pdf