Winsorised gini impurity: A resistant to outliers splitting metric for classification tree

Constructing a classification tree is sometimes complicated by outliers in the data. Eliminating the outliers is the simplest option, but some important information will be lost. Alternatively, one may amend the values of the outliers, but the amended values are arguable in terms of...

Bibliographic Details
Main Authors: Chee, Keong Ch'ng; Mahat, Nor Idayu
Format: Article
Published: AIP Publishing LLC 2014
Subjects: QA75 Electronic computers. Computer science
author Chee, Keong Ch'ng
Mahat, Nor Idayu
collection UUM
description Constructing a classification tree is sometimes complicated by outliers in the data. Eliminating the outliers is the simplest option, but some important information will be lost. Alternatively, one may amend the values of the outliers, but the amended values are arguable in terms of their suitability for classification purposes. We describe a strategy to identify and handle outliers in the process of constructing a classification tree. A Winsorised approach is suggested for estimating the impurity of the data prior to the splitting of each node of a tree. The proposed estimator provides a splitting value that is resistant to outliers in the data and hence influences the performance of the tree, as measured by its plug-in error rate. We examine the proposed idea on several real data sets representing various sample sizes. The results indicate that the proposed strategy is competitive with, and sometimes performs better than, the traditional tree.
first_indexed 2024-07-04T06:30:55Z
format Article
id uum-25772
institution Universiti Utara Malaysia
last_indexed 2024-07-04T06:30:55Z
publishDate 2014
publisher AIP Publishing LLC
record_format eprints
spelling uum-25772 2019-03-17T03:01:28Z https://repo.uum.edu.my/id/eprint/25772/ PeerReviewed Chee, Keong Ch'ng and Mahat, Nor Idayu (2014) Winsorised gini impurity: A resistant to outliers splitting metric for classification tree. AIP Conference Proceedings, 1635. pp. 716-723. ISSN 0094-243X http://doi.org/10.1063/1.4903661 doi:10.1063/1.4903661
title Winsorised gini impurity: A resistant to outliers splitting metric for classification tree
topic QA75 Electronic computers. Computer science