Tree-Based Contrast Subspace Mining for Categorical Data

Contrast subspace mining aims to find the subspaces in which a particular queried object is most similar to the target class, in contrast to the non-target class, in a two-class data set. Discovering these subspaces, known as contrast subspaces, is important in many real-life applications. Tree-...

Bibliographic Details
Main Authors: Florence Sia, Rayner Alfred, Yuto Lim
Format: Article
Language: English
Published: Springer 2020-10-01
Series: International Journal of Computational Intelligence Systems
Subjects: Mining contrast subspace; Contrast subspace; Categorical data; Feature selection; Data mining
Online Access: https://www.atlantis-press.com/article/125945493/view
_version_ 1818534438751961088
author Florence Sia
Rayner Alfred
Yuto Lim
author_sort Florence Sia
collection DOAJ
description Contrast subspace mining aims to find the subspaces in which a particular queried object is most similar to the target class, in contrast to the non-target class, in a two-class data set. Discovering these subspaces, known as contrast subspaces, is important in many real-life applications. The Tree-Based Contrast Subspace Miner (TB-CSMiner) method was recently introduced to mine contrast subspaces of queried objects specifically in numerical data sets. It employs a tree-based scoring function to estimate the likelihood contrast score of a subspace with respect to the given queried object. However, this design prevents TB-CSMiner from being applied to categorical values, which are frequently encountered in real-world data sets. In this paper, the TB-CSMiner method is extended by formulating a tree-based likelihood contrast scoring function for mining contrast subspaces in categorical data sets. The extended method uses the feature values of the queried object to gather target samples with similar characteristics into the same group and to separate non-target samples with different characteristics into a different group. In a contrast subspace, the queried object should fall into a group that contains more target samples than non-target samples. Several experiments were conducted on eight real-world categorical data sets to evaluate the effectiveness of the extended TB-CSMiner method on two-class classification tasks with categorical input variables. The results demonstrate that the extended method can improve the accuracy of most of these classification tasks, which indicates that it is able to discover contrast subspaces of a given queried object in categorical data.
first_indexed 2024-12-11T18:11:38Z
format Article
id doaj.art-5518697296ff49259ed19c02102bad73
institution Directory Open Access Journal
issn 1875-6883
language English
last_indexed 2024-12-11T18:11:38Z
publishDate 2020-10-01
publisher Springer
record_format Article
series International Journal of Computational Intelligence Systems
spelling doaj.art-5518697296ff49259ed19c02102bad73 2022-12-22T00:55:33Z eng Springer International Journal of Computational Intelligence Systems 1875-6883 2020-10-01 13 1 10.2991/ijcis.d.201020.001
title Tree-Based Contrast Subspace Mining for Categorical Data
title_sort tree based contrast subspace mining for categorical data
topic Mining contrast subspace
Contrast subspace
Categorical data
Feature selection
Data mining
url https://www.atlantis-press.com/article/125945493/view
work_keys_str_mv AT florencesia treebasedcontrastsubspaceminingforcategoricaldata
AT rayneralfred treebasedcontrastsubspaceminingforcategoricaldata
AT yutolim treebasedcontrastsubspaceminingforcategoricaldata
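
The grouping idea in the description (collect the samples that share the queried object's feature values, then compare how many of them are target versus non-target) can be illustrated with a small sketch. This is not the authors' TB-CSMiner implementation: it replaces the tree with a single exact-match group, and the function names, the Laplace smoothing, the exhaustive subspace search, and the toy data are all assumptions made for illustration only.

```python
from itertools import combinations


def contrast_score(rows, labels, query, subspace, target):
    """Simplified likelihood contrast score (an assumption, not TB-CSMiner).

    The "group" is every sample that has the same categorical values as the
    queried object on all features in `subspace`. The score is the smoothed
    share of target samples in that group divided by the smoothed share of
    non-target samples in it.
    """
    n_target = sum(1 for y in labels if y == target)
    n_other = len(labels) - n_target
    if n_target == 0 or n_other == 0:
        raise ValueError("both classes must be present")

    in_group = [all(r[f] == query[f] for f in subspace) for r in rows]
    m_target = sum(1 for g, y in zip(in_group, labels) if g and y == target)
    m_other = sum(1 for g, y in zip(in_group, labels) if g and y != target)

    # Laplace smoothing keeps the ratio finite when a group has no
    # non-target samples.
    p_target = (m_target + 1) / (n_target + 2)
    p_other = (m_other + 1) / (n_other + 2)
    return p_target / p_other


def rank_subspaces(rows, labels, query, target, max_dim=2):
    """Score every subspace of up to `max_dim` features, best first."""
    features = sorted(query)
    scored = []
    for k in range(1, max_dim + 1):
        for subspace in combinations(features, k):
            score = contrast_score(rows, labels, query, subspace, target)
            scored.append((subspace, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: three target ("T") and three non-target ("N") samples.
ROWS = [
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "red", "shape": "round", "size": "large"},
    {"color": "red", "shape": "square", "size": "small"},
    {"color": "blue", "shape": "round", "size": "small"},
    {"color": "blue", "shape": "square", "size": "large"},
    {"color": "red", "shape": "square", "size": "large"},
]
LABELS = ["T", "T", "T", "N", "N", "N"]
QUERY = {"color": "red", "shape": "round", "size": "small"}

RANKED = rank_subspaces(ROWS, LABELS, QUERY, target="T")
```

Subspaces near the top of `RANKED` are those in which the queried object's group holds proportionally more target than non-target samples. The exhaustive search is only practical for a handful of features; the tree-based scoring described in the abstract is the authors' approach and is not reproduced here.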