Generalized Information-Theoretic Criterion for Multi-Label Feature Selection
Multi-label feature selection, which identifies important features from the original feature set of multi-labeled datasets, has been attracting considerable attention owing to its generality compared to conventional single-label feature selection. Unimportant features are filtered out by scoring the dependency of features on labels. In conventional multi-label feature filter studies, the score function is obtained by approximating a dependency measure such as joint entropy, because direct calculation is often impractical given multiple labels and limited training patterns. Although the efficacy of an approximation can differ depending on the characteristics of the multi-label dataset, conventional methods presume a fixed approximation, leading to a degraded feature subset when that approximation is inappropriate for the given dataset. In this study, we propose a strategy for selecting an approximation from a series of approximations depending on the number of involved variables, and we instantiate a score function based on the chosen approximation. The experimental results demonstrate that the proposed strategy and score function outperform conventional multi-label feature selection methods.
Main Authors: | Wangduk Seo, Dae-Won Kim, Jaesung Lee |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2019-01-01 |
Series: | IEEE Access |
Subjects: | Machine learning; multi-label learning; multi-label feature selection; information entropy |
Online Access: | https://ieeexplore.ieee.org/document/8756255/ |
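The abstract describes a filter approach: each feature is scored by an information-theoretic dependency on the label set, with entropy terms that must be approximated once many variables are involved. As a rough illustration only, not the paper's generalized criterion, the sketch below scores a feature by summed pairwise mutual information I(f; y) = H(f) + H(y) - H(f, y), the lowest-order approximation of that dependency; the function names and toy data are invented for this example.

```python
import numpy as np

def entropy(*columns):
    """Shannon entropy (bits) of the empirical joint distribution
    of the given discrete columns."""
    joint = np.stack(columns, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def score_feature(f, labels):
    """Illustrative dependency score: the sum over labels of the
    pairwise mutual information I(f; y) = H(f) + H(y) - H(f, y).
    A low-order stand-in for the paper's generalized criterion."""
    return sum(entropy(f) + entropy(y) - entropy(f, y) for y in labels)

# Toy usage with random discretized data: X is (samples, features),
# Y is (samples, labels); rank features by descending score.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 6))
Y = rng.integers(0, 2, size=(200, 4))
scores = [score_feature(X[:, j], list(Y.T)) for j in range(X.shape[1])]
print(np.argsort(scores)[::-1])  # highest-scoring feature indices first
```

Higher-order variants would add entropy terms over more variables at once, which is exactly where the choice among approximations discussed in the abstract comes into play.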
_version_ | 1818643240844263424 |
---|---|
author | Wangduk Seo; Dae-Won Kim; Jaesung Lee
author_facet | Wangduk Seo; Dae-Won Kim; Jaesung Lee
author_sort | Wangduk Seo |
collection | DOAJ |
description | Multi-label feature selection, which identifies important features from the original feature set of multi-labeled datasets, has been attracting considerable attention owing to its generality compared to conventional single-label feature selection. Unimportant features are filtered out by scoring the dependency of features on labels. In conventional multi-label feature filter studies, the score function is obtained by approximating a dependency measure such as joint entropy, because direct calculation is often impractical given multiple labels and limited training patterns. Although the efficacy of an approximation can differ depending on the characteristics of the multi-label dataset, conventional methods presume a fixed approximation, leading to a degraded feature subset when that approximation is inappropriate for the given dataset. In this study, we propose a strategy for selecting an approximation from a series of approximations depending on the number of involved variables, and we instantiate a score function based on the chosen approximation. The experimental results demonstrate that the proposed strategy and score function outperform conventional multi-label feature selection methods. |
first_indexed | 2024-12-16T23:55:49Z |
format | Article |
id | doaj.art-aca604391c494280b585067bb1bea632 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-16T23:55:49Z |
publishDate | 2019-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-aca604391c494280b585067bb1bea632; last updated 2022-12-21T22:11:12Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2019-01-01; vol. 7, pp. 122854-122863; DOI 10.1109/ACCESS.2019.2927400; IEEE article 8756255; Generalized Information-Theoretic Criterion for Multi-Label Feature Selection; Wangduk Seo (https://orcid.org/0000-0003-4806-1614), Dae-Won Kim (https://orcid.org/0000-0001-7124-1141), Jaesung Lee (https://orcid.org/0000-0002-3757-3510), all of the School of Computer Science and Engineering, Chung-Ang University, Seoul, South Korea; abstract as in the description field above; https://ieeexplore.ieee.org/document/8756255/; Machine learning; multi-label learning; multi-label feature selection; information entropy |
spellingShingle | Wangduk Seo; Dae-Won Kim; Jaesung Lee; Generalized Information-Theoretic Criterion for Multi-Label Feature Selection; IEEE Access; Machine learning; multi-label learning; multi-label feature selection; information entropy |
title | Generalized Information-Theoretic Criterion for Multi-Label Feature Selection |
title_full | Generalized Information-Theoretic Criterion for Multi-Label Feature Selection |
title_fullStr | Generalized Information-Theoretic Criterion for Multi-Label Feature Selection |
title_full_unstemmed | Generalized Information-Theoretic Criterion for Multi-Label Feature Selection |
title_short | Generalized Information-Theoretic Criterion for Multi-Label Feature Selection |
title_sort | generalized information theoretic criterion for multi label feature selection |
topic | Machine learning; multi-label learning; multi-label feature selection; information entropy |
url | https://ieeexplore.ieee.org/document/8756255/ |
work_keys_str_mv | AT wangdukseo generalizedinformationtheoreticcriterionformultilabelfeatureselection AT daewonkim generalizedinformationtheoreticcriterionformultilabelfeatureselection AT jaesunglee generalizedinformationtheoreticcriterionformultilabelfeatureselection |