A Feature Selection Method for Multi-Label Text Based on Feature Importance

Multi-label text classification assigns a text to multiple categories simultaneously, reflecting the fact that a real-world text is often associated with several topics. The feature space generated from text data is high-dimensional and sparse. Feature selection is...

Bibliographic Details
Main Authors: Lu Zhang, Qingling Duan
Format: Article
Language: English
Published: MDPI AG 2019-02-01
Series: Applied Sciences
Subjects: feature selection; multi-label text classification; category contribution; feature importance
Online Access: https://www.mdpi.com/2076-3417/9/4/665
author Lu Zhang
Qingling Duan
collection DOAJ
description Multi-label text classification assigns a text to multiple categories simultaneously, reflecting the fact that a real-world text is often associated with several topics. The feature space generated from text data is high-dimensional and sparse. Feature selection is an efficient technique that removes useless and redundant features, reduces the dimension of the feature space, and mitigates the curse of dimensionality. This paper proposes a feature selection method for multi-label text based on feature importance. First, multi-label texts are transformed into single-label texts using the label assignment method. Second, the importance of each feature is calculated using a method based on Category Contribution (CC). Finally, the features with the highest importance are selected to construct the feature space. The proposed method calculates feature importance from the perspective of the category, so that the selected features have strong category discrimination ability. Specifically, the contribution of each feature to each category is calculated from two aspects, inter-category and intra-category, and the two are combined to obtain the feature's importance. The proposed method is tested on six public data sets, and the experimental results demonstrate its effectiveness.
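The three steps named in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the record does not give the label assignment rule or the Category Contribution formula, so the sketch assumes a simple copy transformation (one single-label copy of a text per label) and uses a stand-in score (within-category document rate multiplied by its margin over the mean rate in the other categories). The function names assign_labels, feature_importance, and select_features are hypothetical.

```python
from collections import defaultdict


def assign_labels(docs):
    """ASSUMPTION: copy transformation, one (tokens, label) pair per label."""
    return [(tokens, label) for tokens, labels in docs for label in labels]


def feature_importance(single_label_docs):
    """Stand-in importance score; NOT the paper's Category Contribution formula."""
    doc_count = defaultdict(int)                 # category -> number of documents
    df = defaultdict(lambda: defaultdict(int))   # category -> term -> document frequency
    for tokens, label in single_label_docs:
        doc_count[label] += 1
        for term in set(tokens):
            df[label][term] += 1

    categories = list(doc_count)
    vocab = {term for c in categories for term in df[c]}
    scores = {}
    for term in vocab:
        # Intra-category view: share of a category's documents containing the term.
        rate = {c: df[c].get(term, 0) / doc_count[c] for c in categories}
        best = 0.0
        for c in categories:
            others = [rate[o] for o in categories if o != c]
            mean_other = sum(others) / len(others) if others else 0.0
            # Inter-category view: margin over the other categories, floored at 0.
            inter = max(rate[c] - mean_other, 0.0)
            best = max(best, rate[c] * inter)
        scores[term] = best
    return scores


def select_features(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (invented for illustration): (tokens, labels) pairs.
docs = [
    (["goal", "match", "team"], ["sports"]),
    (["vote", "election", "team"], ["politics"]),
    (["goal", "vote", "match"], ["sports", "politics"]),
]
single = assign_labels(docs)
scores = feature_importance(single)
selected = select_features(scores, 3)
```

In this toy corpus the term "team" occurs at the same rate in both categories, so the stand-in score gives it zero importance and it is not selected, which is the kind of category discrimination the description refers to.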
first_indexed 2024-12-12T01:26:03Z
id doaj.art-80b6b6cc7cb84216ba0685f72591ede8
institution Directory Open Access Journal
issn 2076-3417
last_indexed 2024-12-12T01:26:03Z
spelling doaj.art-80b6b6cc7cb84216ba0685f72591ede8; 2022-12-22T00:43:06Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2019-02-01; volume 9, issue 4, article 665; DOI 10.3390/app9040665; Lu Zhang and Qingling Duan, both at the College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
topic feature selection
multi-label text classification
category contribution
feature importance