A Feature Selection Method for Multi-Label Text Based on Feature Importance
Main Authors: | Lu Zhang, Qingling Duan |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-02-01 |
Series: | Applied Sciences |
Subjects: | |
Online Access: | https://www.mdpi.com/2076-3417/9/4/665 |
field | value |
---|---|
author | Lu Zhang; Qingling Duan |
collection | DOAJ |
description | Multi-label text classification assigns a single text to several categories simultaneously, reflecting the fact that a real-world text is often associated with multiple topics. The feature space generated by text data is high-dimensional and sparse. Feature selection is an efficient technique that removes irrelevant and redundant features, reduces the dimensionality of the feature space, and helps avoid the curse of dimensionality. This paper proposes a feature selection method for multi-label text based on feature importance. First, multi-label texts are transformed into single-label texts using a label assignment method. Second, the importance of each feature is calculated with a method based on Category Contribution (CC). Finally, the features with the highest importance are selected to construct the feature space. Because feature importance is calculated from the perspective of the categories, the selected features have strong category-discrimination ability. Specifically, the contribution of each feature to each category is calculated from two aspects, inter-category and intra-category, and the two are combined to obtain the feature's importance. The proposed method was tested on six public data sets, and the experimental results demonstrate its effectiveness. |
first_indexed | 2024-12-12T01:26:03Z |
format | Article |
id | doaj.art-80b6b6cc7cb84216ba0685f72591ede8 |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-12-12T01:26:03Z |
publishDate | 2019-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
doi | 10.3390/app9040665 |
citation | Applied Sciences 2019, 9(4), 665 |
affiliation | College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China (both authors) |
title | A Feature Selection Method for Multi-Label Text Based on Feature Importance |
topic | feature selection; multi-label text classification; category contribution; feature importance |
url | https://www.mdpi.com/2076-3417/9/4/665 |
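The abstract describes a three-step pipeline (label assignment, Category Contribution scoring, selection of the most important features) but does not state the CC formulas. The Python sketch below shows the shape of such a pipeline under explicit assumptions: `copy_transform` uses the common copy transformation as a stand-in for the paper's label assignment method, and `category_contribution_scores` is a hypothetical score built from within-category document frequencies. Neither function reproduces the authors' definitions; all function names and the toy data are illustrative.

```python
from collections import defaultdict


def copy_transform(docs, label_sets):
    """Multi-label -> single-label by duplicating each document once per label.

    ASSUMPTION: this "copy" transformation stands in for the paper's label
    assignment method, whose details are not given in the abstract.
    """
    single_docs, single_labels = [], []
    for tokens, labels in zip(docs, label_sets):
        for label in sorted(labels):
            single_docs.append(set(tokens))  # only term presence is used below
            single_labels.append(label)
    return single_docs, single_labels


def category_contribution_scores(docs, labels):
    """Hypothetical CC-style importance score (NOT the paper's formula).

    intra(t, c): share of category c's documents that contain term t.
    inter(t, c): intra(t, c) divided by the sum of intra(t, c') over all
                 categories, i.e. how concentrated t is in category c.
    A term's importance is its best intra * inter product over categories.
    """
    n_docs = defaultdict(int)                    # documents per category
    df = defaultdict(lambda: defaultdict(int))   # df[term][category]
    for terms, label in zip(docs, labels):
        n_docs[label] += 1
        for term in terms:
            df[term][label] += 1

    importance = {}
    for term, per_cat in df.items():
        intra = {c: per_cat.get(c, 0) / n_docs[c] for c in n_docs}
        total = sum(intra.values())              # > 0: every term occurs somewhere
        importance[term] = max(
            intra[c] * (intra[c] / total)        # combine the two aspects
            for c in n_docs
        )
    return importance


def select_top_k(importance, k):
    """Keep the k most important terms (ties broken alphabetically)."""
    return sorted(importance, key=lambda t: (-importance[t], t))[:k]


# Toy data (hypothetical): tokenized documents and their label sets.
docs = [
    ["goal", "match", "the"],
    ["vote", "party", "the"],
    ["rain", "storm", "the"],
    ["goal", "vote", "the"],
]
label_sets = [{"sport"}, {"politics"}, {"weather"}, {"sport", "politics"}]

single_docs, single_labels = copy_transform(docs, label_sets)
scores = category_contribution_scores(single_docs, single_labels)
selected = select_top_k(scores, 4)
print(selected)  # ['rain', 'storm', 'goal', 'vote']
```

On this toy data the term "the" receives the lowest score (1/3) because it is spread evenly over the three categories, while terms confined to one category rank highest ("rain" and "storm" reach 1.0 partly because the weather category has a single document). Taking the maximum over categories is only one design choice; a sum or a weighted average across categories would plug into the same structure.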