Research on the internal influence factors of the text multi-classification problem
This paper deals with the classification of text data. Statistics show that a literature search retrieves more than 8000 articles of all kinds on the topic. However, few papers examine the factors that affect text classification. The text classificati...
Main Authors: | Mingqiang Wu, Chang Furong, Zhang Kui |
---|---|
Format: | Article |
Language: | English |
Published: | EDP Sciences 2018-01-01 |
Series: | MATEC Web of Conferences |
Online Access: | https://doi.org/10.1051/matecconf/201817303072 |
_version_ | 1819180010021322752 |
---|---|
author | Mingqiang Wu Chang Furong Zhang Kui |
author_facet | Mingqiang Wu Chang Furong Zhang Kui |
author_sort | Mingqiang Wu |
collection | DOAJ |
description | This paper deals with the classification of text data. Statistics show that a literature search retrieves more than 8000 articles of all kinds on the topic. However, few papers examine the factors that affect text classification. The choice of classification method is important, but internal factors sometimes play a major role and can even determine the success or failure of the whole classification task. To address this gap, the paper uses the Rocchio algorithm as the classification method and tests experimentally how five internal factors influence text classification: category clustering density, class complexity, category definition, stop words, and document length. The experiments show that higher clustering density, lower class complexity, and clearer category definition all lead to higher classification accuracy, and that removing stop words also improves results. Document length does not directly affect classification performance, but the document length should be chosen to suit the classification algorithm. |
first_indexed | 2024-12-22T22:07:32Z |
format | Article |
id | doaj.art-164463f2cb6440fe86562af8572308cc |
institution | Directory Open Access Journal |
issn | 2261-236X |
language | English |
last_indexed | 2024-12-22T22:07:32Z |
publishDate | 2018-01-01 |
publisher | EDP Sciences |
record_format | Article |
series | MATEC Web of Conferences |
title | Research on the internal influence factors of the text multi-classification problem |
url | https://doi.org/10.1051/matecconf/201817303072 |
work_keys_str_mv | AT mingqiangwu researchontheinternalinfluencefactorsofthetextmulticlassificationproblem AT changfurong researchontheinternalinfluencefactorsofthetextmulticlassificationproblem AT zhangkui researchontheinternalinfluencefactorsofthetextmulticlassificationproblem |
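
The description above names the Rocchio algorithm as the classifier and lists stop words and document length among the factors tested. As a rough illustration only, the sketch below shows the simplest centroid-based form of Rocchio classification: each class is represented by the average of its document vectors, and a new document is assigned to the class whose centroid is most similar by cosine similarity. The record does not state the paper's term weighting, similarity measure, or data, so the plain term-frequency weights, the `STOP_WORDS` set, the `TRAIN` examples, and all function names here are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}


def vectorize(text, remove_stop_words=True):
    """Turn a text into a unit-length term-frequency vector (a dict)."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    # Dividing by the norm keeps long documents from dominating a centroid.
    return {t: c / norm for t, c in counts.items()} if norm else {}


def train_centroids(labelled_docs):
    """Average the document vectors of each class into one centroid."""
    sums = defaultdict(Counter)
    sizes = Counter()
    for text, label in labelled_docs:
        for term, weight in vectorize(text).items():
            sums[label][term] += weight
        sizes[label] += 1
    return {
        label: {t: w / sizes[label] for t, w in vec.items()}
        for label, vec in sums.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(text, centroids):
    """Assign the text to the class with the most similar centroid."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy training set (made up for this sketch, not taken from the paper).
TRAIN = [
    ("the match ended in a late goal", "sport"),
    ("the team won the league match", "sport"),
    ("the bank raised the interest rate", "finance"),
    ("stock market and bank shares fell", "finance"),
]

centroids = train_centroids(TRAIN)
print(classify("a goal in the final match", centroids))
print(classify("the bank interest rate", centroids))
```

Two of the factors from the description are visible in this sketch: passing `remove_stop_words=False` to `vectorize` shows how frequent function words shift the vectors, and the length normalisation in `vectorize` is one common way to keep document length from directly driving the similarity score. Fuller Rocchio formulations also subtract a weighted centroid of the other classes, which is omitted here.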