Research on the internal influence factors of the text multi-classification problem

This paper deals with the classification of text data. Statistics show that more than 8000 articles are found among the documents of all kinds retrieved through the optical network. However, few papers examine the factors that affect text classification. The text classificati...

Bibliographic Details
Main Authors: Mingqiang Wu, Chang Furong, Zhang Kui
Format: Article
Language: English
Published: EDP Sciences 2018-01-01
Series: MATEC Web of Conferences
Online Access: https://doi.org/10.1051/matecconf/201817303072
collection DOAJ
description This paper deals with the classification of text data. Statistics show that more than 8000 articles are found among the documents of all kinds retrieved through the optical network. However, few papers examine the factors that affect text classification. The classification method used is important, but internal factors sometimes play a large role and can even decide the success or failure of the whole classification. To address this gap, this paper selects the Rocchio algorithm as the classification method and tests experimentally how five internal factors influence text classification: category clustering density, class complexity, category definition, stop words, and document length. The experiments show that the higher the clustering density, the lower the class complexity, and the higher the category definition, the higher the accuracy of text classification. Removing stop words also improves the result. Document length does not directly affect the result, but a document length suited to the classification algorithm should be chosen.
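For readers unfamiliar with the Rocchio algorithm named in the description, the following is a minimal sketch of its simplified centroid form: each class is represented by the average of its document vectors, and a new text is assigned to the class whose centroid is most similar by cosine. This is an illustration only, not the authors' implementation. The stop-word list, the toy training texts, and all function names are assumptions introduced here, and the full Rocchio formulation may also subtract a weighted centroid of negative examples, which this sketch omits.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}


def vectorize(text, remove_stop_words=True):
    """Turn a text into a length-normalized term-frequency vector."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def train_centroids(labeled_docs, remove_stop_words=True):
    """Average the document vectors of each class into one centroid."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter()
    for text, label in labeled_docs:
        sizes[label] += 1
        for term, weight in vectorize(text, remove_stop_words).items():
            sums[label][term] += weight
    return {
        label: {t: w / sizes[label] for t, w in terms.items()}
        for label, terms in sums.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(text, centroids, remove_stop_words=True):
    """Return the label of the centroid most similar to the text."""
    vec = vectorize(text, remove_stop_words)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy training data, invented for this sketch.
TRAIN = [
    ("the match ended in a late goal", "sport"),
    ("the team won the league title", "sport"),
    ("the bank raised interest rates", "finance"),
    ("stock markets fell and the bank cut rates", "finance"),
]
CENTROIDS = train_centroids(TRAIN)
```

Calling `classify("the team scored a goal", CENTROIDS)` assigns the text to the class whose centroid shares its terms. Passing `remove_stop_words=False` to both `train_centroids` and `classify` is one simple way to compare results with and without stop-word removal, one of the five factors listed in the description.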
first_indexed 2024-12-22T22:07:32Z
format Article
id doaj.art-164463f2cb6440fe86562af8572308cc
institution Directory of Open Access Journals
issn 2261-236X
language English
last_indexed 2024-12-22T22:07:32Z
publishDate 2018-01-01
publisher EDP Sciences
record_format Article
series MATEC Web of Conferences