Persian Text Classification Enhancement by Latent Semantic Space
Heterogeneous data of all kinds are growing on the web. Because web search results contain a variety of data types, it is common to classify the results in order to find the preferred data. Many machine learning methods are used to classify textual data. The main challenges in data cla...
Main Authors: | Mohammad Bagher Dastgheib, Sara Koleini |
---|---|
Format: | Article |
Language: | English |
Published: | Regional Information Center for Science and Technology (RICeST), 2019-01-01 |
Series: | International Journal of Information Science and Management |
Subjects: | Persian text classification, vector space model, latent semantic indexing (LSI) |
Online Access: | https://ijism.ricest.ac.ir/index.php/ijism/article/view/1382 |
_version_ | 1818570152324628480 |
---|---|
author | Mohammad Bagher Dastgheib Sara Koleini |
author_facet | Mohammad Bagher Dastgheib Sara Koleini |
author_sort | Mohammad Bagher Dastgheib |
collection | DOAJ |
description | Heterogeneous data of all kinds are growing on the web. Because web search results contain a variety of data types, it is common to classify the results in order to find the preferred data. Many machine learning methods are used to classify textual data. The main challenges in data classification are the cost of the classifier and the classification performance. A traditional model in information retrieval (IR) and text data representation is the vector space model (VSM). In this representation, the cost of computation depends on the dimension of the vector. Another problem is selecting effective features and pruning unwanted terms. Latent semantic indexing (LSI) is used to transform the VSM into an orthogonal semantic space that takes term relations into account. Experimental results showed that the LSI semantic space achieves better computation time and classification accuracy. The results suggest that the semantic topic space has less noise, so accuracy increases. The lower vector dimension also reduces computational complexity. |
first_indexed | 2024-12-14T06:56:55Z |
format | Article |
id | doaj.art-89c8ded7bf0f4550a805fd711525e656 |
institution | Directory Open Access Journal |
issn | 2008-8302 2008-8310 |
language | English |
last_indexed | 2024-12-14T06:56:55Z |
publishDate | 2019-01-01 |
publisher | Regional Information Center for Science and Technology (RICeST) |
record_format | Article |
series | International Journal of Information Science and Management |
spelling | doaj.art-89c8ded7bf0f4550a805fd711525e656; 2022-12-21T23:12:37Z; eng; Regional Information Center for Science and Technology (RICeST); International Journal of Information Science and Management; 2008-8302; 2008-8310; 2019-01-01; 171315; Persian Text Classification Enhancement by Latent Semantic Space; Mohammad Bagher Dastgheib (Assistant Prof. in Computer Engineering, Research Department of Design and System Operations, Regional Information Center for Science and Technology); Sara Koleini (M.S. in Computer Engineering, Senior expert staff of network engineer, Department of Information and Communication Technology Management, Regional Information Center for Science and Technology); Heterogeneous data of all kinds are growing on the web. Because web search results contain a variety of data types, it is common to classify the results in order to find the preferred data. Many machine learning methods are used to classify textual data. The main challenges in data classification are the cost of the classifier and the classification performance. A traditional model in information retrieval (IR) and text data representation is the vector space model (VSM). In this representation, the cost of computation depends on the dimension of the vector. Another problem is selecting effective features and pruning unwanted terms. Latent semantic indexing (LSI) is used to transform the VSM into an orthogonal semantic space that takes term relations into account. Experimental results showed that the LSI semantic space achieves better computation time and classification accuracy. The results suggest that the semantic topic space has less noise, so accuracy increases. The lower vector dimension also reduces computational complexity.; https://ijism.ricest.ac.ir/index.php/ijism/article/view/1382; persian text classification, vector space model, latent semantic indexing (lsi). |
spellingShingle | Mohammad Bagher Dastgheib Sara Koleini Persian Text Classification Enhancement by Latent Semantic Space International Journal of Information Science and Management persian text classification, vector space model, latent semantic indexing (lsi). |
title | Persian Text Classification Enhancement by Latent Semantic Space |
title_full | Persian Text Classification Enhancement by Latent Semantic Space |
title_fullStr | Persian Text Classification Enhancement by Latent Semantic Space |
title_full_unstemmed | Persian Text Classification Enhancement by Latent Semantic Space |
title_short | Persian Text Classification Enhancement by Latent Semantic Space |
title_sort | persian text classification enhancement by latent semantic space |
topic | persian text classification, vector space model, latent semantic indexing (lsi). |
url | https://ijism.ricest.ac.ir/index.php/ijism/article/view/1382 |
work_keys_str_mv | AT mohammadbagherdastgheib persiantextclassificationenhancementbylatentsemanticspace AT sarakoleini persiantextclassificationenhancementbylatentsemanticspace |
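
The description in this record says that latent semantic indexing transforms the vector space model into a lower-dimensional semantic space before classification. The following is a minimal sketch of that idea, not the authors' implementation. It assumes NumPy, a truncated SVD of a term-document count matrix, and a nearest-centroid cosine classifier. The toy English corpus, the labels, the function names, and the choice of k = 2 are hypothetical illustrations; the record does not state the authors' data, features, or classifier.

```python
import numpy as np

def build_term_document_matrix(docs):
    """Return (term -> row index, count matrix) with one column per document."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.split():
            matrix[index[tok], j] += 1.0
    return index, matrix

def lsi_fit(matrix, k):
    """Truncated SVD: keep the k largest singular triplets."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]

def lsi_project(u_k, s_k, term_vector):
    """Fold a term-space vector into the k-dimensional latent space."""
    return (u_k.T @ term_vector) / s_k

def to_term_vector(text, index):
    """Count known terms; terms outside the training vocabulary are ignored."""
    vec = np.zeros(len(index))
    for tok in text.split():
        if tok in index:
            vec[index[tok]] += 1.0
    return vec

def classify(text, index, u_k, s_k, centroids):
    """Assign the label whose latent-space centroid has the highest cosine."""
    q = lsi_project(u_k, s_k, to_term_vector(text, index))

    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom > 0 else 0.0

    return max(centroids, key=lambda label: cosine(q, centroids[label]))

# Hypothetical toy corpus (placeholder tokens, not the paper's Persian data).
docs = [
    "football match goal team",
    "team goal league match",
    "market stock price bank",
    "bank price economy market",
]
labels = ["sport", "sport", "economy", "economy"]

index, matrix = build_term_document_matrix(docs)
u_k, s_k, vt_k = lsi_fit(matrix, k=2)
doc_coords = vt_k.T  # one k-dimensional row per training document
centroids = {
    label: doc_coords[[i for i, l in enumerate(labels) if l == label]].mean(axis=0)
    for label in set(labels)
}

print(classify("goal team", index, u_k, s_k, centroids))
```

Each document is compared in k dimensions instead of one dimension per vocabulary term, which is the source of the reduced computation the description mentions. In practice k would be tuned on held-out data.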