Persian Text Classification Enhancement by Latent Semantic Space

Heterogeneous data of all kinds are growing on the web. Because web search results contain a variety of data types, it is common to classify the results in order to find the preferred data. Many machine learning methods are used to classify textual data. The main challenges in data cla...

Bibliographic Details
Main Authors: Mohammad Bagher Dastgheib, Sara Koleini
Format: Article
Language: English
Published: Regional Information Center for Science and Technology (RICeST) 2019-01-01
Series: International Journal of Information Science and Management
Subjects: Persian text classification, vector space model, latent semantic indexing (LSI)
Online Access: https://ijism.ricest.ac.ir/index.php/ijism/article/view/1382
_version_ 1818570152324628480
author Mohammad Bagher Dastgheib
Sara Koleini
collection DOAJ
description Heterogeneous data of all kinds are growing on the web. Because web search results contain a variety of data types, it is common to classify the results in order to find the preferred data. Many machine learning methods are used to classify textual data. The main challenges in data classification are the cost of the classifier and the performance of classification. A traditional model in information retrieval (IR) and text data representation is the vector space model (VSM). In this representation, the cost of computation depends on the dimension of the vector. Another problem is selecting effective features and pruning unwanted terms. Latent semantic indexing (LSI) is used to transform the VSM into an orthogonal semantic space that takes term relations into account. Experimental results showed that the LSI semantic space can achieve better performance in computation time and classification accuracy. This result showed that the semantic topic space has less noise, so accuracy increases. The lower vector dimension also reduces the computational complexity.
first_indexed 2024-12-14T06:56:55Z
format Article
id doaj.art-89c8ded7bf0f4550a805fd711525e656
institution Directory Open Access Journal
issn 2008-8302
2008-8310
language English
last_indexed 2024-12-14T06:56:55Z
publishDate 2019-01-01
publisher Regional Information Center for Science and Technology (RICeST)
record_format Article
series International Journal of Information Science and Management
spelling doaj.art-89c8ded7bf0f4550a805fd711525e656 2022-12-21T23:12:37Z eng
Regional Information Center for Science and Technology (RICeST)
International Journal of Information Science and Management
2008-8302 2008-8310
2019-01-01 171315
Persian Text Classification Enhancement by Latent Semantic Space
Mohammad Bagher Dastgheib [0], Sara Koleini [1]
[0] Assistant Prof. in Computer Engineering, Research Department of Design and System Operations, Regional Information Center for Science and Technology
[1] M.S. in Computer Engineering, Senior expert staff of network engineer, Department of Information and Communication Technology Management, Regional Information Center for Science and Technology
https://ijism.ricest.ac.ir/index.php/ijism/article/view/1382
persian text classification, vector space model, latent semantic indexing (lsi).
title Persian Text Classification Enhancement by Latent Semantic Space
topic Persian text classification, vector space model, latent semantic indexing (LSI)
url https://ijism.ricest.ac.ir/index.php/ijism/article/view/1382
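
The pipeline summarized in the description (build a vector space model, transform it into a lower-dimensional latent semantic space, then classify) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not say which classifier, term weighting, or number of dimensions the article used. The nearest-centroid classifier, the raw term counts, k = 2, and the tiny English token lists standing in for a Persian corpus are all hypothetical choices made for the example.

```python
import numpy as np

def term_document_matrix(docs, vocab):
    """Build the VSM: rows are terms, columns are documents (raw counts)."""
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, tokens in enumerate(docs):
        for term in tokens:
            if term in index:  # terms outside the vocabulary are ignored
                matrix[index[term], j] += 1.0
    return matrix

def fit_lsi(matrix, k):
    """Return the k leading left singular vectors (the term-to-topic map)."""
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k]

def to_semantic_space(matrix, u_k):
    """Project document columns onto the k topics; one row per document."""
    return (u_k.T @ matrix).T

def nearest_centroid(train_vecs, train_labels, test_vecs):
    """Assumed classifier: pick the class whose centroid is most cosine-similar."""
    labels = sorted(set(train_labels))
    label_arr = np.array(train_labels)
    centroids = np.stack([train_vecs[label_arr == lab].mean(axis=0) for lab in labels])

    def unit(x):
        # Guard against zero vectors (documents with no known terms).
        return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)

    sims = unit(test_vecs) @ unit(centroids).T
    return [labels[i] for i in sims.argmax(axis=1)]

# Hypothetical toy data; a real experiment would use tokenized Persian documents.
train_docs = [
    ["football", "match", "goal"],
    ["team", "goal", "match"],
    ["football", "team", "league"],
    ["bank", "market", "price"],
    ["market", "stock", "price"],
    ["bank", "stock", "inflation"],
]
train_labels = ["sports"] * 3 + ["economy"] * 3
test_docs = [["goal", "league", "team"], ["price", "inflation", "market"]]

vocab = sorted({term for doc in train_docs for term in doc})
train_matrix = term_document_matrix(train_docs, vocab)   # 10 terms x 6 documents
u_k = fit_lsi(train_matrix, k=2)                          # keep 2 latent topics
train_vecs = to_semantic_space(train_matrix, u_k)         # 6 documents x 2 topics
test_vecs = to_semantic_space(term_document_matrix(test_docs, vocab), u_k)
predictions = nearest_centroid(train_vecs, train_labels, test_vecs)
print(predictions)
```

In this sketch each document shrinks from 10 term dimensions to 2 topic dimensions before classification, which is the source of the lower computation cost the description refers to. Cosine similarity is used so that the result does not depend on the sign or rotation of the singular vectors returned by the SVD routine.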