Semantic Based Cluster Content Discovery in Description First Clustering Algorithm

Bibliographic Details
Main Authors: MUHAMMAD WASEEM KHAN, HAFIZ MUHAMMAD SHAHZAD ASIF, YASIR SALEEM
Format: Article
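Language: English
Published: Mehran University of Engineering and Technology, 2017-01-01
Series: Mehran University Research Journal of Engineering and Technology
Subjects: Information Retrieval; Singular Value Decomposition; Vector Space Model; Label Induction Grouping Algorithm; Term Frequency; Inverse Document Frequency
Online Access: http://publications.muet.edu.pk/research_papers/pdf/pdf1428.pdf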
author MUHAMMAD WASEEM KHAN
HAFIZ MUHAMMAD SHAHZAD ASIF
YASIR SALEEM
collection DOAJ
description In data analytics, grouping similar documents in textual data is a significant problem. Much work has been done in this field, and many algorithms have been proposed. One category of algorithms first groups documents by similarity and then assigns meaningful labels to those groups. Description-first clustering algorithms take the reverse approach: a meaningful description is deduced first, and relevant documents are then assigned to that description. LINGO (Label Induction Grouping Algorithm) is a description-first clustering algorithm used for the automatic grouping of documents obtained from search results. It uses LSI (Latent Semantic Indexing), an IR (Information Retrieval) technique, to induce meaningful cluster labels, and VSM (Vector Space Model) for cluster content discovery. In this paper, we present LINGO using LSI in both the cluster label induction and the cluster content discovery phases. Finally, we compare the results obtained when the algorithm uses VSM and when it uses latent semantic analysis during the cluster content discovery phase.
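The description contrasts two ways of assigning documents to already-induced labels in the cluster content discovery phase: VSM (cosine similarity between tf-idf document vectors and label vectors) and LSI (the same comparison after projecting both onto the top-k singular directions of the term-document matrix). The Python sketch below illustrates that contrast under stated assumptions. The function names, the binary label vectors, the 0.2 assignment threshold, the toy corpus, and the use of orthogonal iteration in place of a library SVD routine are all hypothetical choices for illustration, not details taken from the paper, whose exact assignment rule may differ.

```python
import math
import random
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, A): A is a term-document matrix (rows = terms,
    columns = documents) weighted by tf * log(n / df)."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted({term for c in counts for term in c})
    n = len(docs)
    df = {term: sum(1 for c in counts if term in c) for term in vocab}
    a = [[c[term] * math.log(n / df[term]) for c in counts] for term in vocab]
    return vocab, a


def label_vector(label, vocab):
    """Binary term vector for a label (a simplifying assumption)."""
    words = set(label.lower().split())
    return [1.0 if term in words else 0.0 for term in vocab]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def top_left_singular_vectors(a, k, iters=300, seed=0):
    """Approximate U_k (top-k left singular vectors of A) by orthogonal
    iteration on the symmetric matrix A A^T."""
    t = len(a)
    m = [[dot(a[i], a[j]) for j in range(t)] for i in range(t)]
    rng = random.Random(seed)
    basis = [[rng.random() for _ in range(t)] for _ in range(k)]
    for _ in range(iters):
        # Multiply each basis vector by A A^T ...
        basis = [[dot(row, v) for row in m] for v in basis]
        # ... then re-orthonormalize with modified Gram-Schmidt.
        ortho = []
        for v in basis:
            for q in ortho:
                c = dot(v, q)
                v = [x - c * y for x, y in zip(v, q)]
            norm = math.sqrt(dot(v, v))
            if norm > 1e-12:  # drop directions beyond the rank of A
                ortho.append([x / norm for x in v])
        basis = ortho
    return basis


def project(basis, v):
    """Coordinates of v in the latent space: U_k^T v."""
    return [dot(q, v) for q in basis]


def discover_content(docs, labels, threshold=0.2, k=None):
    """Map each label to the indices of documents whose cosine similarity
    to it is at least `threshold`. k=None compares raw tf-idf vectors (VSM);
    an integer k compares them in the rank-k LSI space."""
    vocab, a = tfidf_matrix(docs)
    columns = [list(col) for col in zip(*a)]  # one tf-idf vector per document
    basis = top_left_singular_vectors(a, k) if k else []
    doc_vecs = [project(basis, d) for d in columns] if basis else columns
    result = {}
    for label in labels:
        q = label_vector(label, vocab)
        if basis:
            q = project(basis, q)
        result[label] = [i for i, d in enumerate(doc_vecs) if cosine(d, q) >= threshold]
    return result


docs = ["car engine", "engine speed", "speed motor",
        "cat jungle", "jungle prey", "prey hunt"]
print(discover_content(docs, ["car engine"]))       # VSM
print(discover_content(docs, ["car engine"], k=2))  # LSI with k = 2
```

On this toy corpus, the VSM run assigns documents 0 and 1 to the label "car engine", while the LSI run with k = 2 also assigns document 2 ("speed motor"), which shares no term with the label but is linked to it through "engine speed". Orthogonal iteration is used only to keep the sketch dependency-free; in practice a library routine such as `numpy.linalg.svd` would normally compute U_k.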
first_indexed 2024-12-20T07:56:16Z
format Article
id doaj.art-5ec1e95358604cbfa2f20f252a3039f7
institution Directory of Open Access Journals
issn 0254-7821
2413-7219
language English
last_indexed 2024-12-20T07:56:16Z
publishDate 2017-01-01
publisher Mehran University of Engineering and Technology
record_format Article
series Mehran University Research Journal of Engineering and Technology
title Semantic Based Cluster Content Discovery in Description First Clustering Algorithm
topic Information Retrieval
Singular Value Decomposition
Vector Space Model
Label Induction Grouping Algorithm
Term Frequency
Inverse Document Frequency
url http://publications.muet.edu.pk/research_papers/pdf/pdf1428.pdf