Semantic Based Cluster Content Discovery in Description First Clustering Algorithm
In data analytics, grouping similar documents in textual data is a difficult problem. Much work has been done in this field and many algorithms have been proposed. One category of algorithms first groups the documents on the basis of similarity and then assigns the meani...
Main Authors: | Muhammad Waseem Khan, Hafiz Muhammad Shahzad Asif, Yasir Saleem |
---|---|
Format: | Article |
Language: | English |
Published: | Mehran University of Engineering and Technology, 2017-01-01 |
Series: | Mehran University Research Journal of Engineering and Technology |
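Subjects: | Information Retrieval; Singular Value Decomposition; Vector Space Model; Label Induction Grouping Algorithm; Term Frequency; Inverse Document Frequency |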
Online Access: | http://publications.muet.edu.pk/research_papers/pdf/pdf1428.pdf |
_version_ | 1818945258554130432 |
---|---|
author | MUHAMMAD WASEEM KHAN; HAFIZ MUHAMMAD SHAHZAD ASIF; YASIR SALEEM |
author_facet | MUHAMMAD WASEEM KHAN; HAFIZ MUHAMMAD SHAHZAD ASIF; YASIR SALEEM |
author_sort | MUHAMMAD WASEEM KHAN |
collection | DOAJ |
description | In data analytics, grouping similar documents in textual data is a difficult problem. Much work has been done in this field and many algorithms have been proposed. One category of algorithms first groups the documents on the basis of similarity and then assigns meaningful labels to those groups. Description-first clustering algorithms belong to the category in which a meaningful description is deduced first and relevant documents are then assigned to that description. LINGO (Label Induction Grouping Algorithm) is a description-first clustering algorithm used for the automatic grouping of documents obtained from search results. It uses LSI (Latent Semantic Indexing), an IR (Information Retrieval) technique, to induce meaningful cluster labels, and VSM (Vector Space Model) for cluster content discovery. In this paper we present LINGO using LSI during both the cluster label induction and cluster content discovery phases. Finally, we compare the results obtained from the algorithm when it uses VSM and when it uses latent semantic analysis during the cluster content discovery phase. |
first_indexed | 2024-12-20T07:56:16Z |
format | Article |
id | doaj.art-5ec1e95358604cbfa2f20f252a3039f7 |
institution | Directory Open Access Journal |
issn | 0254-7821 2413-7219 |
language | English |
last_indexed | 2024-12-20T07:56:16Z |
publishDate | 2017-01-01 |
publisher | Mehran University of Engineering and Technology |
record_format | Article |
series | Mehran University Research Journal of Engineering and Technology |
title | Semantic Based Cluster Content Discovery in Description First Clustering Algorithm |
topic | Information Retrieval; Singular Value Decomposition; Vector Space Model; Label Induction Grouping Algorithm; Term Frequency; Inverse Document Frequency |
url | http://publications.muet.edu.pk/research_papers/pdf/pdf1428.pdf |
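
The abstract in this record contrasts two ways of carrying out cluster content discovery once labels are known: matching documents to labels in the original term space (VSM) or in a reduced latent space (LSI). The following is a minimal sketch of that contrast, not the authors' implementation. The toy term-document matrix, the raw term counts (no TF-IDF weighting), the single-term labels, the rank of 2, the 0.5 threshold, and the function names `assign_vsm` and `assign_lsi` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
term_doc = np.array([
    [1, 0, 1, 0, 0],  # car
    [0, 1, 1, 0, 0],  # automobile
    [1, 1, 0, 0, 0],  # engine
    [0, 0, 0, 1, 1],  # flower
    [0, 0, 0, 1, 1],  # petal
], dtype=float)

# Assumed cluster labels, expressed as term vectors (one column per label).
labels = np.zeros((len(terms), 2))
labels[terms.index("car"), 0] = 1.0
labels[terms.index("flower"), 1] = 1.0


def cosine_matrix(a, b):
    """Cosine similarity between every column of a and every column of b."""
    a = a / np.maximum(np.linalg.norm(a, axis=0, keepdims=True), 1e-12)
    b = b / np.maximum(np.linalg.norm(b, axis=0, keepdims=True), 1e-12)
    return a.T @ b


def assign_vsm(term_doc, labels, threshold):
    """Assign documents to labels by cosine similarity in the term space."""
    sims = cosine_matrix(labels, term_doc)
    return [set(np.flatnonzero(row >= threshold).tolist()) for row in sims]


def assign_lsi(term_doc, labels, threshold, rank):
    """Assign documents to labels by cosine similarity in a rank-k latent space."""
    u, _, _ = np.linalg.svd(term_doc, full_matrices=False)
    basis = u[:, :rank]  # leading left singular vectors
    # Project both labels and documents onto the latent basis before comparing.
    sims = cosine_matrix(basis.T @ labels, basis.T @ term_doc)
    return [set(np.flatnonzero(row >= threshold).tolist()) for row in sims]


vsm_clusters = assign_vsm(term_doc, labels, threshold=0.5)
lsi_clusters = assign_lsi(term_doc, labels, threshold=0.5, rank=2)
print("VSM:", vsm_clusters)
print("LSI:", lsi_clusters)
```

On this toy matrix, the VSM assignment leaves document 1 out of the "car" cluster because it contains only "automobile" and "engine", while the rank-2 LSI assignment includes it through term co-occurrence. Both variants place documents 3 and 4 under "flower". Whether this effect carries over to real search results depends on the corpus, weighting, rank, and threshold.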