Distributed Clustering of Text Collections
Current data processing tasks require efficient approaches capable of dealing with large databases. A promising strategy consists of distributing the data across several computers, each of which solves part of the problem. Finally, these partial answers are integrated to obtain a final solutio...
Main Authors: | Juan Zamora, Hector Allende-Cid, Marcelo Mendoza |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
Subjects: | Distributed algorithms, distributed text clustering, high dimensional data |
Online Access: | https://ieeexplore.ieee.org/document/8882328/ |
_version_ | 1818557560971591680 |
---|---|
author | Juan Zamora Hector Allende-Cid Marcelo Mendoza |
author_facet | Juan Zamora Hector Allende-Cid Marcelo Mendoza |
author_sort | Juan Zamora |
collection | DOAJ |
description | Current data processing tasks require efficient approaches capable of dealing with large databases. A promising strategy consists of distributing the data across several computers, each of which solves part of the problem. Finally, these partial answers are integrated to obtain a final solution. We introduce distributed shared nearest neighbors (D-SNN), a novel clustering algorithm that works with disjoint partitions of data. Our algorithm produces a global clustering solution whose performance is competitive with centralized approaches. The algorithm works effectively with high-dimensional data, which makes it suitable for document clustering tasks. Experimental results over five data sets show that our proposal is competitive in terms of quality performance measures when compared to state-of-the-art methods. |
first_indexed | 2024-12-14T00:01:08Z |
format | Article |
id | doaj.art-aaf0d0b3e1a1492486eb339f7905b6de |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-14T00:01:08Z |
publishDate | 2019-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-aaf0d0b3e1a1492486eb339f7905b6de 2022-12-21T23:26:19Z eng IEEE IEEE Access 2169-3536 2019-01-01 7 155671 155685 10.1109/ACCESS.2019.2949455 8882328 Distributed Clustering of Text Collections Juan Zamora 0 https://orcid.org/0000-0003-0003-182X Hector Allende-Cid 1 Marcelo Mendoza 2 Instituto de Estadística, Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile Centro Científico y Tecnológico de Valparaíso, Universidad Técnica Federico Santa María, Valparaíso, Chile Current data processing tasks require efficient approaches capable of dealing with large databases. A promising strategy consists of distributing the data across several computers, each of which solves part of the problem. Finally, these partial answers are integrated to obtain a final solution. We introduce distributed shared nearest neighbors (D-SNN), a novel clustering algorithm that works with disjoint partitions of data. Our algorithm produces a global clustering solution whose performance is competitive with centralized approaches. The algorithm works effectively with high-dimensional data, which makes it suitable for document clustering tasks. Experimental results over five data sets show that our proposal is competitive in terms of quality performance measures when compared to state-of-the-art methods. https://ieeexplore.ieee.org/document/8882328/ Distributed algorithms distributed text clustering high dimensional data |
spellingShingle | Juan Zamora Hector Allende-Cid Marcelo Mendoza Distributed Clustering of Text Collections IEEE Access Distributed algorithms distributed text clustering high dimensional data |
title | Distributed Clustering of Text Collections |
title_full | Distributed Clustering of Text Collections |
title_fullStr | Distributed Clustering of Text Collections |
title_full_unstemmed | Distributed Clustering of Text Collections |
title_short | Distributed Clustering of Text Collections |
title_sort | distributed clustering of text collections |
topic | Distributed algorithms, distributed text clustering, high dimensional data |
url | https://ieeexplore.ieee.org/document/8882328/ |
work_keys_str_mv | AT juanzamora distributedclusteringoftextcollections AT hectorallendecid distributedclusteringoftextcollections AT marcelomendoza distributedclusteringoftextcollections |