SparseHC : a memory-efficient online hierarchical clustering algorithm

Computing a hierarchical clustering of objects from a pairwise distance matrix is an important algorithmic kernel in computational science. Since the storage of this matrix requires quadratic space with respect to the number of objects, the design of memory-efficient approaches is of high importance...

Full description

Bibliographic Details
Main Authors: Nguyen, Thuy-Diem, Schmidt, Bertil, Kwoh, Chee-Keong
Other Authors: School of Computer Engineering
Format: Journal Article
Language:English
Published: 2014
Subjects:
Online Access:https://hdl.handle.net/10356/104862
http://hdl.handle.net/10220/20325
_version_ 1811677029900222464
author Nguyen, Thuy-Diem
Schmidt, Bertil
Kwoh, Chee-Keong
author2 School of Computer Engineering
author_facet School of Computer Engineering
Nguyen, Thuy-Diem
Schmidt, Bertil
Kwoh, Chee-Keong
author_sort Nguyen, Thuy-Diem
collection NTU
description Computing a hierarchical clustering of objects from a pairwise distance matrix is an important algorithmic kernel in computational science. Since the storage of this matrix requires quadratic space with respect to the number of objects, the design of memory-efficient approaches is of high importance to this research area. In this paper, we address this problem by presenting a memory-efficient online hierarchical clustering algorithm called SparseHC. SparseHC scans a sorted and possibly sparse distance matrix chunk-by-chunk. Meanwhile, a dendrogram is built by merging cluster pairs as and when the distance between them is determined to be the smallest among all remaining cluster pairs. The key insight used is that for finding the cluster pair with the smallest distance, it is unnecessary to complete the computation of all cluster pairwise distances. Partial information can be utilized to calculate a lower bound on cluster pairwise distances that are subsequently used for cluster distance comparison. Our experimental results show that SparseHC achieves a linear empirical memory complexity, which is a significant improvement compared to existing algorithms.
first_indexed 2024-10-01T02:30:53Z
format Journal Article
id ntu-10356/104862
institution Nanyang Technological University
language English
last_indexed 2024-10-01T02:30:53Z
publishDate 2014
record_format dspace
spelling ntu-10356/1048622020-05-28T07:19:13Z SparseHC : a memory-efficient online hierarchical clustering algorithm Nguyen, Thuy-Diem Schmidt, Bertil Kwoh, Chee-Keong School of Computer Engineering DRNTU::Engineering::Computer science and engineering Computing a hierarchical clustering of objects from a pairwise distance matrix is an important algorithmic kernel in computational science. Since the storage of this matrix requires quadratic space with respect to the number of objects, the design of memory-efficient approaches is of high importance to this research area. In this paper, we address this problem by presenting a memory-efficient online hierarchical clustering algorithm called SparseHC. SparseHC scans a sorted and possibly sparse distance matrix chunk-by-chunk. Meanwhile, a dendrogram is built by merging cluster pairs as and when the distance between them is determined to be the smallest among all remaining cluster pairs. The key insight used is that for finding the cluster pair with the smallest distance, it is unnecessary to complete the computation of all cluster pairwise distances. Partial information can be utilized to calculate a lower bound on cluster pairwise distances that are subsequently used for cluster distance comparison. Our experimental results show that SparseHC achieves a linear empirical memory complexity, which is a significant improvement compared to existing algorithms. Published version 2014-08-18T04:44:59Z 2019-12-06T21:41:27Z 2014-08-18T04:44:59Z 2019-12-06T21:41:27Z 2014 2014 Journal Article Nguyen, T.- D., Schmidt, B., & Kwoh, C.- K. (2014). SparseHC: A Memory-efficient Online Hierarchical Clustering Algorithm. Procedia Computer Science, 29, 8-19. 1877-0509 https://hdl.handle.net/10356/104862 http://hdl.handle.net/10220/20325 10.1016/j.procs.2014.05.001 en Procedia computer science © 2014 The Author(s). This paper was published in Procedia Computer Science and is made available as an electronic reprint (preprint) with permission of the Author(s). The paper can be found at the following official DOI: http://dx.doi.org/10.1016/j.procs.2014.05.001.  One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper is prohibited and is subject to penalties under law. application/pdf
spellingShingle DRNTU::Engineering::Computer science and engineering
Nguyen, Thuy-Diem
Schmidt, Bertil
Kwoh, Chee-Keong
SparseHC : a memory-efficient online hierarchical clustering algorithm
title SparseHC : a memory-efficient online hierarchical clustering algorithm
title_full SparseHC : a memory-efficient online hierarchical clustering algorithm
title_fullStr SparseHC : a memory-efficient online hierarchical clustering algorithm
title_full_unstemmed SparseHC : a memory-efficient online hierarchical clustering algorithm
title_short SparseHC : a memory-efficient online hierarchical clustering algorithm
title_sort sparsehc a memory efficient online hierarchical clustering algorithm
topic DRNTU::Engineering::Computer science and engineering
url https://hdl.handle.net/10356/104862
http://hdl.handle.net/10220/20325
work_keys_str_mv AT nguyenthuydiem sparsehcamemoryefficientonlinehierarchicalclusteringalgorithm
AT schmidtbertil sparsehcamemoryefficientonlinehierarchicalclusteringalgorithm
AT kwohcheekeong sparsehcamemoryefficientonlinehierarchicalclusteringalgorithm