UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware.


Bibliographic Details
Main Authors: Chaeeun Kim, Changhun Han, Ha-Myung Park
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2022-01-01
Series: PLoS ONE, vol. 17, no. 11, e0277527
ISSN: 1932-6203
Online Access: https://doi.org/10.1371/journal.pone.0277527

Description
With a cluster of commodity hardware, how can we efficiently find all connected components of an enormous graph containing hundreds of billions of nodes and edges? Finding connected components is used in various applications such as pattern recognition, reachability indexing, graph compression, graph partitioning, and random walk, and several studies have proposed algorithms to find them efficiently in various environments. Most existing single-machine and distributed-memory algorithms are limited in scalability because they must load all data generated during the process into main memory, so handling large graphs requires expensive machines with vast memory capacities. Several MapReduce algorithms try to handle large graphs by exploiting distributed storage but fail due to the data explosion problem, a phenomenon in which the size of the data increases significantly as the computation proceeds. The latest MapReduce algorithms resolve this problem by proposing two distinct star-operations and executing them alternately. However, these star-operations still cause massive network traffic, because a star-operation is a distributed operation that connects each node to its smallest neighbor.

In this paper, we unite the two star-operations into a single operation, namely UniStar, and propose UniCon, a new distributed algorithm that uses UniStar to find connected components in enormous graphs. The partition-aware processing of UniStar effectively resolves the data explosion problem. We further optimize UniStar by filtering dispensable edges and exploiting a hybrid data structure. Experiments on a cluster of 10 inexpensive machines, each equipped with an Intel Xeon E3-1220 CPU (4 cores at 3.10 GHz), 16 GB of RAM, and two 1 TB SSDs, show that UniCon is up to 13 times faster than competitors on real-world graphs. UniCon also succeeds in processing a tremendous graph with 129 billion edges, which is up to 4096 times larger than the graphs competitors can process.
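
The abstract describes a star-operation only as a distributed operation that connects each node to its smallest neighbor, and says earlier MapReduce algorithms alternate two such operations. As an illustration only, the sketch below assumes those two operations are the large-star and small-star operations known from earlier MapReduce connected-components work (the abstract does not name them) and simulates them on a single machine with in-memory Python sets. It is not UniStar or UniCon: partition-aware processing, edge filtering, and the hybrid data structure are not modelled, and every function name here is hypothetical.

```python
from collections import defaultdict

# An undirected edge, stored as (larger endpoint, smaller endpoint).
Edge = tuple[int, int]


def neighbors(edges: set[Edge]) -> dict[int, set[int]]:
    """Build an undirected adjacency map from a set of edges."""
    adj: dict[int, set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def large_star(edges: set[Edge]) -> set[Edge]:
    """Hypothetical large-star: for each node u, connect every strictly larger
    neighbor v to m(u), the smallest node in u's closed neighborhood."""
    out: set[Edge] = set()
    for u, nbrs in neighbors(edges).items():
        m = min(nbrs | {u})
        for v in nbrs:
            if v > u:
                out.add((v, m))  # m <= u < v, so no self-loop is created
    return out


def small_star(edges: set[Edge]) -> set[Edge]:
    """Hypothetical small-star: for each node u, connect u and its strictly
    smaller neighbors to the smallest node among them."""
    out: set[Edge] = set()
    for u, nbrs in neighbors(edges).items():
        group = {v for v in nbrs if v < u} | {u}
        m = min(group)
        for v in group:
            if v != m:
                out.add((v, m))  # v > m keeps the (larger, smaller) order
    return out


def star_components(edges: set[tuple[int, int]]) -> dict[int, int]:
    """Label each node with the smallest node ID in its component by
    alternating large-star and small-star until neither changes the edges.
    Self-loops are dropped, so a node seen only in a self-loop gets no label."""
    current = {(max(u, v), min(u, v)) for u, v in edges if u != v}
    while True:
        after_large = large_star(current)
        after_small = small_star(after_large)
        if after_large == current and after_small == current:
            break  # fixed point of both operations: every component is a star
        current = after_small
    # In a star forest, the smallest node in each closed neighborhood is the center.
    return {u: min(nbrs | {u}) for u, nbrs in neighbors(current).items()}


if __name__ == "__main__":
    print(star_components({(1, 2), (2, 3), (3, 4), (5, 6), (7, 6)}))
```

For the sample graph, the call maps nodes 1 to 4 to label 1 and nodes 5 to 7 to label 5. Each operation only rewires edges toward smaller node IDs, and once both stop changing the edge set, every component is a star centered on its smallest node. In a distributed setting each operation would roughly correspond to one shuffle keyed by node ID, which is where the network traffic mentioned in the abstract arises.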