Performance optimization for distributed machine learning and graph processing at scale over virtualized infrastructure

Bibliographic Details
Main Author: Sun, Peng
Other Authors: Wen, Yonggang
Format: Thesis
Language:English
Published: 2018
Institution: Nanyang Technological University
Degree: Doctor of Philosophy (IGS)
Subjects: DRNTU::Engineering::Computer science and engineering::Computer systems organization
Online Access: http://hdl.handle.net/10356/73229
DOI: 10.32657/10356/73229
Citation: Sun, P. (2018). Performance optimization for distributed machine learning and graph processing at scale over virtualized infrastructure. Doctoral thesis, Nanyang Technological University, Singapore.
Description

Nowadays, many real-world applications can be represented as machine learning and graph processing (MLGP) problems and require sophisticated analysis of massive datasets. Various distributed computing systems have been proposed to run MLGP applications in a cluster. These systems usually manage the input data in a distributed file system (DFS), perform data-parallel computation on multiple machines, and exchange intermediate data over the network. In this thesis, we focus on performance optimization of distributed MLGP over virtualized infrastructure.

First, we improve the resource utilization of a cluster shared by multiple distributed MLGP workloads. Organizations increasingly use a cluster management system (CMS) to run multiple distributed MLGP applications in a single cluster. Existing CMSs can only allocate a static partition of the cluster to each application, leading to poor cluster utilization. To address this problem, we propose a new CMS named Dorm, which leverages virtualization techniques to partition a cluster, runs one application per partition, and can dynamically resize each partition at application runtime to achieve high cluster utilization while meeting other performance constraints. Extensive performance evaluations have shown that Dorm can increase cluster utilization by a factor of up to 2.32.

Second, we improve metadata lookup performance for DFSs. Existing DFSs usually use a distributed hash table (DHT) to manage their metadata servers. When performing a metadata operation, a client must first use a lookup service to locate the desired metadata object; this lookup can reduce metadata operation throughput and increase latency. To address this problem, we design a new metadata lookup service called MetaFlow. MetaFlow leverages software-defined networking (SDN) techniques to move metadata lookup to the network layer, and generates appropriate flow tables for SDN-enabled switches by mapping the physical network topology to a logical B-tree. Extensive performance evaluations have shown that, compared to DHT-based approaches, MetaFlow can increase metadata throughput by a factor of up to 6.5 and reduce metadata latency by a factor of up to 5.

Third, we reduce the communication overhead of distributed machine learning (ML) based on the Parameter Server (PS) framework. The PS framework has a group of worker nodes performing data-parallel computation and a group of server nodes maintaining globally shared parameters. Each worker node continually pulls parameters from server nodes and pushes updates to them, resulting in high communication overhead. To address this problem, we design ParameterFlow (PF), a communication layer for the PS framework with an update-centric communication (UCC) model and a dynamic value-bounded filter (DVF). UCC introduces a broadcast/push model to exchange data between worker nodes and server nodes. DVF directly reduces network traffic and communication time by selectively dropping updates from network transmission. Experiments have shown that PF can speed up popular distributed ML applications by a factor of up to 4.3, compared to the conventional PS framework.
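The value-bounded filter can be pictured as dropping small-magnitude entries of each update before it is pushed over the network. The sketch below is a minimal Python/NumPy illustration, not the thesis's implementation: the function name `filter_updates`, the `keep_fraction` parameter, and the quantile-based bound are assumptions introduced here for clarity.

```python
# Minimal sketch of a value-bounded update filter, assuming updates are
# held in NumPy arrays keyed by parameter name. The thesis's DVF adapts
# its bound at runtime; here the bound is simply recomputed from the
# current update magnitudes (a hypothetical rule, for illustration only).
import numpy as np

def filter_updates(updates, keep_fraction=0.1):
    """Keep only the largest-magnitude entries of each update tensor,
    zeroing (i.e. dropping from transmission) the rest."""
    filtered = {}
    for name, delta in updates.items():
        flat = np.abs(delta).ravel()
        if flat.size == 0:
            filtered[name] = delta
            continue
        # Dynamic bound: the (1 - keep_fraction) quantile of |delta|.
        bound = np.quantile(flat, 1.0 - keep_fraction)
        mask = np.abs(delta) >= bound
        filtered[name] = np.where(mask, delta, 0.0)
    return filtered

# Example: only the few largest gradient entries survive for transmission.
grads = {"w": np.array([0.001, -0.5, 0.02, 0.8]), "b": np.array([0.0003])}
sparse = filter_updates(grads, keep_fraction=0.25)
```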
Last, we enable high-performance large-scale graph processing in small clusters with limited memory. When processing big graphs, existing in-memory graph processing systems can easily exceed the cluster's memory capacity. While out-of-core approaches can handle big graphs, they suffer from poor performance due to high disk I/O overhead. We design a new distributed graph processing system named GraphH with three techniques: a gather-apply-broadcast computation model, an edge cache system, and a hybrid communication mode. Experiments have shown that GraphH outperforms existing out-of-core systems by more than 100x when processing big graphs in small clusters with limited memory.

The proposed approaches and obtained results can provide guidelines for improving large-scale distributed MLGP applications over virtualized infrastructure.
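As a closing illustration of the gather-apply-broadcast model mentioned above, the sketch below runs one superstep on a tiny graph: every vertex gathers the values broadcast by its in-neighbours, applies an update function, and broadcasts the result along its out-edges. This is only an illustrative single-machine rendering under assumed names (`superstep`, `apply_fn`); GraphH itself partitions the graph across machines and adds the edge cache and hybrid communication described above.

```python
# Minimal sketch of one gather-apply-broadcast superstep on a tiny graph,
# assuming an adjacency-list representation; the vertex program shown
# (summing neighbour values) is a stand-in, not the thesis's GraphH code.
from collections import defaultdict

def superstep(out_edges, values, apply_fn):
    # Gather: collect each vertex's broadcast value at its out-neighbours.
    inbox = defaultdict(list)
    for src, dsts in out_edges.items():
        for dst in dsts:
            inbox[dst].append(values[src])
    # Apply: compute each vertex's new value from the gathered messages.
    new_values = {v: apply_fn(values[v], inbox[v]) for v in values}
    # Broadcast: the returned values are what the next superstep gathers.
    return new_values

# Example: every vertex adds up what its in-neighbours broadcast.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
vals = {"a": 1.0, "b": 2.0, "c": 3.0}
vals = superstep(graph, vals, lambda old, msgs: old + sum(msgs))
```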