Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster

Commodity SoCs with hybrid architectures that combine CPUs with programmable FPGA fabric, such as the Xilinx Zynq SoC, have become a competitive energy-efficient platform for addressing irregular parallelism in graph problems. In this paper, we prototype a 32-node cluster composed of these Zynq SoC...

Bibliographic Details
Main Authors:	Moorthy, Pradeep; Kapre, Nachiket
Other Authors:	School of Computer Engineering
Format:	Conference Paper
Language:	English
Published:	2015
Subjects:	Computer Science and Engineering
Online Access:	https://hdl.handle.net/10356/83649 http://hdl.handle.net/10220/39205

_version_	1826127394875899904
author	Moorthy, Pradeep Kapre, Nachiket
author2	School of Computer Engineering
author_facet	School of Computer Engineering Moorthy, Pradeep Kapre, Nachiket
author_sort	Moorthy, Pradeep
collection	NTU
description	Commodity SoCs with hybrid architectures that combine CPUs with programmable FPGA fabric, such as the Xilinx Zynq SoC, have become a competitive energy-efficient platform for addressing irregular parallelism in graph problems. In this paper, we prototype a 32-node cluster composed of these Zynq SoC chips to accelerate communication-bound sparse graph-oriented applications such as neural network simulations. We develop specialized MPI routines for irregular accelerator-to-accelerator communication of small message traffic. We use the ARM processor for handling the MPI stack while offloading compute-intensive calculations to the FPGA. For graphs with 32M nodes and 32M edges, Zedwulf delivers the highest throughput in our study, 94 MTEPS (Million Traversed Edges Per Second), outperforming the x86 multi-threaded platforms by 1.2× to 1.4×. For this experiment, Zedwulf operates at an efficiency of 0.49 MTEPS/W when using ARM+FPGA, which is 1.2× better than using ARMv7 CPUs alone, and within 8% of the Intel Core i7-4770k platform.
first_indexed	2024-10-01T07:08:10Z
format	Conference Paper
id	ntu-10356/83649
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T07:08:10Z
publishDate	2015
record_format	dspace
spelling	ntu-10356/83649 2020-05-28T07:41:43Z Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster Moorthy, Pradeep Kapre, Nachiket School of Computer Engineering 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Computer Science and Engineering Commodity SoCs with hybrid architectures that combine CPUs with programmable FPGA fabric, such as the Xilinx Zynq SoC, have become a competitive energy-efficient platform for addressing irregular parallelism in graph problems. In this paper, we prototype a 32-node cluster composed of these Zynq SoC chips to accelerate communication-bound sparse graph-oriented applications such as neural network simulations. We develop specialized MPI routines for irregular accelerator-to-accelerator communication of small message traffic. We use the ARM processor for handling the MPI stack while offloading compute-intensive calculations to the FPGA. For graphs with 32M nodes and 32M edges, Zedwulf delivers the highest throughput in our study, 94 MTEPS (Million Traversed Edges Per Second), outperforming the x86 multi-threaded platforms by 1.2× to 1.4×. For this experiment, Zedwulf operates at an efficiency of 0.49 MTEPS/W when using ARM+FPGA, which is 1.2× better than using ARMv7 CPUs alone, and within 8% of the Intel Core i7-4770k platform. Accepted version 2015-12-22T09:08:52Z 2019-12-06T15:27:30Z 2015-12-22T09:08:52Z 2019-12-06T15:27:30Z 2015 Conference Paper Moorthy, P., & Kapre, N. (2015). Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster. 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines, 68-75. https://hdl.handle.net/10356/83649 http://hdl.handle.net/10220/39205 10.1109/FCCM.2015.37 en © 2015 IEEE. Personal use of this material is permitted. 
Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [http://dx.doi.org/10.1109/FCCM.2015.37]. 8 p. application/pdf
spellingShingle	Computer Science and Engineering Moorthy, Pradeep Kapre, Nachiket Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster
title	Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster
title_full	Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster
title_fullStr	Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster
title_full_unstemmed	Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster
title_short	Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster
title_sort	zedwulf power performance tradeoffs of a 32 node zynq soc cluster
topic	Computer Science and Engineering
url	https://hdl.handle.net/10356/83649 http://hdl.handle.net/10220/39205
work_keys_str_mv	AT moorthypradeep zedwulfpowerperformancetradeoffsofa32nodezynqsoccluster AT kaprenachiket zedwulfpowerperformancetradeoffsofa32nodezynqsoccluster

Similar Items