In-cache query co-processing on coupled CPU-GPU architectures

Emerging processor designs integrate the CPU and the GPU (Graphics Processing Unit) on a single chip with a shared Last Level Cache (LLC). However, the main memory bandwidth of such coupled CPU-GPU architectures can be much lower than that of a discrete GPU. As a...

Full description

Bibliographic Details
Main Authors: He, Jiong, Zhang, Shuhao, He, Bingsheng
Other Authors: School of Computer Engineering
Format: Journal Article
Language:English
Published: 2016
Subjects:
Online Access:https://hdl.handle.net/10356/81886
http://hdl.handle.net/10220/39709
_version_ 1811683756976635904
author He, Jiong
Zhang, Shuhao
He, Bingsheng
author2 School of Computer Engineering
author_facet School of Computer Engineering
He, Jiong
Zhang, Shuhao
He, Bingsheng
author_sort He, Jiong
collection NTU
description Emerging processor designs integrate the CPU and the GPU (Graphics Processing Unit) on a single chip with a shared Last Level Cache (LLC). However, the main memory bandwidth of such coupled CPU-GPU architectures can be much lower than that of a discrete GPU. As a result, current GPU query co-processing paradigms can suffer severely from memory stalls. In this paper, we propose a novel in-cache query co-processing paradigm for main-memory On-Line Analytical Processing (OLAP) databases on coupled CPU-GPU architectures. Specifically, we adapt CPU-assisted prefetching to minimize cache misses in GPU query co-processing and CPU-assisted decompression to improve query execution performance. Furthermore, we develop a cost-model-guided adaptation mechanism for distributing the workload of prefetching, decompression, and query execution between the CPU and the GPU. We implement a system prototype and evaluate it on two recent AMD APUs, the A8 and the A10. The experimental results show that 1) in-cache query co-processing can effectively improve the performance of the state-of-the-art GPU co-processing paradigm by up to 30% and 33% on the A8 and A10, respectively, and 2) our workload distribution adaptation mechanism can significantly improve query performance by up to 36% and 40% on the A8 and A10, respectively.
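The cost-model-guided adaptation mentioned in the abstract can be sketched as follows. This is an illustrative example only, not the authors' implementation: the function name `balanced_split`, the throughput numbers, and the simple linear cost model are all assumptions made for the sketch.

```python
# Illustrative sketch of cost-model-guided workload distribution between a
# CPU and a GPU, in the spirit of the paper's adaptation mechanism.
# Assumes a simple linear cost model: time = work / throughput.

def balanced_split(total_work, cpu_rate, gpu_rate):
    """Return the CPU's share r of the work such that the estimated
    CPU and GPU finish times are equal.

    total_work : amount of work (e.g., tuples to process)
    cpu_rate   : estimated CPU throughput (tuples/sec)
    gpu_rate   : estimated GPU throughput (tuples/sec)
    """
    # Equal finish time: (r * W) / cpu_rate == ((1 - r) * W) / gpu_rate
    # Solving for r gives r = cpu_rate / (cpu_rate + gpu_rate).
    r = cpu_rate / (cpu_rate + gpu_rate)
    cpu_work = r * total_work
    gpu_work = total_work - cpu_work
    return r, cpu_work, gpu_work

# Example: if the GPU is estimated to be 3x faster, the CPU takes 25%
# of the work and both devices finish at roughly the same time.
r, cpu_w, gpu_w = balanced_split(1_000_000, cpu_rate=2e6, gpu_rate=6e6)
```

A real system would refine the estimated rates at runtime (e.g., from observed per-batch timings) and re-split subsequent batches accordingly; the paper's mechanism additionally distributes the prefetching and decompression stages, not just query execution.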
first_indexed 2024-10-01T04:17:48Z
format Journal Article
id ntu-10356/81886
institution Nanyang Technological University
language English
last_indexed 2024-10-01T04:17:48Z
publishDate 2016
record_format dspace
spelling ntu-10356/818862020-05-28T07:17:58Z In-cache query co-processing on coupled CPU-GPU architectures He, Jiong Zhang, Shuhao He, Bingsheng School of Computer Engineering Memory architecture Query processing Emerging processor designs integrate the CPU and the GPU (Graphics Processing Unit) on a single chip with a shared Last Level Cache (LLC). However, the main memory bandwidth of such coupled CPU-GPU architectures can be much lower than that of a discrete GPU. As a result, current GPU query co-processing paradigms can suffer severely from memory stalls. In this paper, we propose a novel in-cache query co-processing paradigm for main-memory On-Line Analytical Processing (OLAP) databases on coupled CPU-GPU architectures. Specifically, we adapt CPU-assisted prefetching to minimize cache misses in GPU query co-processing and CPU-assisted decompression to improve query execution performance. Furthermore, we develop a cost-model-guided adaptation mechanism for distributing the workload of prefetching, decompression, and query execution between the CPU and the GPU. We implement a system prototype and evaluate it on two recent AMD APUs, the A8 and the A10. The experimental results show that 1) in-cache query co-processing can effectively improve the performance of the state-of-the-art GPU co-processing paradigm by up to 30% and 33% on the A8 and A10, respectively, and 2) our workload distribution adaptation mechanism can significantly improve query performance by up to 36% and 40% on the A8 and A10, respectively. MOE (Min. of Education, S’pore) Published version 2016-01-19T06:56:25Z 2019-12-06T14:42:21Z 2016-01-19T06:56:25Z 2019-12-06T14:42:21Z 2014 Journal Article He, J., Zhang, S., & He, B. (2014). In-cache query co-processing on coupled CPU-GPU architectures. Proceedings of the VLDB Endowment, 8(4), 329-340. 
doi: 10.14778/2735496.2735497 21508097 https://hdl.handle.net/10356/81886 http://hdl.handle.net/10220/39709 10.14778/2735496.2735497 en Proceedings of the VLDB Endowment © 2014 VLDB Endowment. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/. Obtain permission prior to any use beyond those covered by the license. Contact copyright holder by emailing info@vldb.org. Articles from this volume were invited to present their results at the 41st International Conference on Very Large Data Bases, August 31st - September 4th 2015, Kohala Coast, Hawaii. 12 p. application/pdf
spellingShingle Memory architecture
Query processing
He, Jiong
Zhang, Shuhao
He, Bingsheng
In-cache query co-processing on coupled CPU-GPU architectures
title In-cache query co-processing on coupled CPU-GPU architectures
title_full In-cache query co-processing on coupled CPU-GPU architectures
title_fullStr In-cache query co-processing on coupled CPU-GPU architectures
title_full_unstemmed In-cache query co-processing on coupled CPU-GPU architectures
title_short In-cache query co-processing on coupled CPU-GPU architectures
title_sort in cache query co processing on coupled cpu gpu architectures
topic Memory architecture
Query processing
url https://hdl.handle.net/10356/81886
http://hdl.handle.net/10220/39709
work_keys_str_mv AT hejiong incachequerycoprocessingoncoupledcpugpuarchitectures
AT zhangshuhao incachequerycoprocessingoncoupledcpugpuarchitectures
AT hebingsheng incachequerycoprocessingoncoupledcpugpuarchitectures