In-cache query co-processing on coupled CPU-GPU architectures

Emerging processor designs integrate the CPU and the GPU (Graphics Processing Unit) on a single chip with a shared Last Level Cache (LLC). However, the main memory bandwidth of such coupled CPU-GPU architectures can be much lower than that of a discrete GPU. As a...

Full description

Bibliographic Details
Main Authors: He, Jiong, Zhang, Shuhao, He, Bingsheng
Other Authors: School of Computer Engineering
Format: Journal Article
Language:English
Published: 2016
Subjects:
Online Access:https://hdl.handle.net/10356/81886
http://hdl.handle.net/10220/39709
_version_ 1811683756976635904
author He, Jiong
Zhang, Shuhao
He, Bingsheng
author2 School of Computer Engineering
author_facet School of Computer Engineering
He, Jiong
Zhang, Shuhao
He, Bingsheng
author_sort He, Jiong
collection NTU
description Emerging processor designs integrate the CPU and the GPU (Graphics Processing Unit) on a single chip with a shared Last Level Cache (LLC). However, the main memory bandwidth of such coupled CPU-GPU architectures can be much lower than that of a discrete GPU. As a result, current GPU query co-processing paradigms can suffer severely from memory stalls. In this paper, we propose a novel in-cache query co-processing paradigm for main-memory On-Line Analytical Processing (OLAP) databases on coupled CPU-GPU architectures. Specifically, we adapt CPU-assisted prefetching to minimize cache misses in GPU query co-processing and CPU-assisted decompression to improve query execution performance. Furthermore, we develop a cost-model-guided adaptation mechanism for distributing the workload of prefetching, decompression, and query execution between the CPU and the GPU. We implement a system prototype and evaluate it on two recent AMD APUs, the A8 and the A10. The experimental results show that 1) in-cache query co-processing can effectively improve the performance of the state-of-the-art GPU co-processing paradigm by up to 30% and 33% on the A8 and A10, respectively, and 2) our workload distribution adaptation mechanism can significantly improve query performance by up to 36% and 40% on the A8 and A10, respectively.
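The cost-model-guided adaptation mentioned in the abstract can be sketched as follows. This is an illustrative example only, not the authors' implementation: the function name `balanced_split`, the throughput numbers, and the simple linear cost model are all assumptions made for the sketch.

```python
# Illustrative sketch of cost-model-guided workload distribution between a
# CPU and a GPU, in the spirit of the paper's adaptation mechanism.
# Assumes a simple linear cost model: time = work / throughput.

def balanced_split(total_work, cpu_rate, gpu_rate):
    """Return the CPU's share r of the work such that the estimated
    CPU and GPU finish times are equal.

    total_work : amount of work (e.g., tuples to process)
    cpu_rate   : estimated CPU throughput (tuples/sec)
    gpu_rate   : estimated GPU throughput (tuples/sec)
    """
    # Equal finish time: (r * W) / cpu_rate == ((1 - r) * W) / gpu_rate
    # Solving for r gives r = cpu_rate / (cpu_rate + gpu_rate).
    r = cpu_rate / (cpu_rate + gpu_rate)
    cpu_work = r * total_work
    gpu_work = total_work - cpu_work
    return r, cpu_work, gpu_work

# Example: if the GPU is estimated to be 3x faster, the CPU takes 25%
# of the work and both devices finish at roughly the same time.
r, cpu_w, gpu_w = balanced_split(1_000_000, cpu_rate=2e6, gpu_rate=6e6)
```

A real system would refine the estimated rates at runtime (e.g., from observed per-batch timings) and re-split subsequent batches accordingly; the paper's mechanism additionally distributes the prefetching and decompression stages, not just query execution.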
first_indexed 2024-10-01T04:17:48Z
format Journal Article
id ntu-10356/81886
institution Nanyang Technological University
language English
last_indexed 2024-10-01T04:17:48Z
publishDate 2016
record_format dspace
spelling ntu-10356/818862020-05-28T07:17:58Z In-cache query co-processing on coupled CPU-GPU architectures He, Jiong Zhang, Shuhao He, Bingsheng School of Computer Engineering Memory architecture Query processing Emerging processor designs integrate the CPU and the GPU (Graphics Processing Unit) on a single chip with a shared Last Level Cache (LLC). However, the main memory bandwidth of such coupled CPU-GPU architectures can be much lower than that of a discrete GPU. As a result, current GPU query co-processing paradigms can suffer severely from memory stalls. In this paper, we propose a novel in-cache query co-processing paradigm for main-memory On-Line Analytical Processing (OLAP) databases on coupled CPU-GPU architectures. Specifically, we adapt CPU-assisted prefetching to minimize cache misses in GPU query co-processing and CPU-assisted decompression to improve query execution performance. Furthermore, we develop a cost-model-guided adaptation mechanism for distributing the workload of prefetching, decompression, and query execution between the CPU and the GPU. We implement a system prototype and evaluate it on two recent AMD APUs, the A8 and the A10. The experimental results show that 1) in-cache query co-processing can effectively improve the performance of the state-of-the-art GPU co-processing paradigm by up to 30% and 33% on the A8 and A10, respectively, and 2) our workload distribution adaptation mechanism can significantly improve query performance by up to 36% and 40% on the A8 and A10, respectively. MOE (Min. of Education, S’pore) Published version 2016-01-19T06:56:25Z 2019-12-06T14:42:21Z 2016-01-19T06:56:25Z 2019-12-06T14:42:21Z 2014 Journal Article He, J., Zhang, S., & He, B. (2014). In-cache query co-processing on coupled CPU-GPU architectures. Proceedings of the VLDB Endowment, 8(4), 329-340. 
doi: 10.14778/2735496.2735497 21508097 https://hdl.handle.net/10356/81886 http://hdl.handle.net/10220/39709 10.14778/2735496.2735497 en Proceedings of the VLDB Endowment © 2014 VLDB Endowment. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/. Obtain permission prior to any use beyond those covered by the license. Contact copyright holder by emailing info@vldb.org. Articles from this volume were invited to present their results at the 41st International Conference on Very Large Data Bases, August 31st - September 4th 2015, Kohala Coast, Hawaii. 12 p. application/pdf
spellingShingle Memory architecture
Query processing
He, Jiong
Zhang, Shuhao
He, Bingsheng
In-cache query co-processing on coupled CPU-GPU architectures
title In-cache query co-processing on coupled CPU-GPU architectures
title_full In-cache query co-processing on coupled CPU-GPU architectures
title_fullStr In-cache query co-processing on coupled CPU-GPU architectures
title_full_unstemmed In-cache query co-processing on coupled CPU-GPU architectures
title_short In-cache query co-processing on coupled CPU-GPU architectures
title_sort in cache query co processing on coupled cpu gpu architectures
topic Memory architecture
Query processing
url https://hdl.handle.net/10356/81886
http://hdl.handle.net/10220/39709
work_keys_str_mv AT hejiong incachequerycoprocessingoncoupledcpugpuarchitectures
AT zhangshuhao incachequerycoprocessingoncoupledcpugpuarchitectures
AT hebingsheng incachequerycoprocessingoncoupledcpugpuarchitectures