High performance database systems on coupled CPU-GPU architectures

Database systems are widely used across a broad range of applications to store, modify and extract information from huge volumes of data. In recent years, with constantly increasing data volumes and the emergence of real-time data analytics workloads such as decisio...

Bibliographic Details
Main Author:	He, Jiong
Other Authors:	He Bingsheng
Format:	Thesis
Language:	English
Published:	2016
Subjects:	DRNTU::Engineering::Computer science and engineering::Information systems::Database management
Online Access:	http://hdl.handle.net/10356/68570

_version_	1826115371502927872
author	He, Jiong
author2	He Bingsheng
author_facet	He Bingsheng He, Jiong
author_sort	He, Jiong
collection	NTU
description	Database systems are widely used across a broad range of applications to store, modify and extract information from huge volumes of data. In recent years, with constantly increasing data volumes and the emergence of real-time data analytics workloads such as decision support systems, the demand for high-performance query execution has become more intense than ever. The database community has devoted considerable effort to improving query execution performance from various angles. Among these efforts, exploiting emerging hardware has become an effective and efficient approach. Graphics Processing Units (GPUs) were originally designed for graphics workloads. In recent years, programming GPUs for general-purpose tasks has been significantly simplified by the release of programming interfaces. GPUs are ideal platforms for database workloads with abundant Data Level Parallelism (DLP). A conventional GPU (device) is used as a discrete co-processor connected to the CPU (host) via the PCI-e bus. Existing studies have demonstrated that GPU query co-processing is an effective means of improving the performance of main-memory OLAP (Online Analytical Processing) databases. However, the relatively low bandwidth and high latency of the PCI-e bus are usually the major bottleneck for query co-processing performance. Recently, a novel coupled CPU-GPU architecture has been implemented by multiple vendors, which opens up new opportunities for optimizing query co-processing. In this thesis, we investigate these opportunities on such coupled CPU-GPU architectures and propose hash joins and a complete query processing engine that can take full advantage of these new hardware characteristics. Specifically, we start by studying fine-grained co-processing mechanisms for hash joins, one of the most important operators in database systems, with and without partitioning. 
Fine-grained co-processing opens up an interesting design space, and we extend existing cost models to automatically guide decisions within it. Our experimental results show that the fine-grained hash joins can outperform CPU-only, GPU-only and conventional CPU-GPU co-processing by 53%, 35% and 28%, respectively. However, such fine-grained operator designs still suffer from serious memory stalls because the main memory bandwidth of coupled CPU-GPU architectures is much lower than that of a discrete GPU. To overcome this obstacle and apply coupled CPU-GPU architectures to a wider range of areas, we propose a novel in-cache query co-processing paradigm that exploits the shared cache. Specifically, we adapt CPU-assisted prefetching to minimize cache misses in GPU query co-processing and CPU-assisted decompression to improve query execution performance. Furthermore, we develop a cost-model-guided adaptation mechanism for distributing the workload of prefetching, decompression and query execution between the CPU and the GPU. The experimental results show that in-cache query co-processing with this workload distribution adaptation mechanism can improve query execution performance over state-of-the-art GPU co-processing by up to 36% and 40% on two AMD APUs, respectively. Although the fine-grained hash joins and the in-cache query co-processing engine explore various designs to make the best use of the strengths of the coupled architectures, they still fail to expose the inherent concurrency in each database query. Both use a kernel-based execution approach, which executes GPU kernels one by one and optimizes individual kernels for resource utilization and performance. Thus, we further propose a novel GPU-based pipelined query execution engine named GPL for more concurrency and higher device utilization. 
Unlike existing kernel-based execution, GPL takes advantage of hardware features of new-generation GPUs, including concurrent kernel execution and an efficient data communication channel between kernels. We use tiling to logically partition the input data into smaller tiles so that the pipelined query plan can be adapted in a cost-based manner. We have conducted extensive experiments on AMD and NVIDIA GPUs. The results show that GPL significantly outperforms state-of-the-art kernel-based query processing approaches, with improvements of up to 50%.
first_indexed	2024-10-01T03:54:22Z
format	Thesis
id	ntu-10356/68570
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T03:54:22Z
publishDate	2016
record_format	dspace
spelling	ntu-10356/68570 2023-03-04T00:35:26Z High performance database systems on coupled CPU-GPU architectures He, Jiong He Bingsheng School of Computer Engineering Parallel and Distributed Computing Centre DRNTU::Engineering::Computer science and engineering::Information systems::Database management Doctor of Philosophy (SCE) 2016-05-27T01:52:07Z 2016-05-27T01:52:07Z 2016 Thesis http://hdl.handle.net/10356/68570 en 149 p. application/pdf
spellingShingle	DRNTU::Engineering::Computer science and engineering::Information systems::Database management He, Jiong High performance database systems on coupled CPU-GPU architectures
title	High performance database systems on coupled CPU-GPU architectures
title_full	High performance database systems on coupled CPU-GPU architectures
title_fullStr	High performance database systems on coupled CPU-GPU architectures
title_full_unstemmed	High performance database systems on coupled CPU-GPU architectures
title_short	High performance database systems on coupled CPU-GPU architectures
title_sort	high performance database systems on coupled cpu gpu architectures
topic	DRNTU::Engineering::Computer science and engineering::Information systems::Database management
url	http://hdl.handle.net/10356/68570
work_keys_str_mv	AT hejiong highperformancedatabasesystemsoncoupledcpugpuarchitectures

Similar Items