Computing Big-Data Applications Near Flash

Bibliographic Details
Main Author: Xu, Shuotao
Other Authors: Arvind
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/139492
http://orcid.org/0000-0003-3158-3731
collection MIT
description Current systems produce a large and growing amount of data, often referred to as Big Data. Extracting valuable insights from this data requires new computing systems that store and process it efficiently. For fast response times, Big-Data processing typically relies on in-memory computing, which requires a cluster of machines with enough aggregate DRAM to hold the entire dataset for the duration of the computation. Big Data typically exceeds several terabytes, so this approach can incur significant overhead in power, space, and equipment. If the amount of DRAM is not sufficient to hold the working set of a query, performance deteriorates catastrophically. Although NAND flash can provide high-bandwidth data access and has higher capacity density and lower cost per bit than DRAM, flash storage has dramatically different characteristics than DRAM, such as large access granularity and longer access latency. Big-Data applications therefore face many challenges in adopting flash-centric computing and achieving performance comparable to that of in-memory computing. This thesis presents flash-centric hardware architectures that provide high processing throughput for data-intensive applications while hiding long flash access latency. Specifically, we describe two novel flash-centric hardware accelerators, BlueCache and AQUOMAN. These systems lower the cost of two common data-center workloads, key-value caching and SQL analytics. We have built BlueCache and AQUOMAN using FPGAs and flash storage, and show that they can provide competitive performance for Big-Data applications with multi-terabyte datasets. BlueCache provides a 10-100X cheaper key-value cache than a DRAM-based solution, and can outperform a DRAM-based system when the latter has more than 7.4% misses for read-intensive workloads. A desktop-class machine with a single 1TB AQUOMAN disk can achieve performance similar to that of a dual-socket general-purpose server with off-the-shelf SSDs. We believe BlueCache and AQUOMAN can dramatically bring down the cost of acquiring and operating high-performance computing systems for data-center-scale Big-Data applications.
first_indexed 2024-09-23T13:44:38Z
format Thesis
id mit-1721.1/139492
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T13:44:38Z
publishDate 2022
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/139492 2022-01-15T03:25:29Z Computing Big-Data Applications Near Flash Xu, Shuotao Arvind Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Ph.D. 2022-01-14T15:15:16Z 2022-01-14T15:15:16Z 2021-06 2021-06-23T19:41:13.952Z Thesis https://hdl.handle.net/1721.1/139492 http://orcid.org/0000-0003-3158-3731 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology