Correlation maps: A compressed access method for exploiting soft functional dependencies

In relational query processing, there are generally two choices for access paths when performing a predicate lookup for which no clustered index is available. One option is to use an unclustered index. Another is to perform a complete sequential scan of the table. Many analytical workloads do not be...

Full description

Bibliographic Details
Main Authors:	Kimura, Hideaki, Huo, George, Rasin, Alexander, Zdonik, Stanley B., Madden, Samuel R.
Other Authors:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format:	Article
Language:	en_US
Published:	Association for Computing Machinery (ACM) 2014
Online Access:	http://hdl.handle.net/1721.1/90382 https://orcid.org/0000-0002-7470-3265

_version_	1826213083488452608
author	Kimura, Hideaki Huo, George Rasin, Alexander Zdonik, Stanley B. Madden, Samuel R.
author2	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
author_facet	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Kimura, Hideaki Huo, George Rasin, Alexander Zdonik, Stanley B. Madden, Samuel R.
author_sort	Kimura, Hideaki
collection	MIT
description	In relational query processing, there are generally two choices for access paths when performing a predicate lookup for which no clustered index is available. One option is to use an unclustered index. Another is to perform a complete sequential scan of the table. Many analytical workloads do not benefit from the availability of unclustered indexes; the cost of random disk I/O becomes prohibitive for all but the most selective queries. It has been observed that a secondary index on an unclustered attribute can perform well under certain conditions if the unclustered attribute is correlated with a clustered index attribute [4]. The clustered index will co-locate values and the correlation will localize access through the unclustered attribute to a subset of the pages. In this paper, we show that in a real application (SDSS) and widely used benchmark (TPC-H), there exist many cases of attribute correlation that can be exploited to accelerate queries. We also discuss a tool that can automatically suggest useful pairs of correlated attributes. It does so using an analytical cost model that we developed, which is novel in its awareness of the effects of clustering and correlation. Furthermore, we propose a data structure called a Correlation Map (CM) that expresses the mapping between the correlated attributes, acting much like a secondary index. The paper also discusses how bucketing on the domains of both attributes in the correlated attribute pair can dramatically reduce the size of the CM to be potentially orders of magnitude smaller than that of a secondary B+Tree index. This reduction in size allows us to create a large number of CMs that improve performance for a wide range of queries. The small size also reduces maintenance costs as we demonstrate experimentally.
first_indexed	2024-09-23T15:43:04Z
format	Article
id	mit-1721.1/90382
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T15:43:04Z
publishDate	2014
publisher	Association for Computing Machinery (ACM)
record_format	dspace
spelling	mit-1721.1/903822022-09-29T15:40:42Z Correlation maps: A compressed access method for exploiting soft functional dependencies Kimura, Hideaki Huo, George Rasin, Alexander Zdonik, Stanley B. Madden, Samuel R. Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Madden, Samuel R. In relational query processing, there are generally two choices for access paths when performing a predicate lookup for which no clustered index is available. One option is to use an unclustered index. Another is to perform a complete sequential scan of the table. Many analytical workloads do not benefit from the availability of unclustered indexes; the cost of random disk I/O becomes prohibitive for all but the most selective queries. It has been observed that a secondary index on an unclustered attribute can perform well under certain conditions if the unclustered attribute is correlated with a clustered index attribute [4]. The clustered index will co-locate values and the correlation will localize access through the unclustered attribute to a subset of the pages. In this paper, we show that in a real application (SDSS) and widely used benchmark (TPC-H), there exist many cases of attribute correlation that can be exploited to accelerate queries. We also discuss a tool that can automatically suggest useful pairs of correlated attributes. It does so using an analytical cost model that we developed, which is novel in its awareness of the effects of clustering and correlation. Furthermore, we propose a data structure called a Correlation Map (CM) that expresses the mapping between the correlated attributes, acting much like a secondary index. The paper also discusses how bucketing on the domains of both attributes in the correlated attribute pair can dramatically reduce the size of the CM to be potentially orders of magnitude smaller than that of a secondary B+Tree index. This reduction in size allows us to create a large number of CMs that improve performance for a wide range of queries. The small size also reduces maintenance costs as we demonstrate experimentally. 2014-09-26T13:17:21Z 2014-09-26T13:17:21Z 2009-08 Article http://purl.org/eprint/type/ConferencePaper 21508097 http://hdl.handle.net/1721.1/90382 Hideaki Kimura, George Huo, Alexander Rasin, Samuel Madden, and Stanley B. Zdonik. 2009. Correlation maps: a compressed access method for exploiting soft functional dependencies. Proc. VLDB Endow. 2, 1 (August 2009), 1222-1233. https://orcid.org/0000-0002-7470-3265 en_US http://dx.doi.org/10.14778/1687627.1687765 Proceedings of the VLDB Endowment Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Association for Computing Machinery (ACM) Other repository
spellingShingle	Kimura, Hideaki Huo, George Rasin, Alexander Zdonik, Stanley B. Madden, Samuel R. Correlation maps: A compressed access method for exploiting soft functional dependencies
title	Correlation maps: A compressed access method for exploiting soft functional dependencies
title_full	Correlation maps: A compressed access method for exploiting soft functional dependencies
title_fullStr	Correlation maps: A compressed access method for exploiting soft functional dependencies
title_full_unstemmed	Correlation maps: A compressed access method for exploiting soft functional dependencies
title_short	Correlation maps: A compressed access method for exploiting soft functional dependencies
title_sort	correlation maps a compressed access method for exploiting soft functional dependencies
url	http://hdl.handle.net/1721.1/90382 https://orcid.org/0000-0002-7470-3265
work_keys_str_mv	AT kimurahideaki correlationmapsacompressedaccessmethodforexploitingsoftfunctionaldependencies AT huogeorge correlationmapsacompressedaccessmethodforexploitingsoftfunctionaldependencies AT rasinalexander correlationmapsacompressedaccessmethodforexploitingsoftfunctionaldependencies AT zdonikstanleyb correlationmapsacompressedaccessmethodforexploitingsoftfunctionaldependencies AT maddensamuelr correlationmapsacompressedaccessmethodforexploitingsoftfunctionaldependencies

Correlation maps: A compressed access method for exploiting soft functional dependencies

Similar Items