UPI: A Primary Index for Uncertain Databases

Uncertain data management has received growing attention from industry and academia. Many efforts have been made to optimize uncertain databases, including the development of special index data structures. However, none of these efforts have explored primary (clustered) indexes for uncertain databas...

Full description

Bibliographic Details
Main Authors:	Kimura, Hideaki, Madden, Samuel R., Zdonik, Stanley B.
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	en_US
Published:	Very Large Data Base Endowment Inc. (VLDB Endowment) 2012
Online Access:	http://hdl.handle.net/1721.1/73422 https://orcid.org/0000-0002-7470-3265

_version_	1826218145886502912
author	Kimura, Hideaki Madden, Samuel R. Zdonik, Stanley B.
author2	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_facet	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Kimura, Hideaki Madden, Samuel R. Zdonik, Stanley B.
author_sort	Kimura, Hideaki
collection	MIT
description	Uncertain data management has received growing attention from industry and academia. Many efforts have been made to optimize uncertain databases, including the development of special index data structures. However, none of these efforts have explored primary (clustered) indexes for uncertain databases, despite the fact that clustering has the potential to offer substantial speedups for non-selective analytic queries on large uncertain databases. In this paper, we propose a new index called a UPI (Uncertain Primary Index) that clusters heap files according to uncertain attributes with both discrete and continuous uncertainty distributions. Because uncertain attributes may have several possible values, a UPI on an uncertain attribute duplicates tuple data once for each possible value. To prevent the size of the UPI from becoming unmanageable, its size is kept small by placing low-probability tuples in a special Cutoff Index that is consulted only when queries for low-probability values are run. We also propose several other optimizations, including techniques to improve secondary index performance and techniques to reduce maintenance costs and fragmentation by buffering changes to the table and writing updates in sequential batches. Finally, we develop cost models for UPIs to estimate query performance in various settings to help automatically select tuning parameters of a UPI. We have implemented a prototype UPI and experimented on two real datasets. Our results show that UPIs can significantly (up to two orders of magnitude) improve the performance of uncertain queries both over clustered and unclustered attributes. We also show that our buffering techniques mitigate table fragmentation and keep the maintenance cost as low as or even lower than using an unclustered heap file.
first_indexed	2024-09-23T17:14:56Z
format	Article
id	mit-1721.1/73422
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T17:14:56Z
publishDate	2012
publisher	Very Large Data Base Endowment Inc. (VLDB Endowment)
record_format	dspace
spelling	mit-1721.1/734222022-10-03T11:21:54Z UPI: A Primary Index for Uncertain Databases Kimura, Hideaki Madden, Samuel R. Zdonik, Stanley B. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Madden, Samuel R. Zdonik, Stanley B. Uncertain data management has received growing attention from industry and academia. Many efforts have been made to optimize uncertain databases, including the development of special index data structures. However, none of these efforts have explored primary (clustered) indexes for uncertain databases, despite the fact that clustering has the potential to offer substantial speedups for non-selective analytic queries on large uncertain databases. In this paper, we propose a new index called a UPI (Uncertain Primary Index) that clusters heap files according to uncertain attributes with both discrete and continuous uncertainty distributions. Because uncertain attributes may have several possible values, a UPI on an uncertain attribute duplicates tuple data once for each possible value. To prevent the size of the UPI from becoming unmanageable, its size is kept small by placing low-probability tuples in a special Cutoff Index that is consulted only when queries for low-probability values are run. We also propose several other optimizations, including techniques to improve secondary index performance and techniques to reduce maintenance costs and fragmentation by buffering changes to the table and writing updates in sequential batches. Finally, we develop cost models for UPIs to estimate query performance in various settings to help automatically select tuning parameters of a UPI. We have implemented a prototype UPI and experimented on two real datasets. Our results show that UPIs can significantly (up to two orders of magnitude) improve the performance of uncertain queries both over clustered and unclustered attributes. We also show that our buffering techniques mitigate table fragmentation and keep the maintenance cost as low as or even lower than using an unclustered heap file. National Science Foundation (U.S.) (Grant IIS-0448124) National Science Foundation (U.S.) (Grant IIS-0905553) National Science Foundation (U.S.) (Grant IIS-0916691) 2012-09-27T15:52:19Z 2012-09-27T15:52:19Z 2010-09 Article http://purl.org/eprint/type/ConferencePaper 2150-8097 http://hdl.handle.net/1721.1/73422 Hideaki Kimura, Samuel Madden, and Stanley B. Zdonik. 2010. UPI: a primary index for uncertain databases. Proc. VLDB Endow. 3, 1-2 (September 2010), 630-637. https://orcid.org/0000-0002-7470-3265 en_US http://dl.acm.org/citation.cfm?id=1920922 Proceedings of the VLDB Endowment Creative Commons Attribution-Noncommercial-Share Alike 3.0 http://creativecommons.org/licenses/by-nc-sa/3.0/ application/pdf Very Large Data Base Endowment Inc. (VLDB Endowment) MIT web domain
spellingShingle	Kimura, Hideaki Madden, Samuel R. Zdonik, Stanley B. UPI: A Primary Index for Uncertain Databases
title	UPI: A Primary Index for Uncertain Databases
title_full	UPI: A Primary Index for Uncertain Databases
title_fullStr	UPI: A Primary Index for Uncertain Databases
title_full_unstemmed	UPI: A Primary Index for Uncertain Databases
title_short	UPI: A Primary Index for Uncertain Databases
title_sort	upi a primary index for uncertain databases
url	http://hdl.handle.net/1721.1/73422 https://orcid.org/0000-0002-7470-3265
work_keys_str_mv	AT kimurahideaki upiaprimaryindexforuncertaindatabases AT maddensamuelr upiaprimaryindexforuncertaindatabases AT zdonikstanleyb upiaprimaryindexforuncertaindatabases

UPI: A Primary Index for Uncertain Databases

Similar Items