Skyline queries on data with uncertain dimensions for efficient computation
Main Author: | Mohd Saad, Nurul Husna |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2018 |
Subjects: | Database management; Querying (Computer science); Data mining |
Online Access: | http://psasir.upm.edu.my/id/eprint/76991/1/FSKTM%202018%2072%20-%20IR.pdf |
_version_ | 1825950766324514816 |
---|---|
author | Mohd Saad, Nurul Husna |
collection | UPM |
description | The notion of a skyline query is to find the set of objects that are not
dominated by any other object. Skyline queries are crucial in multi-criteria
decision-making applications, particularly in applications that generate
uncertain data. Although a significant amount of research has been devoted to
efficient skyline computation, existing works do not address how to conduct
skyline queries on uncertain data whose objects are represented as continuous
ranges as well as exact values. When data has uncertain dimensions, the
dominance relation among objects with continuous ranges and exact values may
not be transitive, which makes existing skyline techniques inapplicable. The
results of skyline queries are therefore probabilistic, since each object with
a continuous range is associated with the probability of it being a query
answer. Furthermore, querying within a search range on uncertain dimensions is
challenging, because it must be determined which objects with continuous
ranges satisfy the range query. Hence, this thesis focuses on efficiently
extending skyline query and range skyline query processing to support data
with uncertain dimensions. We define skyline queries over data with uncertain
dimensions and present four methods to answer them efficiently, namely
distinctive partitioning, exact domination, range domination, and uncertain
domination. We propose a two-phase framework, SkyQUD, that integrates these
four methods. The first phase performs efficient probability computations
separately within the group of objects with exact values and within the group
of objects with continuous ranges. The second phase performs the more complex
and expensive dominance tests between objects from different groups. SkyQUD
extracts the most dominant skyline objects that meet the required threshold
value; the threshold controls the quality and the size of the reported
skyline. Next, we extend SkyQUD to support range skyline queries on uncertain
dimensions; this extension is denoted SkyQUD-T. A method called range pruning
is proposed and incorporated before the first phase of SkyQUD to determine the
objects that satisfy the range query, bounding the probability of each object
by a given threshold value. Both frameworks have been validated through
extensive experiments on real and synthetic datasets. Four independent
variables, namely scalability, threshold, data distribution, and
dimensionality, are varied to determine their effects on two dependent
variables: the number of pairwise comparisons and the processing time. Through
theoretical analysis and extensive experiments, we show that SkyQUD
effectively supports skyline queries on data with uncertain dimensions and can
handle large datasets. The performance of SkyQUD-T is compared against two
naïve algorithms developed to reflect the best-case and worst-case scenarios.
The results show that the number of pairwise comparisons performed by
SkyQUD-T always lies between those of the two naïve algorithms. |
first_indexed | 2024-03-06T10:19:38Z |
format | Thesis |
id | upm.eprints-76991 |
institution | Universiti Putra Malaysia |
language | English |
last_indexed | 2024-03-06T10:19:38Z |
publishDate | 2018 |
record_format | dspace |
spelling | upm.eprints-76991 2020-02-11T02:08:38Z http://psasir.upm.edu.my/id/eprint/76991/ Skyline queries on data with uncertain dimensions for efficient computation Mohd Saad, Nurul Husna 2018-07 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/76991/1/FSKTM%202018%2072%20-%20IR.pdf Mohd Saad, Nurul Husna (2018) Skyline queries on data with uncertain dimensions for efficient computation. Doctoral thesis, Universiti Putra Malaysia. Database management Querying (Computer science) Data mining |
title | Skyline queries on data with uncertain dimensions for efficient computation |
topic | Database management; Querying (Computer science); Data mining |
url | http://psasir.upm.edu.my/id/eprint/76991/1/FSKTM%202018%2072%20-%20IR.pdf |
work_keys_str_mv | AT mohdsaadnurulhusna skylinequeriesondatawithuncertaindimensionsforefficientcomputation |
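
The abstract describes dominance between objects whose dimensions may be exact values or continuous ranges, and a threshold on the probability of each object being a skyline answer. The Python sketch below illustrates those notions under assumptions that are not taken from the thesis: smaller values are preferred, every continuous range is a uniform distribution, and all dimensions and objects are independent. The functions `p_dominates`, `skyline_probability`, `threshold_skyline` and `in_range_probability` are hypothetical names. This is a brute-force illustration, not SkyQUD or SkyQUD-T, and it does not reproduce their partitioning, domination or range-pruning methods; the skyline probability of an uncertain object is approximated numerically with a midpoint rule.

```python
from itertools import product
from math import prod

# Hypothetical model (not from the thesis): an object is a tuple of dimensions,
# each a (lo, hi) pair. lo == hi is an exact value; lo < hi is a value drawn
# uniformly from [lo, hi]. Dimensions and objects are independent, and smaller
# values are preferred in every dimension.
Dim = tuple[float, float]
Obj = tuple[Dim, ...]


def cdf(x: float, d: Dim) -> float:
    """P(X <= x) for an exact or uniformly distributed dimension."""
    lo, hi = d
    if lo == hi:
        return 1.0 if x >= lo else 0.0
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)


def cdf_integral(x: float, d: Dim) -> float:
    """Integral of the uniform CDF of d from -inf to x (requires lo < hi)."""
    lo, hi = d
    if x <= lo:
        return 0.0
    if x <= hi:
        return (x - lo) ** 2 / (2 * (hi - lo))
    return (hi - lo) / 2 + (x - hi)


def p_le(u: Dim, v: Dim) -> float:
    """P(U <= V) for two independent dimensions."""
    if v[0] == v[1]:  # V exact: P(U <= v)
        return cdf(v[0], u)
    if u[0] == u[1]:  # U exact, V continuous: P(V >= u)
        return 1.0 - cdf(u[0], v)
    # Both uniform: average U's CDF over V's interval.
    return (cdf_integral(v[1], u) - cdf_integral(v[0], u)) / (v[1] - v[0])


def p_dominates(u: Obj, v: Obj) -> float:
    """P(u <= v in every dimension and u < v in at least one)."""
    all_le = prod(p_le(a, b) for a, b in zip(u, v))
    # An all-dimension tie has non-zero probability only if every pair is exact and equal.
    all_eq = prod(1.0 if a[0] == a[1] == b[0] == b[1] else 0.0 for a, b in zip(u, v))
    return all_le - all_eq


def skyline_probability(o: Obj, others: list[Obj], steps: int = 64) -> float:
    """P(no object in `others` dominates o).

    For a fixed value x of o the domination events are independent, so the
    probability is prod(1 - P(p dominates x)); this is averaged over o's
    uncertain dimensions with a midpoint rule of `steps` points per dimension.
    """
    uncertain = [i for i, (lo, hi) in enumerate(o) if lo != hi]
    grids = [[o[i][0] + (o[i][1] - o[i][0]) * (k + 0.5) / steps
              for k in range(steps)] for i in uncertain]
    total, count = 0.0, 0
    for point in product(*grids):
        x = list(o)
        for i, val in zip(uncertain, point):
            x[i] = (val, val)  # fix this uncertain dimension to one sample value
        total += prod(1.0 - p_dominates(p, tuple(x)) for p in others)
        count += 1
    return total / count


def threshold_skyline(objs: dict[str, Obj], threshold: float) -> list[str]:
    """Names of objects whose skyline probability is at least `threshold`."""
    return [name for name, o in objs.items()
            if skyline_probability(o, [p for n, p in objs.items() if n != name]) >= threshold]


def in_range_probability(o: Obj, box: Obj) -> float:
    """P(o lies inside a query box given as one [lo, hi] range per dimension)."""
    def p_in(d: Dim, r: Dim) -> float:
        lo, hi = d
        if lo == hi:
            return 1.0 if r[0] <= lo <= r[1] else 0.0
        return max(0.0, min(hi, r[1]) - max(lo, r[0])) / (hi - lo)
    return prod(p_in(d, r) for d, r in zip(o, box))


objs = {
    "A": ((1, 1), (1, 1)),  # exact in both dimensions
    "B": ((2, 2), (2, 2)),  # exact; always dominated by A
    "C": ((0, 2), (3, 3)),  # first dimension uniform on [0, 2]
}
print(p_dominates(objs["A"], objs["C"]))  # 0.5
print(threshold_skyline(objs, 0.5))  # ['A', 'C']
print(threshold_skyline(objs, 0.6))  # ['A']
print(in_range_probability(objs["C"], ((0, 1), (0, 5))))  # 0.5
```

In the usage lines, C is dominated by A only when its first value is at least 1, so its skyline probability is 0.5, and raising the threshold from 0.5 to 0.6 shrinks the reported skyline. Fixing the uncertain object at one sampled value is what makes the product form valid, since the remaining objects are independent of each other. Every object is compared with every other object, so the number of pairwise comparisons grows quadratically with the dataset size; that count is one of the two dependent variables reported in the experiments described in the abstract.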