Effective partitioning of RDF data for distributed query answering

<p>The growing popularity of the Resource Description Framework (RDF) as a means of data exchange and integration has driven the growth of RDF datasets. Some large-scale RDF datasets cannot be stored and processed efficiently on a single node. A common approach to processing large R...


Bibliographic Details
Main Author: Banda, F
Other Authors: Boris, M
Format: Thesis
Language: English
Published: 2021
Subjects: Graph theory; Knowledge Representation and Reasoning
_version_ 1826314434863169536
author Banda, F
author2 Boris, M
author_facet Boris, M
Banda, F
author_sort Banda, F
collection OXFORD
description <p>The growing popularity of the Resource Description Framework (RDF) as a means of data exchange and integration has driven the growth of RDF datasets. Some large-scale RDF datasets cannot be stored and processed efficiently on a single node. A common approach to processing large RDF datasets is to partition the data across a cluster of shared-nothing servers and use a distributed query evaluation algorithm. It is commonly assumed in the literature that the performance of query processing in such systems is limited mainly by network communication. In this thesis, we show that this assumption does not always hold, and we argue that, when partitioning, distributing the workload evenly among servers should take priority over minimising network communication. Moreover, we present a new RDF partitioning method based on Louvain community detection, which drastically reduces communication but does not yield a corresponding decrease in query running times. This is because strongly connected partitions can lead to workload imbalance among the servers. We present a further refinement of our technique that aims to strike a balance between reducing communication and spreading processing more evenly, and our empirical evaluation shows that such an approach can improve load balance and hence reduce both communication and query times.</p>
first_indexed 2024-03-07T07:21:48Z
format Thesis
id oxford-uuid:c5e14471-bc7c-416d-a715-55fd5c36af7d
institution University of Oxford
language English
last_indexed 2024-09-25T04:32:28Z
publishDate 2021
record_format dspace
spelling oxford-uuid:c5e14471-bc7c-416d-a715-55fd5c36af7d | 2024-09-02T08:30:18Z | Effective partitioning of RDF data for distributed query answering | Thesis | http://purl.org/coar/resource_type/c_db06 | uuid:c5e14471-bc7c-416d-a715-55fd5c36af7d | Graph theory | Knowledge Representation and Reasoning | English | Hyrax Deposit | 2021 | Banda, F | Boris, M
title Effective partitioning of RDF data for distributed query answering
topic Graph theory
Knowledge Representation and Reasoning