Materialisation and data partitioning algorithms for distributed RDF systems
Many RDF systems support reasoning with Datalog rules via materialisation, where all conclusions of RDF data and the rules are precomputed and explicitly stored in a preprocessing step. As the amount of RDF data used in applications keeps increasing, processing large datasets often requires distribu...
Main Author: | Ajileye, T |
---|---|
Other Authors: | Motik, B |
Format: | Thesis |
Language: | English |
Published: | 2021 |
Subjects: | Automated Reasoning |
_version_ | 1817931782239027200 |
---|---|
author | Ajileye, T |
author2 | Motik, B |
author_facet | Motik, B Ajileye, T |
author_sort | Ajileye, T |
collection | OXFORD |
description | Many RDF systems support reasoning with Datalog rules via materialisation, where all conclusions of RDF data and the rules are precomputed and explicitly stored in a preprocessing step. As the amount of RDF data used in applications keeps increasing, processing large datasets often requires distributing the data in a cluster of shared-nothing servers. Whereas numerous distributed query answering techniques are known, distributed materialisation is less well understood. In this paper, we present several techniques that facilitate scalable materialisation in distributed RDF systems. First, we present a new distributed materialisation algorithm that aims to minimise communication and synchronisation in the cluster. Second, we present two new algorithms for partitioning RDF data, both of which aim to produce tightly connected partitions, but without loading complete datasets into memory. We evaluate our materialisation algorithm against two state-of-the-art distributed Datalog systems and show that our technique offers competitive performance, particularly when the rules are complex. Moreover, we analyse in depth the effects of data partitioning on reasoning performance and show that our techniques offer performance comparable or superior to the state-of-the-art min-cut partitioning, but computing the partitions requires considerably fewer resources. |
first_indexed | 2024-03-07T07:13:12Z |
format | Thesis |
id | oxford-uuid:06c3eb4e-8f90-4e9f-a5dd-f448b6843822 |
institution | University of Oxford |
language | English |
last_indexed | 2024-12-09T03:27:29Z |
publishDate | 2021 |
record_format | dspace |
spelling | oxford-uuid:06c3eb4e-8f90-4e9f-a5dd-f448b6843822; 2024-12-01T10:24:01Z; Materialisation and data partitioning algorithms for distributed RDF systems; Thesis; http://purl.org/coar/resource_type/c_db06; uuid:06c3eb4e-8f90-4e9f-a5dd-f448b6843822; Automated Reasoning; English; Hyrax Deposit; 2021; Ajileye, T; Motik, B; Horrocks, I; Many RDF systems support reasoning with Datalog rules via materialisation, where all conclusions of RDF data and the rules are precomputed and explicitly stored in a preprocessing step. As the amount of RDF data used in applications keeps increasing, processing large datasets often requires distributing the data in a cluster of shared-nothing servers. Whereas numerous distributed query answering techniques are known, distributed materialisation is less well understood. In this paper, we present several techniques that facilitate scalable materialisation in distributed RDF systems. First, we present a new distributed materialisation algorithm that aims to minimise communication and synchronisation in the cluster. Second, we present two new algorithms for partitioning RDF data, both of which aim to produce tightly connected partitions, but without loading complete datasets into memory. We evaluate our materialisation algorithm against two state-of-the-art distributed Datalog systems and show that our technique offers competitive performance, particularly when the rules are complex. Moreover, we analyse in depth the effects of data partitioning on reasoning performance and show that our techniques offer performance comparable or superior to the state-of-the-art min-cut partitioning, but computing the partitions requires considerably fewer resources. |
spellingShingle | Automated Reasoning Ajileye, T Materialisation and data partitioning algorithms for distributed RDF systems |
title | Materialisation and data partitioning algorithms for distributed RDF systems |
title_sort | materialisation and data partitioning algorithms for distributed rdf systems |
topic | Automated Reasoning |
work_keys_str_mv | AT ajileyet materialisationanddatapartitioningalgorithmsfordistributedrdfsystems |