Materialisation and data partitioning algorithms for distributed RDF systems

Many RDF systems support reasoning with Datalog rules via materialisation, where all conclusions of RDF data and the rules are precomputed and explicitly stored in a preprocessing step. As the amount of RDF data used in applications keeps increasing, processing large datasets often requires distribu...

Bibliographic Details
Main Author: Ajileye, T
Other Authors: Motik, B
Format: Thesis
Language: English
Published: 2021
Subjects: Automated Reasoning
_version_ 1817931782239027200
author Ajileye, T
author2 Motik, B
author_facet Motik, B
Ajileye, T
author_sort Ajileye, T
collection OXFORD
description Many RDF systems support reasoning with Datalog rules via materialisation, where all conclusions of RDF data and the rules are precomputed and explicitly stored in a preprocessing step. As the amount of RDF data used in applications keeps increasing, processing large datasets often requires distributing the data in a cluster of shared-nothing servers. Whereas numerous distributed query answering techniques are known, distributed materialisation is less well understood. In this paper, we present several techniques that facilitate scalable materialisation in distributed RDF systems. First, we present a new distributed materialisation algorithm that aims to minimise communication and synchronisation in the cluster. Second, we present two new algorithms for partitioning RDF data, both of which aim to produce tightly connected partitions, but without loading complete datasets into memory. We evaluate our materialisation algorithm against two state-of-the-art distributed Datalog systems and show that our technique offers competitive performance, particularly when the rules are complex. Moreover, we analyse in depth the effects of data partitioning on reasoning performance and show that our techniques offer performance comparable or superior to that of state-of-the-art min-cut partitioning, while computing the partitions requires considerably fewer resources.
first_indexed 2024-03-07T07:13:12Z
format Thesis
id oxford-uuid:06c3eb4e-8f90-4e9f-a5dd-f448b6843822
institution University of Oxford
language English
last_indexed 2024-12-09T03:27:29Z
publishDate 2021
record_format dspace
spelling oxford-uuid:06c3eb4e-8f90-4e9f-a5dd-f448b6843822 2024-12-01T10:24:01Z Materialisation and data partitioning algorithms for distributed RDF systems Thesis http://purl.org/coar/resource_type/c_db06 uuid:06c3eb4e-8f90-4e9f-a5dd-f448b6843822 Automated Reasoning English Hyrax Deposit 2021 Ajileye, T Motik, B Horrocks, I Many RDF systems support reasoning with Datalog rules via materialisation, where all conclusions of RDF data and the rules are precomputed and explicitly stored in a preprocessing step. As the amount of RDF data used in applications keeps increasing, processing large datasets often requires distributing the data in a cluster of shared-nothing servers. Whereas numerous distributed query answering techniques are known, distributed materialisation is less well understood. In this paper, we present several techniques that facilitate scalable materialisation in distributed RDF systems. First, we present a new distributed materialisation algorithm that aims to minimise communication and synchronisation in the cluster. Second, we present two new algorithms for partitioning RDF data, both of which aim to produce tightly connected partitions, but without loading complete datasets into memory. We evaluate our materialisation algorithm against two state-of-the-art distributed Datalog systems and show that our technique offers competitive performance, particularly when the rules are complex. Moreover, we analyse in depth the effects of data partitioning on reasoning performance and show that our techniques offer performance comparable or superior to that of state-of-the-art min-cut partitioning, while computing the partitions requires considerably fewer resources.
spellingShingle Automated Reasoning
Ajileye, T
Materialisation and data partitioning algorithms for distributed RDF systems
title Materialisation and data partitioning algorithms for distributed RDF systems
title_full Materialisation and data partitioning algorithms for distributed RDF systems
title_fullStr Materialisation and data partitioning algorithms for distributed RDF systems
title_full_unstemmed Materialisation and data partitioning algorithms for distributed RDF systems
title_short Materialisation and data partitioning algorithms for distributed RDF systems
title_sort materialisation and data partitioning algorithms for distributed rdf systems
topic Automated Reasoning
work_keys_str_mv AT ajileyet materialisationanddatapartitioningalgorithmsfordistributedrdfsystems