Materialisation and data partitioning algorithms for distributed RDF systems

Bibliographic Details
Main Authors: Ajileye, T, Motik, B
Format: Journal article
Language: English
Published: Elsevier 2022
collection OXFORD
description Many RDF systems support reasoning with Datalog rules via materialisation, where all conclusions of RDF data and the rules are precomputed and explicitly stored in a preprocessing step. As the amount of RDF data used in applications keeps increasing, processing large datasets often requires distributing the data in a cluster of shared-nothing servers. While numerous distributed query answering techniques are known, distributed materialisation is less well understood. In this paper, we present several techniques that facilitate scalable materialisation in distributed RDF systems. First, we present a new distributed materialisation algorithm that aims to minimise communication and synchronisation in the cluster. Second, we present two new algorithms for partitioning RDF data, both of which aim to produce tightly connected partitions, but without loading complete datasets into memory. We evaluate our materialisation algorithm against two state-of-the-art distributed Datalog systems and show that our technique offers competitive performance, particularly when the rules are complex. Moreover, we analyse in depth the effects of data partitioning on reasoning performance and show that our techniques offer performance comparable or superior to state-of-the-art min-cut partitioning, while computing the partitions requires considerably less time and memory.
id oxford-uuid:e495a71f-84f7-40aa-b2d9-028820eb7cd9
institution University of Oxford