Summary: | <p>Answering queries over large datasets extended with Datalog rules plays a key role in numerous data management applications, and it has been implemented in several highly optimised Datalog systems in both academic and commercial contexts. Many systems implement reasoning via materialisation, which involves precomputing all consequences of the rules and the dataset in a preprocessing step. Some systems also use incremental reasoning algorithms, which can update the materialisation efficiently when the input dataset changes. Such techniques allow queries to be processed without any reference to the rules, so they are often used in applications where the performance of query answering is critical.</p>
<p>Existing materialisation and incremental reasoning techniques enumerate all possible ways to apply rules to the data in order to derive all relevant consequences. This, however, can be inefficient because derivations of rules commonly used in practice are redundant; for example, rules axiomatising a binary predicate as symmetric and transitive can have a cubic number of applications, yet they can derive at most a quadratic number of facts. Such redundancy can be a significant source of overhead in practice and can prevent Datalog systems from successfully processing large datasets. To address this issue, in this paper we present a novel framework for modular materialisation and incremental reasoning. Our key idea is that, for certain combinations of rules commonly used in practice, all consequences can be derived using specialised procedures that do not necessarily enumerate all possible rule applications. Thus, our framework supports materialisation and incremental reasoning via a collection of modules. Each module is responsible for deriving consequences of a subset of the program, by using either standard rule application or proprietary algorithms. We prove that such an approach is complete as long as each module satisfies certain properties. Our formalisation of a module is very general, and in fact it allows modules to keep arbitrary auxiliary information.</p>
<p>We also show how to realise custom procedures for four types of modules: transitivity, symmetry–transitivity, chain rules, and sequencing elements of a total order. Finally, we demonstrate empirically that using our custom procedures can speed up materialisation and incremental reasoning by several orders of magnitude on several well-known benchmarks. Thus, our technique has the potential to significantly improve the scalability of Datalog reasoners.</p>
|