Summary: | Motivated by applications in declarative data analysis, in this article, we study DatalogZ—an extension of Datalog with stratified negation and arithmetic functions over integers. This language is known to be undecidable, so we present the fragment of limit DatalogZ programs, which is powerful enough to naturally capture many important data analysis tasks. In limit DatalogZ, all intensional predicates with a numeric argument are limit predicates that keep maximal or minimal bounds on numeric values. We show that reasoning in limit DatalogZ is decidable if a linearity condition restricting the use of multiplication is satisfied. In particular, limit-linear DatalogZ is complete for Δ2EXP and captures Δ2P over ordered datasets in the sense of descriptive complexity. We also provide a comprehensive study of several fragments of limit-linear DatalogZ. We show that semi-positive limit-linear programs (i.e., programs where negation is allowed only in front of extensional atoms) capture coNP over ordered datasets; furthermore, reasoning becomes coNEXP-complete in combined and coNP-complete in data complexity, where the lower bounds hold already for negation-free programs. In order to satisfy the requirements of data-intensive applications, we also propose an additional stability requirement, which causes the complexity of reasoning to drop to EXP in combined and to P in data complexity, thus obtaining the same bounds as for usual Datalog. Finally, we compare our formalisms with the languages underpinning existing Datalog-based approaches for data analysis and show that core fragments of these languages can be encoded as limit programs; this allows us to transfer decidability and complexity upper bounds from limit programs to other formalisms. Therefore, our article provides a unified logical framework for declarative data analysis which can be used as a basis for understanding the impact on expressive power and computational complexity of the key constructs available in existing languages.
|