Frequent Itemset Mining and Multi-Layer Network-Based Analysis of RDF Databases

Triplestores or resource description framework (RDF) stores are purpose-built databases used to organise, store and share data with context. Knowledge extraction from a large amount of interconnected data requires effective tools and methods to address the complexity and the underlying structure of...


Bibliographic Details
Main Authors: Gergely Honti, János Abonyi
Format: Article
Language: English
Published: MDPI AG 2021-02-01
Series: Mathematics
Subjects: multi-layer network; RDF store; DataToKnowledgeToNetwork; linked data
Online Access: https://www.mdpi.com/2227-7390/9/4/450
_version_ 1797395516774940672
author Gergely Honti
János Abonyi
collection DOAJ
description Triplestores or resource description framework (RDF) stores are purpose-built databases used to organise, store and share data with context. Knowledge extraction from a large amount of interconnected data requires effective tools and methods to address the complexity and the underlying structure of semantic information. We propose a method that generates an interpretable multilayered network from an RDF database. The method utilises frequent itemset mining (FIM) of the subjects, predicates and the objects of the RDF data, and automatically extracts informative subsets of the database for the analysis. The results are used to form layers in an analysable multidimensional network. The methodology enables consistent, transparent, multi-aspect-oriented knowledge extraction from the linked dataset. To demonstrate the usability and effectiveness of the methodology, we analyse how the science of sustainability and climate change is structured using the Microsoft Academic Knowledge Graph. In the case study, the FIM forms networks of disciplines to reveal the significant interdisciplinary science communities in sustainability and climate change. The constructed multilayer network then enables an analysis of the significant disciplines and interdisciplinary scientific areas. To demonstrate the proposed knowledge extraction process, we search for interdisciplinary science communities and then measure and rank their multidisciplinary effects. The analysis identifies discipline similarities, pinpointing the similarity between atmospheric science and meteorology as well as between geomorphology and oceanography. The results confirm that frequent itemset mining provides informative sampled subsets of RDF databases, which can be simultaneously analysed as layers of a multilayer network.
first_indexed 2024-03-09T00:35:39Z
format Article
id doaj.art-98c8f045302346e88d654ddc11df3e80
institution Directory Open Access Journal
issn 2227-7390
language English
last_indexed 2024-03-09T00:35:39Z
publishDate 2021-02-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj.art-98c8f045302346e88d654ddc11df3e80 | 2023-12-11T18:10:43Z | eng | MDPI AG | Mathematics | 2227-7390 | 2021-02-01 | 9 | 4 | 450 | 10.3390/math9040450 | Frequent Itemset Mining and Multi-Layer Network-Based Analysis of RDF Databases | Gergely Honti (0) | János Abonyi (1) | (0) MTA-PE Complex Systems Monitoring Research Group, University of Pannonia, 8200 Veszprem, Hungary | (1) MTA-PE Complex Systems Monitoring Research Group, University of Pannonia, 8200 Veszprem, Hungary | https://www.mdpi.com/2227-7390/9/4/450 | multi-layer network | RDF store | DataToKnowledgeToNetwork | linked data
title Frequent Itemset Mining and Multi-Layer Network-Based Analysis of RDF Databases
topic multi-layer network
RDF store
DataToKnowledgeToNetwork
linked data
url https://www.mdpi.com/2227-7390/9/4/450
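
Illustrative sketch (not part of the catalogued record): the description above outlines a pipeline in which frequent itemset mining over RDF triples selects subsets of the data that then become layers of a multilayer network. The Python below is one possible minimal reading of that idea, not the authors' implementation. The toy triples, the predicate name "hasField", the brute-force support counting and the rule "one layer per frequent itemset of size two or more" are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy triples (subject, predicate, object); not data from the paper.
TRIPLES = [
    ("paper1", "hasField", "meteorology"),
    ("paper1", "hasField", "atmospheric science"),
    ("paper2", "hasField", "meteorology"),
    ("paper2", "hasField", "atmospheric science"),
    ("paper2", "hasField", "oceanography"),
    ("paper3", "hasField", "oceanography"),
    ("paper3", "hasField", "geomorphology"),
    ("paper4", "hasField", "oceanography"),
    ("paper4", "hasField", "geomorphology"),
]


def transactions(triples, predicate):
    """Group the objects of one predicate by subject: one item set per subject."""
    groups = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            groups[s].add(o)
    return dict(groups)


def frequent_itemsets(groups, min_support, max_size=3):
    """Brute-force support counting (assumed stand-in for a real FIM algorithm).

    Returns {itemset: set of subjects containing it} for itemsets whose
    support (number of subjects) is at least min_support.
    """
    support = defaultdict(set)
    for subject, items in groups.items():
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(items), k):
                support[frozenset(combo)].add(subject)
    return {i: s for i, s in support.items() if len(s) >= min_support}


def layers(frequent):
    """Assumed layering rule: each frequent itemset with >= 2 items is a layer
    whose edges link every pair of its items, weighted by the itemset support."""
    return {
        itemset: {pair: len(subjects) for pair in combinations(sorted(itemset), 2)}
        for itemset, subjects in frequent.items()
        if len(itemset) >= 2
    }


groups = transactions(TRIPLES, "hasField")
frequent = frequent_itemsets(groups, min_support=2)
network = layers(frequent)
```

On this toy input, `network` holds two layers, one linking atmospheric science with meteorology and one linking geomorphology with oceanography, each with edge weight 2. For real data, an established FIM implementation (Apriori or FP-growth) would replace the exhaustive enumeration, which grows exponentially with `max_size`.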