Provenance network analytics

Bibliographic Details
Main Authors: Huynh, T; Ebden, M; Fischer, J; Roberts, S; Moreau, L
Format: Journal article
Language: English
Published: Springer 2018
author Huynh, T
Ebden, M
Fischer, J
Roberts, S
Moreau, L
collection OXFORD
description Provenance network analytics is a novel data analytics approach that helps infer properties of data, such as quality or importance, from their provenance. Instead of analysing application data, which are typically domain-dependent, it analyses the data’s provenance as represented using the World Wide Web Consortium’s domain-agnostic PROV data model. Specifically, the approach proposes a number of network metrics for provenance data and applies established machine learning techniques over such metrics to build predictive models for some key properties of data. Applying this method to the provenance of real-world data from three different applications, we show that it can successfully identify the owners of provenance documents, assess the quality of crowdsourced data, and identify instructions from chat messages in an alternate-reality game with high levels of accuracy. By so doing, we demonstrate the different ways the proposed provenance network metrics can be used in analysing data, providing the foundation for provenance-based data analytics.
first_indexed 2024-03-06T22:47:00Z
format Journal article
id oxford-uuid:5d8608c8-2278-43b6-aa22-715c6fa381c4
institution University of Oxford
language English
last_indexed 2024-03-06T22:47:00Z
publishDate 2018
publisher Springer
record_format dspace
spelling oxford-uuid:5d8608c8-2278-43b6-aa22-715c6fa381c4; 2022-03-26T17:34:57Z; Provenance network analytics; Journal article; http://purl.org/coar/resource_type/c_dcae04bc; uuid:5d8608c8-2278-43b6-aa22-715c6fa381c4; English; Symplectic Elements at Oxford; Springer; 2018; Huynh, T; Ebden, M; Fischer, J; Roberts, S; Moreau, L
title Provenance network analytics