Summary: | Data quality is crucial for operational efficiency and sound decision making. This paper focuses on believability,
a major aspect of data quality. The issue of believability is particularly relevant in the context of Web 2.0, where
mashups facilitate the combination of data from different sources. Our approach to assessing data believability is
based on provenance and lineage, i.e., the origin and subsequent processing history of data. We present the main
concepts of our model for representing and storing data provenance, and an ontology of the sub-dimensions of data
believability. We then use aggregation operators to compute believability across these sub-dimensions and across
the provenance of the data. We illustrate our approach with a scenario based on Internet data. Our
contribution lies in three main design artifacts: (1) the provenance model, (2) the ontology of believability sub-dimensions,
and (3) the method for computing and aggregating data believability. To our knowledge, this is the first
work to operationalize provenance-based assessment of data believability.
|