Evaluating and Aggregating Data Believability across Quality Sub-Dimensions and Data Lineage

Data quality is crucial for operational efficiency and sound decision making. This paper focuses on believability, a major aspect of data quality. The issue of believability is particularly relevant in the context of Web 2.0, where mashups facilitate the combination of data from different sources....

Bibliographic Details
Main Authors: Prat, Nicolas, Madnick, Stuart E.
Format: Working Paper
Language: en_US
Published: 2008
Subjects: Data Lineage, Web 2.0
Online Access: http://hdl.handle.net/1721.1/40085
author Prat, Nicolas
Madnick, Stuart E.
author_facet Prat, Nicolas
Madnick, Stuart E.
author_sort Prat, Nicolas
collection MIT
description Data quality is crucial for operational efficiency and sound decision-making. This paper focuses on believability, a major aspect of data quality. The issue of believability is particularly relevant in the context of Web 2.0, where mashups facilitate the combination of data from different sources. Our approach for assessing data believability is based on provenance and lineage, i.e., the origin and subsequent processing history of data. We present the main concepts of our model for representing and storing data provenance, and an ontology of the sub-dimensions of data believability. We then use aggregation operators to compute believability across the sub-dimensions of data believability and the provenance of data. We illustrate our approach with a scenario based on Internet data. Our contribution lies in three main design artifacts: (1) the provenance model, (2) the ontology of believability sub-dimensions, and (3) the method for computing and aggregating data believability. To our knowledge, this is the first work to operationalize provenance-based assessment of data believability.
first_indexed 2024-09-23T15:54:16Z
format Working Paper
id mit-1721.1/40085
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T15:54:16Z
publishDate 2008
record_format dspace
spelling mit-1721.1/40085 2019-04-12T09:31:47Z Evaluating and Aggregating Data Believability across Quality Sub-Dimensions and Data Lineage Prat, Nicolas Madnick, Stuart E. Data Lineage Web 2.0 Data quality is crucial for operational efficiency and sound decision-making. This paper focuses on believability, a major aspect of data quality. The issue of believability is particularly relevant in the context of Web 2.0, where mashups facilitate the combination of data from different sources. Our approach for assessing data believability is based on provenance and lineage, i.e., the origin and subsequent processing history of data. We present the main concepts of our model for representing and storing data provenance, and an ontology of the sub-dimensions of data believability. We then use aggregation operators to compute believability across the sub-dimensions of data believability and the provenance of data. We illustrate our approach with a scenario based on Internet data. Our contribution lies in three main design artifacts: (1) the provenance model, (2) the ontology of believability sub-dimensions, and (3) the method for computing and aggregating data believability. To our knowledge, this is the first work to operationalize provenance-based assessment of data believability. 2008-01-11T18:15:00Z 2008-01-11T18:15:00Z 2008-01-11T18:15:00Z Working Paper http://hdl.handle.net/1721.1/40085 en_US MIT Sloan School of Management Working Paper 4670-07 application/pdf
spellingShingle Data Lineage
Web 2.0
Prat, Nicolas
Madnick, Stuart E.
Evaluating and Aggregating Data Believability across Quality Sub-Dimensions and Data Lineage
title Evaluating and Aggregating Data Believability across Quality Sub-Dimensions and Data Lineage
topic Data Lineage
Web 2.0
url http://hdl.handle.net/1721.1/40085