Golden Trail: Retrieving the Data History that Matters from a Comprehensive Provenance Repository
Experimental science can be thought of as the exploration of a large research space, in search of a few valuable results. While it is this “Golden Data” that gets published, the history of the exploration is often as valuable to the scientists as some of its outcomes. We envision an e-research in...
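Main Authors: | Paolo Missier, Bertram Ludäscher, Saumen Dey, Michael Wang, Tim McPhillips, Shawn Bowers, Michael Agun, Ilkay Altintas |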
---|---|
Format: | Article |
Language: | English |
Published: | University of Edinburgh 2012-03-01 |
Series: | International Journal of Digital Curation |
Online Access: | http://129.215.67.233:80/ijdc/article/view/221 |
_version_ | 1797402008538316800 |
---|---|
author | Paolo Missier Bertram Ludäscher Saumen Dey Michael Wang Tim McPhillips Shawn Bowers Michael Agun Ilkay Altintas |
author_facet | Paolo Missier Bertram Ludäscher Saumen Dey Michael Wang Tim McPhillips Shawn Bowers Michael Agun Ilkay Altintas |
author_sort | Paolo Missier |
collection | DOAJ |
description | Experimental science can be thought of as the exploration of a large research space, in search of a few valuable results. While it is this “Golden Data” that gets published, the history of the exploration is often as valuable to the scientists as some of its outcomes. We envision an e-research infrastructure that is capable of systematically and automatically recording such history – an assumption that holds today for a number of workflow management systems routinely used in e-science. In keeping with our gold rush metaphor, the provenance of a valuable result is a “Golden Trail”. Logically, this represents a detailed account of how the Golden Data was arrived at, and technically it is a sub-graph in the much larger graph of provenance traces that collectively tell the story of the entire research (or of some of it).
In this paper we describe a model and architecture for a repository dedicated to storing provenance traces and selectively retrieving Golden Trails from it. As traces from multiple experiments over long periods of time are accommodated, the trails may be sub-graphs of one trace, or they may be the logical representation of a virtual experiment obtained by joining together traces that share common data.
The project has been carried out within the Provenance Working Group of the Data Observation Network for Earth (DataONE) NSF project. Ultimately, our longer-term plan is to integrate the provenance repository into the data preservation architecture currently being developed by DataONE. |
first_indexed | 2024-03-09T02:19:08Z |
format | Article |
id | doaj.art-a9129103aa934ef186a181ed59624109 |
institution | Directory Open Access Journal |
issn | 1746-8256 |
language | English |
last_indexed | 2024-03-09T02:19:08Z |
publishDate | 2012-03-01 |
publisher | University of Edinburgh |
record_format | Article |
series | International Journal of Digital Curation |
spelling | doaj.art-a9129103aa934ef186a181ed596241092023-12-06T20:02:46ZengUniversity of EdinburghInternational Journal of Digital Curation1746-82562012-03-0171Golden Trail: Retrieving the Data History that Matters from a Comprehensive Provenance RepositoryPaolo MissierBertram LudäscherSaumen DeyMichael WangTim McPhillipsShawn BowersMichael AgunIlkay AltintasExperimental science can be thought of as the exploration of a large research space, in search of a few valuable results. While it is this “Golden Data” that gets published, the history of the exploration is often as valuable to the scientists as some of its outcomes. We envision an e-research infrastructure that is capable of systematically and automatically recording such history – an assumption that holds today for a number of workflow management systems routinely used in e-science. In keeping with our gold rush metaphor, the provenance of a valuable result is a “Golden Trail”. Logically, this represents a detailed account of how the Golden Data was arrived at, and technically it is a sub-graph in the much larger graph of provenance traces that collectively tell the story of the entire research (or of some of it). In this paper we describe a model and architecture for a repository dedicated to storing provenance traces and selectively retrieving Golden Trails from it. As traces from multiple experiments over long periods of time are accommodated, the trails may be sub-graphs of one trace, or they may be the logical representation of a virtual experiment obtained by joining together traces that share common data. The project has been carried out within the Provenance Working Group of the Data Observation Network for Earth (DataONE) NSF project. Ultimately, our longer-term plan is to integrate the provenance repository into the data preservation architecture currently being developed by DataONE.http://129.215.67.233:80/ijdc/article/view/221 |
spellingShingle | Paolo Missier Bertram Ludäscher Saumen Dey Michael Wang Tim McPhillips Shawn Bowers Michael Agun Ilkay Altintas Golden Trail: Retrieving the Data History that Matters from a Comprehensive Provenance Repository International Journal of Digital Curation |
title | Golden Trail: Retrieving the Data History that Matters from a Comprehensive Provenance Repository |
title_full | Golden Trail: Retrieving the Data History that Matters from a Comprehensive Provenance Repository |
title_fullStr | Golden Trail: Retrieving the Data History that Matters from a Comprehensive Provenance Repository |
title_full_unstemmed | Golden Trail: Retrieving the Data History that Matters from a Comprehensive Provenance Repository |
title_short | Golden Trail: Retrieving the Data History that Matters from a Comprehensive Provenance Repository |
title_sort | golden trail retrieving the data history that matters from a comprehensive provenance repository |
url | http://129.215.67.233:80/ijdc/article/view/221 |
work_keys_str_mv | AT paolomissier goldentrailretrievingthedatahistorythatmattersfromacomprehensiveprovenancerepository AT bertramludascher goldentrailretrievingthedatahistorythatmattersfromacomprehensiveprovenancerepository AT saumendey goldentrailretrievingthedatahistorythatmattersfromacomprehensiveprovenancerepository AT michaelwang goldentrailretrievingthedatahistorythatmattersfromacomprehensiveprovenancerepository AT timmcphillips goldentrailretrievingthedatahistorythatmattersfromacomprehensiveprovenancerepository AT shawnbowers goldentrailretrievingthedatahistorythatmattersfromacomprehensiveprovenancerepository AT michaelagun goldentrailretrievingthedatahistorythatmattersfromacomprehensiveprovenancerepository AT ilkayaltintas goldentrailretrievingthedatahistorythatmattersfromacomprehensiveprovenancerepository |