Golden Trail: Retrieving the Data History that Matters from a Comprehensive Provenance Repository

Experimental science can be thought of as the exploration of a large research space, in search of a few valuable results. While it is this “Golden Data” that gets published, the history of the exploration is often as valuable to the scientists as some of its outcomes. We envision an e-research in...


Bibliographic Details
Main Authors: Paolo Missier, Bertram Ludäscher, Saumen Dey, Michael Wang, Tim McPhillips, Shawn Bowers, Michael Agun, Ilkay Altintas
Format: Article
Language: English
Published: University of Edinburgh 2012-03-01
Series: International Journal of Digital Curation
Online Access: http://129.215.67.233:80/ijdc/article/view/221
collection DOAJ
description Experimental science can be thought of as the exploration of a large research space, in search of a few valuable results. While it is this “Golden Data” that gets published, the history of the exploration is often as valuable to the scientists as some of its outcomes. We envision an e-research infrastructure that is capable of systematically and automatically recording such history – an assumption that holds today for a number of workflow management systems routinely used in e-science. In keeping with our gold rush metaphor, the provenance of a valuable result is a “Golden Trail”. Logically, this represents a detailed account of how the Golden Data was arrived at, and technically it is a sub-graph in the much larger graph of provenance traces that collectively tell the story of the entire research (or of some of it). In this paper we describe a model and architecture for a repository dedicated to storing provenance traces and selectively retrieving Golden Trails from it. As traces from multiple experiments over long periods of time are accommodated, the trails may be sub-graphs of one trace, or they may be the logical representation of a virtual experiment obtained by joining together traces that share common data. The project has been carried out within the Provenance Working Group of the Data Observation Network for Earth (DataONE) NSF project. Ultimately, our longer-term plan is to integrate the provenance repository into the data preservation architecture currently being developed by DataONE.
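The abstract defines a Golden Trail as a sub-graph of a larger provenance graph, possibly spanning several traces that are joined because they share common data. The following is a minimal Python sketch of that idea under stated assumptions; it is not the authors' model or the DataONE repository interface. The ("used" / "generated", process, data) edge triples, the function names merge_traces and golden_trail, and the sample identifiers are all hypothetical, and process identifiers are assumed to be unique across runs so that only data items act as join points.

```python
from collections import defaultdict, deque

# Hypothetical edge format (an assumption, not the paper's model):
#   ("used", process_id, data_id)      -> the process read the data item
#   ("generated", process_id, data_id) -> the process wrote the data item
Edge = tuple[str, str, str]


def merge_traces(traces: list[list[Edge]]) -> set[Edge]:
    """Union several traces into one graph.

    Data items with the same identifier collapse into a single node, which
    is what joins traces that share common data into a 'virtual experiment'.
    Assumes process identifiers are already unique across runs.
    """
    merged: set[Edge] = set()
    for trace in traces:
        merged.update(trace)
    return merged


def golden_trail(edges: set[Edge], target: str) -> set[Edge]:
    """Return the edges upstream of `target`, i.e. its lineage sub-graph."""
    generated_by = defaultdict(list)  # data_id -> processes that generated it
    inputs_of = defaultdict(list)     # process_id -> data items it used
    for kind, proc, data in edges:
        if kind == "generated":
            generated_by[data].append(proc)
        elif kind == "used":
            inputs_of[proc].append(data)

    # Breadth-first walk backwards from the target data item.
    trail: set[Edge] = set()
    seen = {target}
    queue = deque([target])
    while queue:
        data = queue.popleft()
        for proc in generated_by[data]:
            trail.add(("generated", proc, data))
            for inp in inputs_of[proc]:
                trail.add(("used", proc, inp))
                if inp not in seen:
                    seen.add(inp)
                    queue.append(inp)
    return trail


# Usage: two hypothetical traces joined on the shared item "aligned.dat".
trace_a = [
    ("used", "a:align", "raw.dat"),
    ("generated", "a:align", "aligned.dat"),
    ("used", "a:plot", "raw.dat"),
    ("generated", "a:plot", "plot.png"),
]
trace_b = [
    ("used", "b:model", "aligned.dat"),
    ("generated", "b:model", "result.csv"),
]
trail = golden_trail(merge_traces([trace_a, trace_b]), "result.csv")
for edge in sorted(trail):
    print(edge)
```

In this sketch the trail for "result.csv" crosses from trace_b into trace_a through the shared data item, while the unrelated "a:plot" step is left out, which mirrors the abstract's distinction between the full collection of traces and the trail that matters for one result.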
first_indexed 2024-03-09T02:19:08Z
id doaj.art-a9129103aa934ef186a181ed59624109
institution Directory Open Access Journal
issn 1746-8256
last_indexed 2024-03-09T02:19:08Z