Revealing the Detailed Lineage of Script Outputs Using Hybrid Provenance

We illustrate how combining retrospective and prospective provenance can yield scientifically meaningful hybrid provenance representations of the computational histories of data produced during a script run. We use scripts from multiple disciplines (astrophysics, climate science, biodiversity data cur...

Bibliographic Details
Main Authors: Qian Zhang, Yang Cao, Qiwen Wang, Duc Vu, Priyaa Thavasimani, Timothy McPhillips, Paolo Missier, Peter Slaughter, Christopher Jones, Matthew B. Jones, Bertram Ludäscher
Format: Article
Language: English
Published: University of Edinburgh 2018-08-01
Series: International Journal of Digital Curation
Online Access: http://www.ijdc.net/article/view/585
author Qian Zhang
Yang Cao
Qiwen Wang
Duc Vu
Priyaa Thavasimani
Timothy McPhillips
Paolo Missier
Peter Slaughter
Christopher Jones
Matthew B. Jones
Bertram Ludäscher
collection DOAJ
description We illustrate how combining retrospective and prospective provenance can yield scientifically meaningful hybrid provenance representations of the computational histories of data produced during a script run. We use scripts from multiple disciplines (astrophysics, climate science, biodiversity data curation, and social network analysis), implemented in Python, R, and MATLAB, to highlight the usefulness of diverse forms of retrospective provenance when coupled with prospective provenance. Users provide prospective provenance, i.e., the conceptual workflows latent in scripts, via simple YesWorkflow annotations, embedded as script comments. Runtime observables can be linked to prospective provenance via relational views and queries. These observables may be hidden in filenames or folder structures, recorded in log files, or captured automatically using tools such as noWorkflow or the DataONE RunManagers. The YesWorkflow toolkit, example scripts, and demonstration code are available via an open source repository.
first_indexed 2024-12-11T06:40:49Z
format Article
id doaj.art-89261fc52aa747f3a1ac87c9eeeab21f
institution Directory Open Access Journal
issn 1746-8256
language English
last_indexed 2024-12-11T06:40:49Z
publishDate 2018-08-01
publisher University of Edinburgh
record_format Article
series International Journal of Digital Curation
spelling doaj.art-89261fc52aa747f3a1ac87c9eeeab21f
2022-12-22T01:17:15Z
eng
University of Edinburgh
International Journal of Digital Curation
1746-8256
2018-08-01
12(2)
10.2218/ijdc.v12i2.585
Revealing the Detailed Lineage of Script Outputs Using Hybrid Provenance
Qian Zhang (University of Illinois at Urbana-Champaign)
Yang Cao (University of Illinois at Urbana-Champaign)
Qiwen Wang (University of Illinois at Urbana-Champaign)
Duc Vu (University of Illinois at Chicago)
Priyaa Thavasimani (Newcastle University)
Timothy McPhillips (University of Illinois at Urbana-Champaign)
Paolo Missier (Newcastle University)
Peter Slaughter (University of California, Santa Barbara)
Christopher Jones (University of California, Santa Barbara)
Matthew B. Jones (University of California, Santa Barbara)
Bertram Ludäscher (University of Illinois at Urbana-Champaign)
We illustrate how combining retrospective and prospective provenance can yield scientifically meaningful hybrid provenance representations of the computational histories of data produced during a script run. We use scripts from multiple disciplines (astrophysics, climate science, biodiversity data curation, and social network analysis), implemented in Python, R, and MATLAB, to highlight the usefulness of diverse forms of retrospective provenance when coupled with prospective provenance. Users provide prospective provenance, i.e., the conceptual workflows latent in scripts, via simple YesWorkflow annotations, embedded as script comments. Runtime observables can be linked to prospective provenance via relational views and queries. These observables may be hidden in filenames or folder structures, recorded in log files, or captured automatically using tools such as noWorkflow or the DataONE RunManagers. The YesWorkflow toolkit, example scripts, and demonstration code are available via an open source repository.
http://www.ijdc.net/article/view/585
title Revealing the Detailed Lineage of Script Outputs Using Hybrid Provenance
url http://www.ijdc.net/article/view/585