Creating lightweight FAIR Digital Objects with RO-Crate

RO-Crate (Soiland-Reyes et al. 2022) is a lightweight method to package research outputs along with their metadata, based on Linked Data principles (Bizer et al. 2009) and W3C standards. RO-Crate provides a flexible mechanism for researchers archiving and publishing rich data packages (or any other...

Bibliographic Details
Main Authors: Stian Soiland-Reyes, Peter Sefton, Leyla Jael Castro, Frederik Coppens, Daniel Garijo, Simone Leo, Marc Portier, Paul Groth
Format: Article
Language: English
Published: Pensoft Publishers 2022-10-01
Series: Research Ideas and Outcomes
Subjects:
Online Access: https://riojournal.com/article/93937/download/pdf/
collection DOAJ
description RO-Crate (Soiland-Reyes et al. 2022) is a lightweight method to package research outputs along with their metadata, based on Linked Data principles (Bizer et al. 2009) and W3C standards. RO-Crate provides a flexible mechanism for researchers archiving and publishing rich data packages (or any other research outcome) by capturing their dependencies and context. However, additional measures should be taken to ensure that a crate also follows the FAIR principles (Wilkinson et al. 2016), including consistent use of persistent identifiers, provenance, community standards, clear machine/human-readable licensing for metadata and data, and Web publication of RO-Crates.

The FAIR Digital Object (FDO) approach (De Smedt et al. 2020) gives a set of recommendations that aims to improve findability, accessibility, interoperability and reproducibility for any digital object, allowing implementation through different protocols or standards.

Here we present how we have followed the FDO recommendations and turned research outcomes into FDOs by publishing RO-Crates on the Web using HTTP, following best practices for Linked Data. We highlight challenges and advantages of the FDO approach, and reflect on what is required for an FDO profile to achieve FAIR RO-Crates.

The implementation allows for a broad range of use cases across scientific domains. A minimal RO-Crate may be represented as a persistent URI resolving to a summary website describing the outputs of a scientific investigation (e.g. https://w3id.org/dgarijo/ro/sepln2022, with links to the datasets used along with software). One of the advantages of RO-Crates is flexibility, particularly regarding the metadata accompanying the actual research outcome. RO-Crate extends schema.org, a popular vocabulary for describing resources on the Web (Guha et al. 2016). A generic RO-Crate is not required to be typed beyond Dataset.
In practice, RO-Crates declare conformance to particular profiles, allowing processing based on the specific needs and assumptions of a community or usage scenario. This effectively makes RO-Crates typed and thus machine-actionable. RO-Crate profiles serve as metadata templates, making it easier for communities to agree on and build upon their own metadata needs.

RO-Crates have been combined with machine-actionable Data Management Plans (maDMPs) to automate and facilitate management of research data (Miksa et al. 2020). This mapping allows RO-Crates to be generated out of maDMPs and vice versa. ELIXIR Software Management Plans (Alves et al. 2021) is planning to move its questionnaire to a machine-actionable format with RO-Crate. ELIXIR Biohackathon 2022 will explore integration of RO-Crate and the Data Stewardship Wizard (Pergl et al. 2019) with Galaxy, which can automate FDO creation that also follows data management plans.

A tailored RO-Crate profile has been defined to represent Electronic Lab Notebook (ELN) protocols bundled together with metadata and related datasets. Schröder et al. (2022) use RO-Crates to encode provenance information at different levels, including researchers, manufacturers, biological and chemical resources, activities, measurements, and resulting research data. The use of RO-Crates makes it easier to programmatically query information related to the protocols, for instance the activities, resources and equipment used to create data. Another example is WorkflowHub (Goble et al. 2021), which defines the Workflow RO-Crate profile (Bacall et al. 2022), imposing additional constraints such as the presence of a main workflow and a license. It also specifies which entity types and properties must be used to provide such information, implicitly defining a set of operations (e.g., get the main workflow and its language) that are valid on all complying crates.
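How a profile implicitly defines operations can be sketched in Python. The crate below is a hypothetical, hand-written example: the file name `sort-and-plot.ga`, the names and the license are invented for illustration, and the placement of `conformsTo`, `mainEntity` and `programmingLanguage` is an assumption based on a reading of the RO-Crate and Workflow RO-Crate documentation, not a validated crate.

```python
import json

WORKFLOW_PROFILE = "https://w3id.org/workflowhub/workflow-ro-crate/1.0"

# Hypothetical content of an ro-crate-metadata.json file (illustrative only).
CRATE = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "./"},
            # Assumption: the profile is declared next to the RO-Crate version.
            "conformsTo": [
                {"@id": "https://w3id.org/ro/crate/1.1"},
                {"@id": WORKFLOW_PROFILE},
            ],
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Example workflow crate",
            "license": {"@id": "https://spdx.org/licenses/MIT"},
            "mainEntity": {"@id": "sort-and-plot.ga"},
            "hasPart": [{"@id": "sort-and-plot.ga"}],
        },
        {
            "@id": "sort-and-plot.ga",
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
            "name": "Sort and plot",
            "programmingLanguage": {
                "@id": "https://w3id.org/workflowhub/workflow-ro-crate#galaxy"
            },
        },
    ],
}


def as_list(value):
    """Normalise a JSON-LD value that may be absent, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def index_by_id(crate):
    """Map each entity's @id to the entity itself."""
    return {entity["@id"]: entity for entity in crate["@graph"]}


def conforms_to(crate, profile):
    """Check whether the descriptor or the root dataset declares the profile."""
    entities = index_by_id(crate)
    descriptor = entities["ro-crate-metadata.json"]
    root = entities[descriptor["about"]["@id"]]
    declared = as_list(descriptor.get("conformsTo")) + as_list(root.get("conformsTo"))
    return any(ref.get("@id") == profile for ref in declared)


def main_workflow(crate):
    """Return (workflow @id, workflow language @id) for a Workflow RO-Crate."""
    if not conforms_to(crate, WORKFLOW_PROFILE):
        raise ValueError("crate does not declare the Workflow RO-Crate profile")
    entities = index_by_id(crate)
    root = entities[entities["ro-crate-metadata.json"]["about"]["@id"]]
    workflow = entities[root["mainEntity"]["@id"]]
    return workflow["@id"], workflow["programmingLanguage"]["@id"]


if __name__ == "__main__":
    print(json.dumps(main_workflow(CRATE)))
```

Because the operation first checks the declared profile, a generic crate typed only as Dataset is rejected rather than silently misread, which is the sense in which a profile makes a crate typed.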
The workflow system Galaxy (The Galaxy Community 2022) retrieves such Workflow Crates using the GA4GH TRS API.

The workflow profile has been further extended (with OOP-like inheritance) in Workflow Testing RO-Crate, adding formal workflow testing components: this adds operations such as getting remote test instances and test definitions, used by the LifeMonitor service to keep track of the health status of multiple published workflows. While RO-Crates use Web technologies, they are also self-contained, moving data along with their metadata. This is a powerful construct for interoperability across FAIR repositories, but it raises some challenges with regard to mutability and persistence of crates.

To illustrate how such challenges can be handled, we detail how the WorkflowHub repository follows several FDO principles:

- Workflow entries must be frozen for editing and have complete kernel metadata (title, authors, license, description) [FDOF4] before they can be assigned a persistent identifier, e.g. https://doi.org/10.48546/workflowhub.workflow.255.1 [FDOF1].
- Computational workflows can be composed of multiple files used as a whole, e.g. CWL files in a GitHub repository. These are snapshotted as a single RO-Crate ZIP, indicating the main workflow [FDOF11].
- PID resolution can content-negotiate to DataCite’s PID metadata [FDOF2] or use FAIR Signposting to find an RO-Crate containing the workflow [FDOF3] and richer JSON-LD metadata resources [FDOF5, FDOF8]; see Fig. 1.
- Metadata uses schema.org [FDOF7] following the community-developed Bioschemas ComputationalWorkflow profile [FDOF10].
- Workflows are discovered using the GA4GH TRS API [FDOF5, FDOF6, FDOF11] and created/modified using CRUD operations [FDOF6].
- The RO-Crate profile, effectively the FDO Type [FDOF7], is declared as https://w3id.org/workflowhub/workflow-ro-crate/1.0; the workflow language (e.g. https://w3id.org/workflowhub/workflow-ro-crate#galaxy) is defined in the metadata of the main workflow.
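The FAIR Signposting step described for PID resolution can be sketched as follows. This is a minimal illustration, not WorkflowHub's implementation: the `Link` header value is a hypothetical example (only the DOI comes from the text; the `example.org` URLs are invented), and the parser is a simplified one that does not cover every case of the HTTP `Link` header syntax.

```python
import re

# Hypothetical Link header of the kind a Signposting-enabled landing page might send.
SAMPLE_LINK_HEADER = (
    '<https://doi.org/10.48546/workflowhub.workflow.255.1>; rel="cite-as", '
    '<https://example.org/workflows/255/ro_crate?version=1>; rel="item"; '
    'type="application/zip", '
    '<https://example.org/workflows/255?version=1>; rel="describedby"; '
    'type="application/ld+json"'
)

# One link: <target> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s";,]+))*)')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s";,]+))')


def parse_link_header(value):
    """Parse a Link header value into a list of dicts (simplified parser)."""
    links = []
    for target, params in LINK_RE.findall(value):
        attrs = {}
        for name, quoted, bare in PARAM_RE.findall(params):
            attrs[name.lower()] = quoted or bare
        links.append({"target": target, **attrs})
    return links


def find_links(links, rel, media_type=None):
    """Return targets whose rel contains `rel`, optionally filtered by type."""
    targets = []
    for link in links:
        rels = link.get("rel", "").split()
        if rel in rels and (media_type is None or link.get("type") == media_type):
            targets.append(link["target"])
    return targets


if __name__ == "__main__":
    parsed = parse_link_header(SAMPLE_LINK_HEADER)
    print("persistent identifier:", find_links(parsed, "cite-as"))
    print("RO-Crate ZIP:", find_links(parsed, "item", "application/zip"))
    print("JSON-LD metadata:", find_links(parsed, "describedby", "application/ld+json"))
```

A client resolving the PID would read this header from the HTTP response and follow the `item` link to download the crate, or the `describedby` link for the richer JSON-LD metadata.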
Further work on RO-Crate profiles includes formalising links to the API operations and repositories [FDOF5, FDOF7], adding PIDs of profiles and types to the FAIR Signposting, and supporting HTTP navigation to individual resources within the RO-Crate.

RO-Crate has seen broad adoption by communities across many scientific disciplines, providing a lightweight, and therefore easy to adopt, approach to generating FAIR Digital Objects. It is rapidly becoming an integral part of the interoperability fabric between the different components, as demonstrated here for WorkflowHub, contributing to building the European Open Science Cloud.
id doaj.art-6d755bbf122545ec8c819a6f32f059b7
institution Directory Open Access Journal
issn 2367-7163
DOI: 10.3897/rio.8.e93937
Volume: 8, pages 1-6
Author affiliations:
Stian Soiland-Reyes: Informatics Institute, Faculty of Science, University of Amsterdam
Peter Sefton: The University of Queensland School of Languages and Cultures, The University of Queensland
Leyla Jael Castro: Informationszentrum Lebenswissenschaften (ZB Med)
Frederik Coppens: Vlaams Instituut voor Biotechnologie & Universiteit Ghent (VIB-UGent) Center for Plant Systems Biology
Daniel Garijo: Ontology Engineering Group, Universidad Politécnica de Madrid
Simone Leo: Center for Advanced Studies, Research, and Development in Sardinia (CRS4)
Marc Portier: Vlaams Instituut voor de Zee (VLIZ)
Paul Groth: Informatics Institute, Faculty of Science, University of Amsterdam
topic FAIR
research object
linked data
RO-Crate
JSON