Simplifying the development of portable, scalable, and reproducible workflows

Bibliographic Details
Main Authors: Stephen R Piccolo, Zachary E Ence, Elizabeth C Anderson, Jeffrey T Chang, Andrea H Bild
Format: Article
Language: English
Published: eLife Sciences Publications Ltd, 2021-10-01
Series: eLife
Online Access:https://elifesciences.org/articles/71069
Collection: DOAJ (Directory of Open Access Journals)
ISSN: 2050-084X
Volume: 10
DOI: 10.7554/eLife.71069
Keywords: computational workflows; research reproducibility; learn by example; Web application; Common Workflow Language; command-line software
Affiliations:
Stephen R Piccolo (ORCID: https://orcid.org/0000-0003-2001-5640): Department of Biology, Brigham Young University, Provo, United States
Zachary E Ence: Department of Biology, Brigham Young University, Provo, United States
Elizabeth C Anderson: Department of Biology, Brigham Young University, Provo, United States
Jeffrey T Chang: Department of Integrative Biology and Pharmacology, University of Texas Health Science Center at Houston, Houston, United States
Andrea H Bild: Department of Medical Oncology and Therapeutics, City of Hope Comprehensive Cancer Institute, Monrovia, United States

Abstract

Command-line software plays a critical role in biology research. However, processes for installing and executing software differ widely. The Common Workflow Language (CWL) is a community standard that addresses this problem. Using CWL, tool developers can formally describe a tool’s inputs, outputs, and other execution details. CWL documents can include instructions for executing tools inside software containers. Accordingly, CWL tools are portable: they can be executed on diverse computers, including personal workstations, high-performance clusters, and the cloud. CWL also supports workflows, which describe dependencies among tools and specify how outputs from one tool are used as inputs to others. To date, CWL has been used primarily for batch processing of large datasets, especially in genomics, but it can also be used for the analytical steps of a study. This article explains key concepts about CWL and software containers and provides examples of using CWL in biology research. CWL documents are text-based, so they can be created manually, without computer programming. However, the difficulty of ensuring that these documents conform to the CWL specification may deter some users from adopting CWL. To address this gap, we created ToolJig, a Web application that enables researchers to create CWL documents interactively. ToolJig validates the information provided by the user to ensure it is complete and valid. After creating a CWL tool or workflow, the user can create ‘input-object’ files, which store values for a particular invocation of that tool or workflow. In addition, ToolJig provides examples of how to execute the tool or workflow via a workflow engine. ToolJig and our examples are available at https://github.com/srp33/ToolJig.
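A CWL tool description formally states a command's inputs, outputs, and container, and a separate input-object file supplies the values for one run. As a minimal sketch, assuming CWL v1.2 and Python 3.9 or later, the snippet below builds both documents as Python dictionaries and serializes them as JSON, which YAML-based CWL parsers generally accept. The wrapped command (`echo`), the container image `ubuntu:22.04`, the names `ECHO_TOOL`, `missing_inputs`, and `render_documents`, and the completeness rule are illustrative assumptions. This is not ToolJig's implementation.

```python
import json

# A minimal CWL v1.2 CommandLineTool expressed as a Python dict.
# Assumptions (not from the article): the tool wraps `echo`, and the
# container image "ubuntu:22.04" is only an illustrative choice.
ECHO_TOOL = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "requirements": {
        # Ask the workflow engine to run the command inside this container.
        "DockerRequirement": {"dockerPull": "ubuntu:22.04"},
    },
    "inputs": {
        # Passed to `echo` as its first positional argument.
        "message": {"type": "string", "inputBinding": {"position": 1}},
    },
    "outputs": {
        # Capture standard output as a File; its name comes from "stdout".
        "greeting": {"type": "stdout"},
    },
    "stdout": "greeting.txt",
}


def missing_inputs(tool: dict, input_object: dict) -> list[str]:
    """Return the names of required inputs that have no value.

    Hypothetical check that assumes the map form of "inputs": an input
    counts as optional if its type ends with "?", is a list containing
    "null", or if the input declares a default.
    """
    missing = []
    for name, spec in tool["inputs"].items():
        type_ = spec.get("type") if isinstance(spec, dict) else spec
        optional = (isinstance(type_, str) and type_.endswith("?")) or (
            isinstance(type_, list) and "null" in type_
        )
        has_default = isinstance(spec, dict) and "default" in spec
        if name not in input_object and not (optional or has_default):
            missing.append(name)
    return missing


def render_documents(tool: dict, input_object: dict) -> tuple[str, str]:
    """Serialize the tool and its input object as JSON text.

    Raises ValueError if the input object omits a required input.
    """
    missing = missing_inputs(tool, input_object)
    if missing:
        raise ValueError(f"missing required inputs: {missing}")
    return json.dumps(tool, indent=2), json.dumps(input_object, indent=2)


if __name__ == "__main__":
    tool_text, job_text = render_documents(ECHO_TOOL, {"message": "Hello, CWL"})
    print(tool_text)  # e.g. save as echo_tool.cwl
    print(job_text)   # e.g. save as echo_inputs.json
```

If the two documents are saved as `echo_tool.cwl` and `echo_inputs.json`, the CWL reference runner could then execute them with `cwltool echo_tool.cwl echo_inputs.json`. A container runtime such as Docker must be available for the `DockerRequirement` to be honored.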
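In a CWL workflow, one step names another step's output as its input, and an engine can infer the execution order from those references rather than from the order in which steps are listed. The following is a minimal sketch of that idea. It assumes CWL v1.2 syntax and Python's standard `graphlib` module (Python 3.9+). The step names, the tool files `echo_tool.cwl` and `count_lines.cwl`, and the helper `step_order` are hypothetical. Real engines such as `cwltool` also handle many source forms this sketch ignores, such as lists, explicit `source` fields, and scatter.

```python
from graphlib import TopologicalSorter

# A minimal CWL v1.2 Workflow expressed as a Python dict. The step names and
# the tool files "echo_tool.cwl" and "count_lines.cwl" are hypothetical.
WORKFLOW = {
    "cwlVersion": "v1.2",
    "class": "Workflow",
    "inputs": {"message": "string"},
    "outputs": {
        "line_count": {"type": "File", "outputSource": "count/counts"},
    },
    "steps": {
        # Listed first on purpose: order comes from data dependencies.
        "count": {
            "run": "count_lines.cwl",
            # "say/greeting" means the "greeting" output of step "say".
            "in": {"input_file": "say/greeting"},
            "out": ["counts"],
        },
        "say": {
            "run": "echo_tool.cwl",
            "in": {"message": "message"},
            "out": ["greeting"],
        },
    },
}


def step_order(workflow: dict) -> list[str]:
    """Return step names ordered so producers run before their consumers.

    Sketch only: each source must be a plain string naming either a
    workflow input ("message") or a step output ("<step>/<output>").
    """
    steps = workflow["steps"]
    graph: dict[str, set[str]] = {}
    for name, step in steps.items():
        upstream = set()
        for source in step["in"].values():
            if "/" in source:
                producer, output = source.split("/", 1)
                if producer not in steps or output not in steps[producer]["out"]:
                    raise ValueError(f"step {name!r}: unknown source {source!r}")
                upstream.add(producer)
            elif source not in workflow["inputs"]:
                raise ValueError(f"step {name!r}: unknown input {source!r}")
        # Map each step to the steps it depends on (its predecessors).
        graph[name] = upstream
    # static_order() raises graphlib.CycleError (a ValueError) on cycles.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(step_order(WORKFLOW))  # ['say', 'count']
```

Although `count` is listed first, its dependency on `say/greeting` places `say` ahead of it. A reference to an undeclared input or a circular dependency is rejected with a `ValueError`.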