Simplifying the development of portable, scalable, and reproducible workflows
Command-line software plays a critical role in biology research. However, processes for installing and executing software differ widely. The Common Workflow Language (CWL) is a community standard that addresses this problem. Using CWL, tool developers can formally describe a tool’s inputs, outputs,...
Main Authors: | Stephen R Piccolo, Zachary E Ence, Elizabeth C Anderson, Jeffrey T Chang, Andrea H Bild |
---|---|
Format: | Article |
Language: | English |
Published: | eLife Sciences Publications Ltd, 2021-10-01 |
Series: | eLife |
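Subjects: | computational workflows; research reproducibility; learn by example; Web application; Common Workflow Language; command-line software |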
Online Access: | https://elifesciences.org/articles/71069 |
_version_ | 1828375566095482880 |
---|---|
author | Stephen R Piccolo; Zachary E Ence; Elizabeth C Anderson; Jeffrey T Chang; Andrea H Bild |
author_facet | Stephen R Piccolo; Zachary E Ence; Elizabeth C Anderson; Jeffrey T Chang; Andrea H Bild |
author_sort | Stephen R Piccolo |
collection | DOAJ |
description | Command-line software plays a critical role in biology research. However, processes for installing and executing software differ widely. The Common Workflow Language (CWL) is a community standard that addresses this problem. Using CWL, tool developers can formally describe a tool’s inputs, outputs, and other execution details. CWL documents can include instructions for executing tools inside software containers. Accordingly, CWL tools are portable: they can be executed on diverse computers, including personal workstations, high-performance clusters, and the cloud. CWL also supports workflows, which describe dependencies among tools and how outputs from one tool are used as inputs to others. To date, CWL has been used primarily for batch processing of large datasets, especially in genomics, but it can also be used for the analytical steps of a study. This article explains key concepts about CWL and software containers and provides examples for using CWL in biology research. CWL documents are text-based, so they can be created manually, without computer programming. However, the need to ensure that these documents conform to the CWL specification may deter some users from adopting CWL. To address this gap, we created ToolJig, a Web application that enables researchers to create CWL documents interactively. ToolJig checks information provided by the user to ensure it is complete and valid. After creating a CWL tool or workflow, the user can create ‘input-object’ files, which store values for a particular invocation of a tool or workflow. In addition, ToolJig provides examples of how to execute the tool or workflow via a workflow engine. ToolJig and our examples are available at https://github.com/srp33/ToolJig. |
first_indexed | 2024-04-14T07:48:17Z |
format | Article |
id | doaj.art-c9c3d5bd1f8c4b3c931c442ad9e468a1 |
institution | Directory Open Access Journal |
issn | 2050-084X |
language | English |
last_indexed | 2024-04-14T07:48:17Z |
publishDate | 2021-10-01 |
publisher | eLife Sciences Publications Ltd |
record_format | Article |
series | eLife |
spelling | doaj.art-c9c3d5bd1f8c4b3c931c442ad9e468a1; 2022-12-22T02:05:16Z; eng; eLife Sciences Publications Ltd; eLife; 2050-084X; 2021-10-01; 10; 10.7554/eLife.71069; Simplifying the development of portable, scalable, and reproducible workflows; Stephen R Piccolo [0] (https://orcid.org/0000-0003-2001-5640); Zachary E Ence [1]; Elizabeth C Anderson [2]; Jeffrey T Chang [3]; Andrea H Bild [4]; Department of Biology, Brigham Young University, Provo, United States; Department of Biology, Brigham Young University, Provo, United States; Department of Biology, Brigham Young University, Provo, United States; Department of Integrative Biology and Pharmacology, University of Texas Health Science Center at Houston, Houston, United States; Department of Medical Oncology and Therapeutics, City of Hope Comprehensive Cancer Institute, Monrovia, United States; https://elifesciences.org/articles/71069; computational workflows; research reproducibility; learn by example; Web application; Common Workflow Language; command-line software |
title | Simplifying the development of portable, scalable, and reproducible workflows |
title_sort | simplifying the development of portable scalable and reproducible workflows |
topic | computational workflows; research reproducibility; learn by example; Web application; Common Workflow Language; command-line software |
url | https://elifesciences.org/articles/71069 |