Managing genomic variant calling workflows with Swift/T.

Bioinformatics research is frequently performed using complex workflows with multiple steps, fans, merges, and conditionals. This complexity makes management of the workflow difficult on a computer cluster, especially when running in parallel on large batches of data: hundreds or thousands of sample...

Bibliographic Details
Main Authors: Azza E Ahmed, Jacob Heldenbrand, Yan Asmann, Faisal M Fadlelmola, Daniel S Katz, Katherine Kendig, Matthew C Kendzior, Tiffany Li, Yingxue Ren, Elliott Rodriguez, Matthew R Weber, Justin M Wozniak, Jennie Zermeno, Liudmila S Mainzer
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0211608
collection DOAJ
description Bioinformatics research is frequently performed using complex workflows with multiple steps, fans, merges, and conditionals. This complexity makes management of the workflow difficult on a computer cluster, especially when running in parallel on large batches of data: hundreds or thousands of samples at a time. Scientific workflow management systems could help with that. Many are now being proposed, but is there yet a "best" workflow management system for bioinformatics? Such a system would need to satisfy numerous, sometimes conflicting requirements: from ease of use, to seamless deployment at peta- and exa-scale, and portability to the cloud. We evaluated Swift/T as a candidate for such a role by implementing a primary genomic variant calling workflow in the Swift/T language, focusing on workflow management, performance, and scalability issues that arise from production-grade big data genomic analyses. In the process we introduced novel features into the language, which are now part of its open repository. Additionally, we formalized a set of design criteria for quality, robust, maintainable workflows that must function at scale in a production setting, such as a large genomic sequencing facility or a major hospital system. The use of Swift/T conveys two key advantages. (1) It operates transparently in multiple cluster scheduling environments (PBS Torque, SLURM, Cray aprun environment, etc.); thus, a single workflow is trivially portable across numerous clusters. (2) The leaf functions of Swift/T permit developers to easily swap executables in and out of the workflow, which makes the workflow easy to maintain and makes it easy to request the resources optimal for each stage of the pipeline. While Swift/T's data-level parallelism eliminates the need to code parallel analysis of multiple samples, it does make debugging more difficult, as is common for implicitly parallel code. Nonetheless, the language gives users a powerful and portable way to scale up analyses in many computing architectures. The code for our implementation of a variant calling workflow using Swift/T can be found on GitHub at https://github.com/ncsa/Swift-T-Variant-Calling, with full documentation provided at http://swift-t-variant-calling.readthedocs.io/en/latest/.
first_indexed 2024-12-14T18:45:40Z
format Article
id doaj.art-1e09fc94f6494e458150290d4c0cf43d
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-14T18:45:40Z
publishDate 2019-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-1e09fc94f6494e458150290d4c0cf43d; 2022-12-21T22:51:23Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2019-01-01; 147e0211608; 10.1371/journal.pone.0211608
title Managing genomic variant calling workflows with Swift/T.
url https://doi.org/10.1371/journal.pone.0211608