BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments

Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing techniques and can benefit from specialized...

Bibliographic Details
Main Authors: Maria Luiza Mondelli, Thiago Magalhães, Guilherme Loss, Michael Wilde, Ian Foster, Marta Mattoso, Daniel Katz, Helio Barbosa, Ana Tereza R. de Vasconcelos, Kary Ocaña, Luiz M.R. Gadelha Jr
Format: Article
Language: English
Published: PeerJ Inc. 2018-08-01
Series: PeerJ
Subjects: Bioinformatics, Scientific workflows, Provenance, Profiling, Data analytics
Online Access: https://peerj.com/articles/5551.pdf
author Maria Luiza Mondelli
Thiago Magalhães
Guilherme Loss
Michael Wilde
Ian Foster
Marta Mattoso
Daniel Katz
Helio Barbosa
Ana Tereza R. de Vasconcelos
Kary Ocaña
Luiz M.R. Gadelha Jr
author_sort Maria Luiza Mondelli
collection DOAJ
description Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.
first_indexed 2024-03-09T07:27:55Z
format Article
id doaj.art-1913b3e06e6446db80d4e6a62bb80769
institution Directory Open Access Journal
issn 2167-8359
language English
last_indexed 2024-03-09T07:27:55Z
publishDate 2018-08-01
publisher PeerJ Inc.
record_format Article
series PeerJ
spelling doaj.art-1913b3e06e6446db80d4e6a62bb80769
2023-12-03T06:47:37Z
eng
PeerJ Inc.
PeerJ
2167-8359
2018-08-01
6
e5551
10.7717/peerj.5551
BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments
Maria Luiza Mondelli (0)
Thiago Magalhães (1)
Guilherme Loss (2)
Michael Wilde (3)
Ian Foster (4)
Marta Mattoso (5)
Daniel Katz (6)
Helio Barbosa (7)
Ana Tereza R. de Vasconcelos (8)
Kary Ocaña (9)
Luiz M.R. Gadelha Jr (10)
(0) National Laboratory for Scientific Computing, Petrópolis, Rio de Janeiro, Brazil
(1) National Laboratory for Scientific Computing, Petrópolis, Rio de Janeiro, Brazil
(2) National Laboratory for Scientific Computing, Petrópolis, Rio de Janeiro, Brazil
(3) Computation Institute, Argonne National Laboratory/University of Chicago, Chicago, IL, USA
(4) Computation Institute, Argonne National Laboratory/University of Chicago, Chicago, IL, USA
(5) Computer and Systems Engineering Program, COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Rio de Janeiro, Brazil
(6) National Center for Supercomputing Applications, University of Illinois, Urbana, IL, USA
(7) National Laboratory for Scientific Computing, Petrópolis, Rio de Janeiro, Brazil
(8) National Laboratory for Scientific Computing, Petrópolis, Rio de Janeiro, Brazil
(9) National Laboratory for Scientific Computing, Petrópolis, Rio de Janeiro, Brazil
(10) National Laboratory for Scientific Computing, Petrópolis, Rio de Janeiro, Brazil
Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.
https://peerj.com/articles/5551.pdf
Bioinformatics
Scientific workflows
Provenance
Profiling
Data analytics
title BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments
title_sort bioworkbench a high performance framework for managing and analyzing bioinformatics experiments
topic Bioinformatics
Scientific workflows
Provenance
Profiling
Data analytics
url https://peerj.com/articles/5551.pdf