Microbase2.0: A Generic Framework for Computationally Intensive Bioinformatics Workflows in the Cloud

Bibliographic Details
Main Authors: Flanagan Keith, Nakjang Sirintra, Hallinan Jennifer, Harwood Colin, Hirt Robert P., Pocock Matthew R., Wipat Anil
Format: Article
Language: English
Published: De Gruyter 2012-06-01
Series: Journal of Integrative Bioinformatics
Online Access: https://doi.org/10.1515/jib-2012-212
author Flanagan Keith
Nakjang Sirintra
Hallinan Jennifer
Harwood Colin
Hirt Robert P.
Pocock Matthew R.
Wipat Anil
collection DOAJ
description As bioinformatics datasets grow ever larger, and analyses become increasingly complex, there is a need for data handling infrastructures to keep pace with developing technology. One solution is to apply Grid and Cloud technologies to address the computational requirements of analysing high throughput datasets. We present an approach for writing new, or wrapping existing applications, and a reference implementation of a framework, Microbase2.0, for executing those applications using Grid and Cloud technologies. We used Microbase2.0 to develop an automated Cloud-based bioinformatics workflow executing simultaneously on two different Amazon EC2 data centres and the Newcastle University Condor Grid. Several CPU years’ worth of computational work was performed by this system in less than two months. The workflow produced a detailed dataset characterising the cellular localisation of 3,021,490 proteins from 867 taxa, including bacteria, archaea and unicellular eukaryotes. Microbase2.0 is freely available from http://www.microbase.org.uk/.
first_indexed 2024-12-16T09:07:25Z
format Article
id doaj.art-24df8a0215844e27a0ceb1ebfb401171
institution Directory Open Access Journal
issn 1613-4516
language English
last_indexed 2024-12-16T09:07:25Z
publishDate 2012-06-01
publisher De Gruyter
record_format Article
series Journal of Integrative Bioinformatics
spelling doaj.art-24df8a0215844e27a0ceb1ebfb401171
2022-12-21T22:37:03Z
eng
De Gruyter
Journal of Integrative Bioinformatics
1613-4516
2012-06-01
92101112
10.1515/jib-2012-212
biecoll-jib-2012-212
Microbase2.0: A Generic Framework for Computationally Intensive Bioinformatics Workflows in the Cloud
Flanagan Keith (0): School of Computing Science, United Kingdom of Great Britain and Northern Ireland
Nakjang Sirintra (1): School of Computing Science, United Kingdom of Great Britain and Northern Ireland
Hallinan Jennifer (2): School of Computing Science, United Kingdom of Great Britain and Northern Ireland
Harwood Colin (3): Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, NE7 4RU, United Kingdom of Great Britain and Northern Ireland
Hirt Robert P. (4): Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, NE7 4RU, United Kingdom of Great Britain and Northern Ireland
Pocock Matthew R. (5): School of Computing Science, United Kingdom of Great Britain and Northern Ireland
Wipat Anil (6): School of Computing Science, United Kingdom of Great Britain and Northern Ireland
https://doi.org/10.1515/jib-2012-212
title Microbase2.0: A Generic Framework for Computationally Intensive Bioinformatics Workflows in the Cloud
url https://doi.org/10.1515/jib-2012-212