Microbase2.0: A Generic Framework for Computationally Intensive Bioinformatics Workflows in the Cloud
As bioinformatics datasets grow ever larger, and analyses become increasingly complex, there is a need for data handling infrastructures to keep pace with developing technology. One solution is to apply Grid and Cloud technologies to address the computational requirements of analysing high throughpu...
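Main Authors: | Flanagan Keith, Nakjang Sirintra, Hallinan Jennifer, Harwood Colin, Hirt Robert P., Pocock Matthew R., Wipat Anil |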
---|---|
Format: | Article |
Language: | English |
Published: | De Gruyter, 2012-06-01 |
Series: | Journal of Integrative Bioinformatics |
Online Access: | https://doi.org/10.1515/jib-2012-212 |
_version_ | 1818587347145457664 |
---|---|
author | Flanagan Keith Nakjang Sirintra Hallinan Jennifer Harwood Colin Hirt Robert P. Pocock Matthew R. Wipat Anil |
author_sort | Flanagan Keith |
collection | DOAJ |
description | As bioinformatics datasets grow ever larger, and analyses become increasingly complex, there is a need for data handling infrastructures to keep pace with developing technology. One solution is to apply Grid and Cloud technologies to address the computational requirements of analysing high throughput datasets. We present an approach for writing new, or wrapping existing applications, and a reference implementation of a framework, Microbase2.0, for executing those applications using Grid and Cloud technologies. We used Microbase2.0 to develop an automated Cloud-based bioinformatics workflow executing simultaneously on two different Amazon EC2 data centres and the Newcastle University Condor Grid. Several CPU years’ worth of computational work was performed by this system in less than two months. The workflow produced a detailed dataset characterising the cellular localisation of 3,021,490 proteins from 867 taxa, including bacteria, archaea and unicellular eukaryotes. Microbase2.0 is freely available from http://www.microbase.org.uk/. |
first_indexed | 2024-12-16T09:07:25Z |
format | Article |
id | doaj.art-24df8a0215844e27a0ceb1ebfb401171 |
institution | Directory Open Access Journal |
issn | 1613-4516 |
language | English |
last_indexed | 2024-12-16T09:07:25Z |
publishDate | 2012-06-01 |
publisher | De Gruyter |
record_format | Article |
series | Journal of Integrative Bioinformatics |
spelling | doaj.art-24df8a0215844e27a0ceb1ebfb401171; 2022-12-21T22:37:03Z; eng; De Gruyter; Journal of Integrative Bioinformatics; 1613-4516; 2012-06-01; 9; 2; 101; 112; 10.1515/jib-2012-212; biecoll-jib-2012-212; Microbase2.0: A Generic Framework for Computationally Intensive Bioinformatics Workflows in the Cloud; Flanagan Keith (0); Nakjang Sirintra (1); Hallinan Jennifer (2); Harwood Colin (3); Hirt Robert P. (4); Pocock Matthew R. (5); Wipat Anil (6); School of Computing Science, United Kingdom of Great Britain and Northern Ireland; School of Computing Science, United Kingdom of Great Britain and Northern Ireland; School of Computing Science, United Kingdom of Great Britain and Northern Ireland; Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, NE7 4RU, United Kingdom of Great Britain and Northern Ireland; Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, NE7 4RU, United Kingdom of Great Britain and Northern Ireland; School of Computing Science, United Kingdom of Great Britain and Northern Ireland; School of Computing Science, United Kingdom of Great Britain and Northern Ireland; As bioinformatics datasets grow ever larger, and analyses become increasingly complex, there is a need for data handling infrastructures to keep pace with developing technology. One solution is to apply Grid and Cloud technologies to address the computational requirements of analysing high throughput datasets. We present an approach for writing new, or wrapping existing applications, and a reference implementation of a framework, Microbase2.0, for executing those applications using Grid and Cloud technologies. We used Microbase2.0 to develop an automated Cloud-based bioinformatics workflow executing simultaneously on two different Amazon EC2 data centres and the Newcastle University Condor Grid. Several CPU years’ worth of computational work was performed by this system in less than two months. The workflow produced a detailed dataset characterising the cellular localisation of 3,021,490 proteins from 867 taxa, including bacteria, archaea and unicellular eukaryotes. Microbase2.0 is freely available from http://www.microbase.org.uk/.; https://doi.org/10.1515/jib-2012-212 |
title | Microbase2.0: A Generic Framework for Computationally Intensive Bioinformatics Workflows in the Cloud |
url | https://doi.org/10.1515/jib-2012-212 |