eHive: An Artificial Intelligence workflow system for genomic analysis

Abstract. Background: The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the pro...

Bibliographic Details
Main Authors: Gordon Leo, Schuster Michael, Fitzgerald Stephen, Vilella Albert J, Beal Kathryn, Severin Jessica, Ureta-Vidal Abel, Flicek Paul, Herrero Javier
Format: Article
Language: English
Published: BMC 2010-05-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/11/240
collection DOAJ
description Abstract
Background: The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future.
Results: We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios.
Conclusions: eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
first_indexed 2024-04-13T05:45:54Z
format Article
id doaj.art-6558564710b14e748886b1d4fd3f56cc
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-13T05:45:54Z
publishDate 2010-05-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-6558564710b14e748886b1d4fd3f56cc | 2022-12-22T02:59:57Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2010-05-01 | 11 | 1 | 240 | 10.1186/1471-2105-11-240 | eHive: An Artificial Intelligence workflow system for genomic analysis | Gordon Leo; Schuster Michael; Fitzgerald Stephen; Vilella Albert J; Beal Kathryn; Severin Jessica; Ureta-Vidal Abel; Flicek Paul; Herrero Javier | http://www.biomedcentral.com/1471-2105/11/240
title eHive: An Artificial Intelligence workflow system for genomic analysis
url http://www.biomedcentral.com/1471-2105/11/240