Multithreaded comparative RNA secondary structure prediction using stochastic context-free grammars

<p>Abstract</p> <p>Background</p> <p>The prediction of the structure of large RNAs remains a particular challenge in bioinformatics, due to the computational complexity and low levels of accuracy of state-of-the-art algorithms. The <it>pfold </it>model coupl...

Full description

Bibliographic Details
Main Authors: Værum Morten, Knudsen Bjarne, Sükösd Zsuzsanna, Kjems Jørgen, Andersen Ebbe S
Format: Article
Language:English
Published: BMC 2011-04-01
Series:BMC Bioinformatics
Online Access:http://www.biomedcentral.com/1471-2105/12/103
Description
Summary:<p>Abstract</p> <p>Background</p> <p>The prediction of the structure of large RNAs remains a particular challenge in bioinformatics, due to the computational complexity and low levels of accuracy of state-of-the-art algorithms. The <it>pfold </it>model couples a stochastic context-free grammar to phylogenetic analysis for a high accuracy in predictions, but the time complexity of the algorithm and underflow errors have prevented its use for long alignments. Here we present <it>PPfold</it>, a multithreaded version of <it>pfold</it>, which is capable of predicting the structure of large RNA alignments accurately on practical timescales.</p> <p>Results</p> <p>We have distributed both the phylogenetic calculations and the inside-outside algorithm in <it>PPfold</it>, resulting in a significant reduction of runtime on multicore machines. We have addressed the floating-point underflow problems of <it>pfold </it>by implementing an extended-exponent datatype, enabling <it>PPfold </it>to be used for large-scale RNA structure predictions. We have also improved the user interface and portability: alongside standalone executable and Java source code of the program, <it>PPfold </it>is also available as a free plugin to the CLC Workbenches. We have evaluated the accuracy of <it>PPfold </it>using BRaliBase I tests, and demonstrated its practical use by predicting the secondary structure of an alignment of 24 complete HIV-1 genomes in 65 minutes on an 8-core machine and identifying several known structural elements in the prediction.</p> <p>Conclusions</p> <p><it>PPfold </it>is the first parallelized comparative RNA structure prediction algorithm to date. Based on the <it>pfold </it>model, <it>PPfold </it>is capable of fast, high-quality predictions of large RNA secondary structures, such as the genomes of RNA viruses or long genomic transcripts. The techniques used in the parallelization of this algorithm may be of general applicability to other bioinformatics algorithms.</p>
ISSN:1471-2105