Dinucleotide controlled null models for comparative RNA gene prediction


Bibliographic Details
Main Authors: Gesell Tanja, Washietl Stefan
Format: Article
Language: English
Published: BMC 2008-05-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/248
collection DOAJ
description Abstract

Background

Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. Babak et al. [BMC Bioinformatics. 8:33] recently showed that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular programs that use a thermodynamic folding model including stacking energies. Consequently, dinucleotide-preserving control strategies are needed to assess the significance of such predictions. Randomization algorithms for single sequences have existed for many years, but the problem has remained challenging for multiple alignments, and no algorithm is currently available.

Results

We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. To meet the additional requirements of an accurate null model, the randomized alignments have, on average, the same sequence diversity as the original and preserve local conservation and gap patterns. We use a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance-based approach, a tree is estimated under this model and then used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments, and its effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm, giving a new variant of a thermodynamic, structure-based RNA gene-finding program that is not biased by the dinucleotide content.

Conclusion

SISSIz implements an efficient algorithm to randomize multiple alignments while preserving dinucleotide content. It can be used to obtain more accurate estimates of the false positive rates of existing programs, to produce negative controls for training machine-learning-based programs, or as a standalone RNA gene-finding program. Other applications in comparative genomics that require randomization of multiple alignments can also be considered.

Availability

SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.
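The quantity the abstract asks a null model to preserve, the dinucleotide content of a multiple alignment, can be illustrated with a short sketch. This is not code from SISSIz. The function name `dinucleotide_frequencies`, the toy alignment, and the choice to remove gaps before counting and to pool counts over all rows are assumptions made only for this example.

```python
from collections import Counter


def dinucleotide_frequencies(alignment):
    """Pooled dinucleotide frequencies over all rows of an alignment.

    Assumption of this sketch: gap characters ('-' and '.') are removed
    from each row first, so a dinucleotide is any pair of residues that
    are adjacent after gap removal.
    """
    counts = Counter()
    for row in alignment:
        seq = "".join(c for c in row.upper() if c not in "-.")
        # Count every overlapping pair of adjacent residues.
        counts.update(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical two-sequence alignment, used only to exercise the function.
toy_alignment = ["ACGU-ACG", "ACGUUACG"]
freqs = dinucleotide_frequencies(toy_alignment)
```

Comparing these frequencies between an original alignment and a randomized one is one simple way to check whether a control preserves dinucleotide content on average.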
first_indexed 2024-12-14T04:52:35Z
format Article
id doaj.art-63ad2d753be749e2bb253d1e738b6f87
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-14T04:52:35Z
publishDate 2008-05-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-63ad2d753be749e2bb253d1e738b6f87; 2022-12-21T23:16:30Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2008-05-01; 9; 1; 248; 10.1186/1471-2105-9-248; Dinucleotide controlled null models for comparative RNA gene prediction; Gesell Tanja; Washietl Stefan; http://www.biomedcentral.com/1471-2105/9/248
title Dinucleotide controlled null models for comparative RNA gene prediction