Simple statistical models predict C-to-U edited sites in plant mitochondrial RNA

Bibliographic Details
Main Authors: Cummings Michael P, Myers Daniel S
Format: Article
Language: English
Published: BMC 2004-09-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/5/132
author Cummings Michael P
Myers Daniel S
collection DOAJ
description Abstract
Background: RNA editing is the process whereby an RNA sequence is modified from the sequence of the corresponding DNA template. In the mitochondria of land plants, some cytidines are converted to uridines before translation. Despite substantial study, the molecular biological mechanism by which C-to-U RNA editing proceeds remains relatively obscure, although several experimental studies have implicated a role for cis-recognition. A highly non-random distribution of nucleotides is observed in the immediate vicinity of edited sites (within 20 nucleotides 5' and 3'), but no precise consensus motif has been identified.
Results: Data for analysis were derived from the complete mitochondrial genomes of Arabidopsis thaliana, Brassica napus, and Oryza sativa; additionally, a combined data set of observations across all three genomes was generated. We selected data sets based on the 20 nucleotides 5' and the 20 nucleotides 3' of edited sites and an equivalently sized and appropriately constructed null set of non-edited sites. We used tree-based statistical methods and random forests to generate models of C-to-U RNA editing based on the nucleotides surrounding the edited/non-edited sites and on the estimated folding energies of those regions. Tree-based statistical methods based on primary sequence data surrounding edited/non-edited sites and estimates of free energy of folding yield models with optimistic re-substitution-based estimates of ~0.71 accuracy, ~0.64 sensitivity, and ~0.88 specificity. Random forest analysis yielded better models and more exact performance estimates with ~0.74 accuracy, ~0.72 sensitivity, and ~0.81 specificity for the combined observations.
Conclusions: Simple models do moderately well in predicting which cytidines will be edited to uridines, and provide the first quantitative predictive models for RNA edited sites in plant mitochondria. Our analysis shows that the identity of the nucleotide -1 to the edited C and the estimated free energy of folding for a 41 nt region surrounding the edited C are the most important variables that distinguish most edited from non-edited sites. However, the results suggest that primary sequence data and simple free energy of folding calculations alone are insufficient to make highly accurate predictions.
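The abstract refers to 41 nt windows (20 nucleotides 5' and 20 nucleotides 3' of a candidate C) and to accuracy, sensitivity, and specificity. The following is a minimal illustrative sketch of those two ideas, not the authors' code: the function names, the toy sequence, and the confusion-matrix counts are all hypothetical, and the record does not describe how the original windows or null set were actually constructed.

```python
def window(seq, pos, flank=20):
    """Return the flank + 1 + flank nucleotides centred on seq[pos].

    Returns None when the site is too close to either end of the
    sequence to have complete flanks (an assumption of this sketch).
    """
    if pos - flank < 0 or pos + flank >= len(seq):
        return None
    return seq[pos - flank : pos + flank + 1]


def candidate_windows(seq, flank=20):
    """Yield (position, window) for every C that has complete flanks."""
    for pos, base in enumerate(seq):
        if base == "C":
            w = window(seq, pos, flank)
            if w is not None:
                yield pos, w


def confusion_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, and specificity from counts.

    tp/fn count edited sites predicted edited/non-edited;
    tn/fp count non-edited sites predicted non-edited/edited.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical toy sequence with a single C at index 20.
    toy = "A" * 20 + "C" + "G" * 20
    for pos, w in candidate_windows(toy):
        # w[20] is the candidate C; w[19] is the -1 nucleotide.
        print(pos, len(w), w[19])
    # Hypothetical counts, chosen only to exercise the formulas.
    print(confusion_metrics(tp=3, fn=1, tn=4, fp=2))
```

In a window built this way, index 20 holds the candidate C and index 19 holds the -1 nucleotide that the abstract identifies as one of the most important variables.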
first_indexed 2024-12-11T11:00:02Z
format Article
id doaj.art-d93bad7b6c824a66be6838a64ebacf30
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-11T11:00:02Z
publishDate 2004-09-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-d93bad7b6c824a66be6838a64ebacf30; 2022-12-22T01:09:54Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2004-09-01; 5; 1; 132; 10.1186/1471-2105-5-132; Simple statistical models predict C-to-U edited sites in plant mitochondrial RNA; Cummings Michael P; Myers Daniel S; http://www.biomedcentral.com/1471-2105/5/132
title Simple statistical models predict C-to-U edited sites in plant mitochondrial RNA
url http://www.biomedcentral.com/1471-2105/5/132