Support vector machines-based identification of alternative splicing in Arabidopsis thaliana from whole-genome tiling arrays

Bibliographic Details
Main Authors: Zeller Georg, Eichner Johannes, Laubinger Sascha, Rätsch Gunnar
Format: Article
Language: English
Published: BMC 2011-02-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/12/55
author Zeller Georg
Eichner Johannes
Laubinger Sascha
Rätsch Gunnar
collection DOAJ
description Abstract
Background: Alternative splicing (AS) is a process that generates several distinct mRNA isoforms from the same gene by splicing different portions out of the precursor transcript. Due to the (patho-)physiological importance of AS, a complete inventory of AS is of great interest. While this is within reach for human and mammalian model organisms, our knowledge of AS in plants remains less complete. Experimental approaches for monitoring AS are either based on transcript sequencing or rely on hybridization to DNA microarrays. Among the microarray platforms facilitating the discovery of AS events, tiling arrays are well suited for identifying intron retention, the most prevalent type of AS in plants. However, analyzing tiling array data is challenging because of high noise levels and limited probe coverage.
Results: In this work, we present a novel method to detect intron retentions (IR) and exon skips (ES) from tiling arrays. While statistical tests have typically been proposed for this purpose, our method instead utilizes support vector machines (SVMs), which are valued for their accuracy and robustness to noise. Existing EST and cDNA sequences were used for supervised training and evaluation. Analyzing a large collection of publicly available microarray and sequence data for the model plant A. thaliana, we demonstrated that our method is more accurate than existing approaches. The method was applied in a genome-wide screen that resulted in the discovery of 1,355 IR events. A comparison of these IR events to the TAIR annotation and a large set of short-read RNA-seq data showed that 830 of the predicted IR events are novel and that 525 events (39%) overlap with either the TAIR annotation or the IR events inferred from the RNA-seq data.
Conclusions: The method developed in this work expands the scarce repertoire of analysis tools for the identification of alternative mRNA splicing from whole-genome tiling arrays. Our predictions are highly enriched with known AS events and complement the A. thaliana genome annotation with respect to AS. Since all predicted AS events can be precisely attributed to experimental conditions, our work provides a basis for follow-up studies focused on the elucidation of the regulatory mechanisms underlying tissue-specific and stress-dependent AS in plants.
first_indexed 2024-12-18T14:33:37Z
format Article
id doaj.art-6e6381655e7b40b3b917c7676fbfb367
institution Directory of Open Access Journals
issn 1471-2105
language English
last_indexed 2024-12-18T14:33:37Z
publishDate 2011-02-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-6e6381655e7b40b3b917c7676fbfb367; 2022-12-21T21:04:32Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2011-02-01; 12; 1; 55; 10.1186/1471-2105-12-55; http://www.biomedcentral.com/1471-2105/12/55
title Support vector machines-based identification of alternative splicing in Arabidopsis thaliana from whole-genome tiling arrays
url http://www.biomedcentral.com/1471-2105/12/55