HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data.
High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the r...
Main Authors: | Michelle T Dimon, Katherine Sorber, Joseph L DeRisi |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS) 2010-11-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC2975632?pdf=render |
_version_ | 1828529359740207104 |
---|---|
author | Michelle T Dimon Katherine Sorber Joseph L DeRisi |
author_facet | Michelle T Dimon Katherine Sorber Joseph L DeRisi |
author_sort | Michelle T Dimon |
collection | DOAJ |
description | High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown. Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets without a reduction in specificity. HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on pre-built gene models, the products of inexact splicing are also detected. For H. sapiens, we find 3.6% of 3' splice sites and 1.4% of 5' splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available at http://derisilab.ucsf.edu/software/hmmsplicer. |
first_indexed | 2024-12-11T22:06:14Z |
format | Article |
id | doaj.art-b2d50651417146699cf253b6699514cb |
institution | Directory Open Access Journal |
issn | 1932-6203 |
language | English |
last_indexed | 2024-12-11T22:06:14Z |
publishDate | 2010-11-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj.art-b2d50651417146699cf253b6699514cb 2022-12-22T00:48:56Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2010-11-01 5 11 e13875 10.1371/journal.pone.0013875 HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data. Michelle T Dimon Katherine Sorber Joseph L DeRisi High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown. Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets without a reduction in specificity. HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on pre-built gene models, the products of inexact splicing are also detected. For H. sapiens, we find 3.6% of 3' splice sites and 1.4% of 5' splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available at http://derisilab.ucsf.edu/software/hmmsplicer. http://europepmc.org/articles/PMC2975632?pdf=render |
spellingShingle | Michelle T Dimon Katherine Sorber Joseph L DeRisi HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data. PLoS ONE |
title | HMMSplicer: a tool for efficient and sensitive discovery of known and novel splice junctions in RNA-Seq data. |
title_sort | hmmsplicer a tool for efficient and sensitive discovery of known and novel splice junctions in rna seq data |
url | http://europepmc.org/articles/PMC2975632?pdf=render |