Splice site identification using probabilistic parameters and SVM classification



Bibliographic Details
Main Authors: Halgamuge SK, Chang BCH, Baten AKMA, Li Jason
Format: Article
Language: English
Published: BMC, 2006-12-01
Series: BMC Bioinformatics, Volume 7, Supplement 5, S15
ISSN: 1471-2105
Online Access: http://dx.doi.org/10.1186/1471-2105-7-S5-S15
Abstract

Background

Recent advances and automation in DNA sequencing technology have created a vast amount of DNA sequence data. This growing volume of sequence data demands better and more efficient analysis methods. Identifying genes in this newly accumulated data is an important issue in bioinformatics, and it requires prediction of the complete gene structure. Accurate identification of splice sites in DNA sequences plays a central role in gene structure prediction in eukaryotes. Effective detection of splice sites requires knowledge of the characteristics, dependencies, and relationships of the nucleotides in the region surrounding the splice site. A higher-order Markov model is generally regarded as a useful technique for modeling higher-order dependencies. However, its implementation requires estimating a large number of parameters, which is computationally expensive.

Results

The proposed method for splice site detection consists of two stages: a first-order Markov model (MM1) in the first stage and a support vector machine (SVM) with a polynomial kernel in the second stage. The MM1 serves as a pre-processing step for the SVM and takes DNA sequences as its input. It models the compositional features and dependencies of nucleotides around splice site regions in terms of probabilistic parameters. These probabilistic parameters are then fed into the SVM, which combines them nonlinearly to predict splice sites. When compared with other existing standard splice site detection methods, the proposed MM1-SVM model showed superior performance in all cases.

Conclusion

We proposed an effective pre-processing scheme for the SVM and applied it to the identification of splice sites. This is a simple yet effective splice site detection method that achieves better classification accuracy and computational speed than some other, more complex methods.
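To make the two-stage idea more concrete, here is a minimal Python sketch under stated assumptions. The abstract does not say exactly which probabilistic parameters the MM1 emits, so this sketch assumes one value per position: the position-specific transition probability P(x_i | x_{i-1}), estimated from aligned windows around true splice sites. The function names (`train_mm1`, `mm1_features`, `poly_kernel`, `num_free_params`), the pseudocount, and the toy sequences are hypothetical illustrations, not the authors' code or data. `num_free_params` counts the free parameters of a position-specific Markov model of order k; once the context is full, each position needs 3 · 4^k of them, which illustrates why higher orders quickly become expensive.

```python
"""Toy sketch of an MM1 -> polynomial-kernel pipeline (assumed feature design)."""

NUCS = "ACGT"


def train_mm1(seqs, pseudocount=1.0):
    """Estimate a position-specific first-order Markov model (MM1).

    All sequences must be aligned windows of equal length around the site.
    Returns (p0, trans): p0[b] = P(x_0 = b) and, for i >= 1,
    trans[i][(a, b)] = P(x_i = b | x_{i-1} = a). Pseudocounts avoid zeros.
    """
    length = len(seqs[0])
    counts0 = {b: pseudocount for b in NUCS}
    for s in seqs:
        counts0[s[0]] += 1
    total0 = sum(counts0.values())
    p0 = {b: c / total0 for b, c in counts0.items()}

    trans = [None]  # position 0 has no predecessor
    for i in range(1, length):
        counts = {(a, b): pseudocount for a in NUCS for b in NUCS}
        for s in seqs:
            counts[(s[i - 1], s[i])] += 1
        probs = {}
        for a in NUCS:
            row_total = sum(counts[(a, b)] for b in NUCS)
            for b in NUCS:
                probs[(a, b)] = counts[(a, b)] / row_total
        trans.append(probs)
    return p0, trans


def mm1_features(seq, model):
    """Map a window to one probability per position (hypothetical feature design)."""
    p0, trans = model
    feats = [p0[seq[0]]]
    for i in range(1, len(seq)):
        feats.append(trans[i][(seq[i - 1], seq[i])])
    return feats


def poly_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def num_free_params(order, length, alphabet=4):
    """Free parameters of a position-specific Markov model of a given order."""
    total = 0
    for i in range(length):
        context = min(i, order)  # fewer predecessors near the window start
        total += alphabet ** context * (alphabet - 1)
    return total


if __name__ == "__main__":
    # Made-up toy windows, NOT data from the paper.
    true_sites = ["CAGGTAAG", "AAGGTGAG", "CAGGTAAG", "GAGGTAAT"]
    model = train_mm1(true_sites)
    x = mm1_features("CAGGTAAG", model)
    z = mm1_features("TTGGTCCA", model)
    print([round(v, 3) for v in x])
    print(poly_kernel(x, z, degree=2))
    print(num_free_params(1, 8), num_free_params(2, 8))  # 87 303
```

For the second stage, the feature vectors would be passed to an SVM trainer. For example, scikit-learn's `sklearn.svm.SVC(kernel="poly", degree=..., gamma=..., coef0=...)` uses the same kernel form as `poly_kernel` above; the kernel degree and other hyperparameters used in the paper are not stated in this record.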