Alt-Splice Gene Predictor Using Multitrack-Clique Analysis: Verification of Statistical Support for Modelling in Genomes of Multicellular Eukaryotes

One of the main limitations of the typical hidden Markov model (HMM) implementation for gene structure identification is that a single structure is identified on a given sequence of genomic data—i.e., identification of overlapping structure is not directly possible, and certainly not possible within...


Bibliographic Details
Main Authors: Stephen Winters-Hilt, Andrew J. Lewis
Format: Article
Language: English
Published: MDPI AG 2017-01-01
Series: Informatics
Subjects: alternative splicing, gene-structure identification, HMM
Online Access: http://www.mdpi.com/2227-9709/4/1/3
author Stephen Winters-Hilt
Andrew J. Lewis
collection DOAJ
description One of the main limitations of the typical hidden Markov model (HMM) implementation for gene structure identification is that a single structure is identified on a given sequence of genomic data; that is, identification of overlapping structure is not directly possible, and certainly not possible within the confines of the optimal Viterbi path evaluation. This is a major limitation given that significant portions of eukaryotic genomes, particularly mammalian genomes, are now known to be alternatively spliced and thus have overlapping structure in the sense of the mRNA transcripts that result. Using the general meta-state HMM approach developed in prior work, however, more than one ‘track’ of annotation can be accommodated, thereby allowing a direct implementation of an alternative-splice gene-structure identifier. In this paper we examine the representation of alternative splicing annotation in the multi-track context, and show that the proliferation of states is manageable and has sufficient statistical support on the genomes examined (human, mouse, worm, and fly) for a full alt-splice meta-state HMM gene finder to be implemented. In performing the alternative splicing analysis on alt-splice event counts, we expected to see an increase in alternative splicing complexity as the organism becomes more complex, and this is seen, with the percentage of genes with alt-splice variants increasing from worm to fly to the mammalian genomes (mouse and human). Of particular note is an increase in alternative splicing variants at the start and end of coding in the more complex organisms studied (mouse and human), indicating rapid recruitment of new first and last exons that is possibly spliceosome mediated. This suggests that spliceosome-mediated refinements (acceleration) of gene structure variation and selection, with increasing levels of sophistication, have occurred in eukaryotes, and in mammals especially.
first_indexed 2024-04-13T02:20:34Z
format Article
id doaj.art-cfc17ca9b3434ce9ac1878343823bff5
institution Directory Open Access Journal
issn 2227-9709
language English
last_indexed 2024-04-13T02:20:34Z
publishDate 2017-01-01
publisher MDPI AG
record_format Article
series Informatics
spelling doaj.art-cfc17ca9b3434ce9ac1878343823bff5
2022-12-22T03:07:01Z
eng
MDPI AG
Informatics
2227-9709
2017-01-01
4
1
3
10.3390/informatics4010003
informatics4010003
Stephen Winters-Hilt (0): Computer Science Department, Connecticut College, 270 Mohegan Ave., New London, CT 06320, USA
Andrew J. Lewis (1): Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06268, USA
title Alt-Splice Gene Predictor Using Multitrack-Clique Analysis: Verification of Statistical Support for Modelling in Genomes of Multicellular Eukaryotes
topic alternative splicing
gene-structure identification
HMM
url http://www.mdpi.com/2227-9709/4/1/3
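The description above says that a meta-state HMM can carry more than one 'track' of annotation over the same sequence. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes a hypothetical three-letter labelling (e = exon, i = intron, j = intergenic), two toy tracks for one invented gene with a skipped middle exon, and hypothetical helper names `meta_states` and `transition_counts`. It zips the tracks into one combined label per base and counts the transitions between distinct combined labels, which is the kind of tally one would inspect to judge whether each transition has enough observations to estimate a model from.

```python
from collections import Counter

# Hypothetical per-base labels (an assumption for illustration only):
# e = exon, i = intron, j = intergenic.
# Track A keeps all three exons; track B skips the middle exon.
track_a = "jj" + "eee" + "iii" + "eee" + "iii" + "eee" + "jjj"
track_b = "jj" + "eee" + "i" * 9 + "eee" + "jjj"


def meta_states(*tracks):
    """Zip equal-length annotation tracks into one combined label per base."""
    if len({len(t) for t in tracks}) != 1:
        raise ValueError("all tracks must cover the same sequence length")
    return ["".join(column) for column in zip(*tracks)]


def transition_counts(states):
    """Count transitions between consecutive, differing combined labels."""
    return Counter((a, b) for a, b in zip(states, states[1:]) if a != b)


states = meta_states(track_a, track_b)
counts = transition_counts(states)

print("distinct meta-states:", sorted(set(states)))
for (src, dst), n in sorted(counts.items()):
    print(f"{src} -> {dst}: {n}")
```

On real annotation the same tally would be run over every annotated gene in a genome, and the number of distinct combined labels gives a rough sense of how far the state space grows when a second track is added.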