HMM-FRAME: accurate protein domain classification for metagenomic sequences containing frameshift errors

Abstract. Background: Protein domain classification is an important step in metagenomic annotation. The state-of-the-art method for protein domain classification is profile HMM-based alignment. However, the relatively high rates of insertions and deletion...

Bibliographic Details
Main Authors: Sun Yanni, Zhang Yuan
Format: Article
Language: English
Published: BMC 2011-05-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/12/198
collection DOAJ
description Abstract
Background: Protein domain classification is an important step in metagenomic annotation. The state-of-the-art method for protein domain classification is profile HMM-based alignment. However, the relatively high rates of insertions and deletions in homopolymer regions of pyrosequencing reads create frameshifts, causing conventional profile HMM alignment tools to generate alignments with marginal scores. This makes error-containing gene fragments unclassifiable with conventional tools. Thus, there is a need for an accurate domain classification tool that can detect and correct sequencing errors.
Results: We introduce HMM-FRAME, a protein domain classification tool based on an augmented Viterbi algorithm that can incorporate error models from different sequencing platforms. HMM-FRAME corrects sequencing errors and classifies putative gene fragments into domain families. It achieved high error detection sensitivity and specificity in a data set with annotated errors. We applied HMM-FRAME in Targeted Metagenomics and a published metagenomic data set. The results showed that our tool can correct frameshifts in error-containing sequences, generate much longer alignments with significantly smaller E-values, and classify more sequences into their native families.
Conclusions: HMM-FRAME provides a complementary protein domain classification tool to conventional profile HMM-based methods for data sets containing frameshifts. Its current implementation is best used for small-scale metagenomic data sets. The source code of HMM-FRAME can be downloaded at http://www.cse.msu.edu/~zhangy72/hmmframe/ and at https://sourceforge.net/projects/hmm-frame/.
first_indexed 2024-04-12T19:55:21Z
id doaj.art-355c03f7c296442787201e0df2b58f18
institution Directory Open Access Journal
issn 1471-2105
last_indexed 2024-04-12T19:55:21Z
spelling doaj.art-355c03f7c296442787201e0df2b58f18 2022-12-22T03:18:40Z eng BMC BMC Bioinformatics 1471-2105 2011-05-01 12 1 198 10.1186/1471-2105-12-198