Predicting conserved protein motifs with Sub-HMMs

Abstract. Background: Profile HMMs (hidden Markov models) provide effective methods for modeling the conserved regions of protein families. A limitation of the resulting domain models is the difficulty to pinpoint their much shorter functional sub-feature...

Bibliographic Details
Main Authors:	Girke Thomas, Shelton Christian R, Horan Kevin
Format:	Article
Language:	English
Published:	BMC 2010-04-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/11/205

_version_	1818515616714194944
author	Girke Thomas Shelton Christian R Horan Kevin
collection	DOAJ
description	Abstract. Background: Profile HMMs (hidden Markov models) provide effective methods for modeling the conserved regions of protein families. A limitation of the resulting domain models is the difficulty to pinpoint their much shorter functional sub-features, such as catalytically relevant sequence motifs in enzymes or ligand binding signatures of receptor proteins. Results: To identify these conserved motifs efficiently, we propose a method for extracting the most information-rich regions in protein families from their profile HMMs. The method was used here to predict a comprehensive set of sub-HMMs from the Pfam domain database. Cross-validations with the PROSITE and CSA databases confirmed the efficiency of the method in predicting most of the known functionally relevant motifs and residues. At the same time, 46,768 novel conserved regions could be predicted. The data set also allowed us to link at least 461 Pfam domains of known and unknown function by their common sub-HMMs. Finally, the sub-HMM method showed very promising results as an alternative search method for identifying proteins that share only short sequence similarities. Conclusions: Sub-HMMs extend the application spectrum of profile HMMs to motif discovery. Their most interesting utility is the identification of the functionally relevant residues in proteins of known and unknown function. Additionally, sub-HMMs can be used for highly localized sequence similarity searches that focus on shorter conserved features rather than entire domains or global similarities. The motif data generated by this study is a valuable knowledge resource for characterizing protein functions in the future.
first_indexed	2024-12-11T00:31:07Z
format	Article
id	doaj.art-1a8db67f6c654828bc141e3607ac744e
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-12-11T00:31:07Z
publishDate	2010-04-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-1a8db67f6c654828bc141e3607ac744e; 2022-12-22T01:27:20Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2010-04-01; 11; 1; 205; 10.1186/1471-2105-11-205; Predicting conserved protein motifs with Sub-HMMs; Girke Thomas; Shelton Christian R; Horan Kevin; http://www.biomedcentral.com/1471-2105/11/205
title	Predicting conserved protein motifs with Sub-HMMs
url	http://www.biomedcentral.com/1471-2105/11/205
