An improved classification of G-protein-coupled receptors using sequence-derived features

Background: G-protein-coupled receptors (GPCRs) play a key role in diverse physiological processes and are the targets of almost two-thirds of the marketed drugs. The 3D structures of GPCRs are largely unavailable; however, a large number of GPCR primary sequences are known. To facilitate the ide...

Bibliographic Details
Main Authors:	Peng, Zhen-Ling, Yang, Jian-Yi, Chen, Xin
Other Authors:	School of Physical and Mathematical Sciences
Format:	Journal Article
Language:	English
Published:	2010 (repository record dated 2013)
Subjects:	Mathematical Sciences
Online Access:	https://hdl.handle.net/10356/100530 http://hdl.handle.net/10220/17875

author	Peng, Zhen-Ling Yang, Jian-Yi Chen, Xin
author2	School of Physical and Mathematical Sciences
author_facet	School of Physical and Mathematical Sciences Peng, Zhen-Ling Yang, Jian-Yi Chen, Xin
author_sort	Peng, Zhen-Ling
collection	NTU
description	Background: G-protein-coupled receptors (GPCRs) play a key role in diverse physiological processes and are the targets of almost two-thirds of the marketed drugs. The 3D structures of GPCRs are largely unavailable; however, a large number of GPCR primary sequences are known. To facilitate the identification and characterization of novel receptors, it is therefore very valuable to develop a computational method that accurately predicts GPCRs from protein primary sequences. Results: We propose a new method, called PCA-GPCR, to predict GPCRs using a comprehensive set of 1497 sequence-derived features. Principal component analysis (PCA) is first employed to reduce the dimension of the feature space to 32. The resulting 32-dimensional feature vectors are then fed into a simple yet powerful classification algorithm, called intimate sorting, to predict GPCRs at five levels. The prediction at the first level determines whether a protein is a GPCR or a non-GPCR. If it is predicted to be a GPCR, it is further assigned to a family, subfamily, sub-subfamily, and subtype by the classifiers at the second, third, fourth, and fifth levels, respectively. To train the classifiers applied at the five levels, a non-redundant dataset is carefully constructed, which contains 3178, 1589, 4772, 4924, and 2741 protein sequences at the respective levels. Jackknife tests on this training dataset show that the overall accuracies of PCA-GPCR at the five levels (from the first to the fifth) reach 99.5%, 88.8%, 80.47%, 80.3%, and 92.34%, respectively. We further perform predictions on a dataset of 1238 GPCRs at the second level, and on two other datasets of 167 and 566 GPCRs, respectively, at the fourth level. The overall prediction accuracies of our method are consistently higher than those of the existing methods used for comparison.
Conclusions: The comprehensive set of 1497 features is believed to be capable of capturing information about amino acid composition, sequence order, and various physicochemical properties of proteins. Therefore, high accuracies are achieved when predicting GPCRs at all five levels with our proposed method.
first_indexed	2024-10-01T07:05:11Z
format	Journal Article
id	ntu-10356/100530
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T07:05:11Z
publishDate	2013
record_format	dspace
spelling	ntu-10356/100530 2023-02-28T19:37:11Z Published version 2013-11-27T05:57:54Z 2019-12-06T20:24:05Z 2010 Journal Article Peng, Z. L., Yang, J. Y., & Chen, X. (2010). An improved classification of G-protein-coupled receptors using sequence-derived features. BMC Bioinformatics, 11(1), 420. 1471-2105 https://hdl.handle.net/10356/100530 http://hdl.handle.net/10220/17875 10.1186/1471-2105-11-420 20696050 en BMC bioinformatics © 2010 Peng et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. application/pdf
title	An improved classification of G-protein-coupled receptors using sequence-derived features
topic	Mathematical Sciences
url	https://hdl.handle.net/10356/100530 http://hdl.handle.net/10220/17875

An improved classification of G-protein-coupled receptors using sequence-derived features
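
The pipeline summarized in the abstract (reduce 1497 sequence-derived features to 32 principal components, classify, and evaluate with a jackknife test) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes NumPy is available, uses synthetic two-class data in place of real GPCR features, and stands in a plain 1-nearest-neighbor rule with Euclidean distance for the intimate sorting classifier; that substitution, like every function name below, is an assumption made for illustration. Only the first level (GPCR versus non-GPCR) is mimicked.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the feature mean and the top principal axes of X (rows = proteins)."""
    mean = X.mean(axis=0)
    # SVD of the centered matrix; rows of vt are principal axes ordered by variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, axes):
    """Map feature vectors onto the retained principal axes."""
    return (X - mean) @ axes.T

def nearest_neighbor_label(query, train_X, train_y):
    """Assumed stand-in for intimate sorting: label of the closest training vector."""
    dists = np.linalg.norm(train_X - query, axis=1)
    return train_y[np.argmin(dists)]

def jackknife_accuracy(Z, y):
    """Leave-one-out test: each sample is predicted from all the other samples."""
    n = len(y)
    hits = 0
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        mask[i] = False          # hold out sample i
        hits += nearest_neighbor_label(Z[i], Z[mask], y[mask]) == y[i]
        mask[i] = True           # put it back
    return hits / n

def make_toy_data(n_per_class=40, n_features=1497, seed=0):
    """Synthetic placeholder data: two Gaussian clusters, NOT real GPCR features."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=3.0, size=(2, n_features))
    X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
    y = np.repeat([0, 1], n_per_class)   # 0 = non-GPCR, 1 = GPCR (hypothetical labels)
    return X, y

X, y = make_toy_data()
# Simplification: PCA is fitted once on all samples rather than inside each jackknife fold.
mean, axes = fit_pca(X, n_components=32)
Z = project(X, mean, axes)
accuracy = jackknife_accuracy(Z, y)
print(Z.shape, accuracy)
```

Under these assumptions, the lower levels (family, subfamily, sub-subfamily, subtype) would repeat the same projection and classification step with the labels of the corresponding level, applied only to proteins predicted to be GPCRs at the first level.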

Similar Items