SeqCP: A sequence-based algorithm for searching circularly permuted proteins

Circular permutation (CP) is a protein sequence rearrangement in which the amino- and carboxyl-termini of a protein can be created in different positions along the imaginary circularized sequence. Circularly permuted proteins usually exhibit conserved three-dimensional structures and functions. By...

Bibliographic Details
Main Authors: Chi-Chun Chen, Yu-Wei Huang, Hsuan-Cheng Huang, Wei-Cheng Lo, Ping-Chiang Lyu
Format: Article
Language: English
Published: Elsevier 2023-01-01
Series: Computational and Structural Biotechnology Journal
Subjects: Circular permutation; Circular permutants; Protein sequence analysis; Protein structure modeling
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037022005189
author Chi-Chun Chen
Yu-Wei Huang
Hsuan-Cheng Huang
Wei-Cheng Lo
Ping-Chiang Lyu
collection DOAJ
description Circular permutation (CP) is a protein sequence rearrangement in which the amino- and carboxyl-termini of a protein can be created in different positions along the imaginary circularized sequence. Circularly permuted proteins usually exhibit conserved three-dimensional structures and functions. By comparing the structures of circular permutants (CPMs), protein research and bioengineering applications can be approached in ways that are difficult to achieve by traditional mutagenesis. Most current CP detection algorithms depend on structural information. Because a vast number of proteins have unknown structures, many CP pairs may remain unidentified. An efficient sequence-based CP detector would help identify more CP pairs and advance many protein studies. For instance, some hypothetical proteins may have CPMs with known functions and structures that are informative for functional annotation, but existing structure-based CP search methods cannot be applied when those hypothetical proteins lack structural information. Despite their considerable potential for applications, sequence-based CP search methods have not been well developed. We present a sequence-based method, SeqCP, which analyzes normal and duplicated sequence alignments to identify CPMs and determine candidate CP sites for proteins. SeqCP was trained on data obtained from the Circular Permutation Database and tested with nonredundant datasets from the Protein Data Bank. It shows high reliability in CP identification and achieves an AUC of 0.9. SeqCP has been implemented as a web server available at http://pcnas.life.nthu.edu.tw/SeqCP/.
first_indexed 2024-03-08T21:31:35Z
format Article
id doaj.art-c0ca2f20e5fe484a895ae68dc4d90a71
institution Directory of Open Access Journals
issn 2001-0370
language English
last_indexed 2024-03-08T21:31:35Z
publishDate 2023-01-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj.art-c0ca2f20e5fe484a895ae68dc4d90a71; 2023-12-21T07:30:13Z; eng; Elsevier; Computational and Structural Biotechnology Journal; ISSN 2001-0370; 2023-01-01; volume 21, pages 185-201
SeqCP: A sequence-based algorithm for searching circularly permuted proteins
Chi-Chun Chen [0], Yu-Wei Huang [1], Hsuan-Cheng Huang [2], Wei-Cheng Lo [3], Ping-Chiang Lyu [4]
[0] Bioinformatics Program, Institute of Information Science, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan; Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu 300, Taiwan
[1] Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
[2] Bioinformatics Program, Institute of Information Science, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan; Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei 112, Taiwan
[3] Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; The Center for Bioinformatics Research, National Yang Ming Chiao Tung University, Hsinchu, Taiwan; Corresponding authors at: Life Science Building II, Room 306, No. 101, Section 2, Kuang Fu Road, Hsinchu 300044, Taiwan, ROC.
[4] Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu 300, Taiwan; Corresponding authors at: Life Science Building II, Room 306, No. 101, Section 2, Kuang Fu Road, Hsinchu 300044, Taiwan, ROC.
title SeqCP: A sequence-based algorithm for searching circularly permuted proteins
topic Circular permutation
Circular permutants
Protein sequence analysis
Protein structure modeling
url http://www.sciencedirect.com/science/article/pii/S2001037022005189
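
The abstract says SeqCP analyzes "normal and duplicated sequence alignments" to identify CPMs and candidate CP sites. The record gives no further detail, so the following is only a minimal sketch of that general idea and not SeqCP's implementation. It assumes a plain Smith-Waterman local alignment with made-up match, mismatch, and gap scores, and a hypothetical `min_gain` threshold. The names `local_align` and `cp_candidate` are illustrative. The idea is that a circular permutant aligns only partially against the original sequence but aligns end to end against the sequence concatenated with itself, and the start of that alignment suggests a candidate CP site.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment with linear gaps (illustrative scores).

    Returns (best_score, start_in_b, end_in_b), where b[start:end] is the
    aligned region of b for the first best-scoring cell found.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            s = max(0,
                    score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                    score[i - 1][j] + gap,      # gap in b
                    score[i][j - 1] + gap)      # gap in a
            score[i][j] = s
            if s > best:
                best, best_pos = s, (i, j)
    # Trace back from the best cell to find where the alignment starts in b.
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, j, best_pos[1]


def cp_candidate(query, target, min_gain=1.5):
    """Compare a normal alignment with an alignment against target + target.

    min_gain is a hypothetical threshold: the duplicated alignment must score
    at least min_gain times the normal one to flag a possible permutation.
    Returns (is_cp, candidate_site, normal_score, duplicated_score).
    """
    normal, _, _ = local_align(query, target)
    doubled, start, _ = local_align(query, target + target)
    is_cp = normal > 0 and doubled >= min_gain * normal
    # Position in target where the permuted query appears to begin.
    site = start % len(target)
    return is_cp, site, normal, doubled


if __name__ == "__main__":
    target = "MKTAYIAKQRQISFVKSHFSRQ"          # toy sequence, not real data
    permuted = target[10:] + target[:10]       # rotate at position 10
    print(cp_candidate(permuted, target))
    print(cp_candidate(target, target))        # unpermuted control
```

In this toy case the rotated sequence is flagged and the unpermuted control is not. Real protein comparisons would need a substitution matrix, affine gap penalties, and a trained decision rule in place of the fixed threshold assumed here.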