A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins

Abstract Because of the limited effectiveness of prevailing phylogenetic methods when applied to highly divergent protein sequences, the phylogenetic analysis problem remains challenging. Here, we propose a sequence-based evolutionary distance algorithm termed sequence distance (SD), which innovativ...

Bibliographic Details
Main Authors: Wei Cao, Lu-Yun Wu, Xia-Yu Xia, Xiang Chen, Zhi-Xin Wang, Xian-Ming Pan
Format: Article
Language: English
Published: Nature Portfolio 2023-11-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-47496-9
collection DOAJ
description Abstract Because of the limited effectiveness of prevailing phylogenetic methods when applied to highly divergent protein sequences, the phylogenetic analysis problem remains challenging. Here, we propose a sequence-based evolutionary distance algorithm termed sequence distance (SD), which innovatively incorporates site-to-site correlation within protein sequences into the distance estimation. In protein superfamilies, SD can effectively distinguish evolutionary relationships both within and between protein families, producing phylogenetic trees that closely align with those based on structural information, even with sequence identity less than 20%. SD is highly correlated with the similarity of the protein structure, and can calculate evolutionary distances for thousands of protein pairs within seconds using a single CPU, which is significantly faster than most protein structure prediction methods that demand high computational resources and long run times. The development of SD will significantly advance phylogenetics, providing researchers with a more accurate and reliable tool for exploring evolutionary relationships.
first_indexed 2024-03-08T18:16:06Z
format Article
id doaj.art-e3152a988ee14d8baef8f6f30990b300
institution Directory Open Access Journal
issn 2045-2322
language English
last_indexed 2024-03-08T18:16:06Z
publishDate 2023-11-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-e3152a988ee14d8baef8f6f30990b300
2023-12-31T12:09:52Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2023-11-01
131112
10.1038/s41598-023-47496-9
A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins
Wei Cao (Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University)
Lu-Yun Wu (Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University)
Xia-Yu Xia (Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University)
Xiang Chen (Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University)
Zhi-Xin Wang (Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University)
Xian-Ming Pan (Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University)
title A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins
url https://doi.org/10.1038/s41598-023-47496-9