A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins
Abstract: Because of the limited effectiveness of prevailing phylogenetic methods when applied to highly divergent protein sequences, the phylogenetic analysis problem remains challenging. Here, we propose a sequence-based evolutionary distance algorithm termed sequence distance (SD), which innovativ...
Main Authors: | Wei Cao, Lu-Yun Wu, Xia-Yu Xia, Xiang Chen, Zhi-Xin Wang, Xian-Ming Pan |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2023-11-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-023-47496-9 |
_version_ | 1797371244964741120 |
---|---|
author | Wei Cao, Lu-Yun Wu, Xia-Yu Xia, Xiang Chen, Zhi-Xin Wang, Xian-Ming Pan |
author_sort | Wei Cao |
collection | DOAJ |
description | Abstract: Because of the limited effectiveness of prevailing phylogenetic methods when applied to highly divergent protein sequences, the phylogenetic analysis problem remains challenging. Here, we propose a sequence-based evolutionary distance algorithm termed sequence distance (SD), which innovatively incorporates site-to-site correlation within protein sequences into the distance estimation. In protein superfamilies, SD can effectively distinguish evolutionary relationships both within and between protein families, producing phylogenetic trees that closely align with those based on structural information, even with sequence identity less than 20%. SD is highly correlated with the similarity of the protein structure, and can calculate evolutionary distances for thousands of protein pairs within seconds using a single CPU, which is significantly faster than most protein structure prediction methods that demand high computational resources and long run times. The development of SD will significantly advance phylogenetics, providing researchers with a more accurate and reliable tool for exploring evolutionary relationships. |
first_indexed | 2024-03-08T18:16:06Z |
format | Article |
id | doaj.art-e3152a988ee14d8baef8f6f30990b300 |
institution | Directory of Open Access Journals |
issn | 2045-2322 |
language | English |
last_indexed | 2024-03-08T18:16:06Z |
publishDate | 2023-11-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-e3152a988ee14d8baef8f6f30990b300; 2023-12-31T12:09:52Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2023-11-01; vol. 13, no. 1, pp. 1-12; 10.1038/s41598-023-47496-9; A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins; Wei Cao, Lu-Yun Wu, Xia-Yu Xia, Xiang Chen, Zhi-Xin Wang, Xian-Ming Pan (all: Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University); https://doi.org/10.1038/s41598-023-47496-9 |
title | A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins |
title_sort | sequence based evolutionary distance method for phylogenetic analysis of highly divergent proteins |
url | https://doi.org/10.1038/s41598-023-47496-9 |