ECDEP: identifying essential proteins based on evolutionary community discovery and subcellular localization

Abstract Background In cellular activities, essential proteins play a vital role and are instrumental in comprehending fundamental biological necessities and identifying pathogenic genes. Current deep learning approaches for predicting essential proteins underutilize the potential of gene expression...


Bibliographic Details
Main Authors: Chen Ye, Qi Wu, Shuxia Chen, Xuemei Zhang, Wenwen Xu, Yunzhi Wu, Youhua Zhang, Yi Yue
Format: Article
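Language: English
Published: BMC 2024-01-01
Series: BMC Genomics
Subjects: Essential protein; Evolutionary community discovery; Protein–protein interaction network; Subcellular localization; Gene expression
Online Access: https://doi.org/10.1186/s12864-024-10019-5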
collection DOAJ
description Abstract
Background: In cellular activities, essential proteins play a vital role and are instrumental in understanding fundamental biological requirements and identifying pathogenic genes. Current deep learning approaches for predicting essential proteins underutilize gene expression data, are poorly suited to exploring dynamic networks, and have seen limited evaluation across diverse species.
Results: We introduce ECDEP, an essential protein identification model based on evolutionary community discovery. ECDEP integrates temporal gene expression data with a protein–protein interaction (PPI) network and employs the 3-Sigma rule to eliminate outliers at each time point, constructing a dynamic network. Next, we use edge birth and death information to establish an interaction streaming source that feeds the evolutionary community discovery algorithm, which identifies overlapping communities during the evolution of the dynamic network. SVM recursive feature elimination (RFE) is applied to extract the most informative communities, which are combined with subcellular localization data for classification predictions. We assess the performance of ECDEP by comparing it against ten centrality methods, four shallow machine learning methods with RFE, and two deep learning methods that incorporate multiple biological data sources, on Saccharomyces cerevisiae (S. cerevisiae), Homo sapiens (H. sapiens), Mus musculus, and Caenorhabditis elegans. ECDEP achieves an AP value of 0.86 on the H. sapiens dataset, and the contribution ratio of community features in classification reaches 0.54 on the S. cerevisiae (Krogan) dataset.
Conclusions: Our proposed method integrates network dynamics and yields strong results across various datasets. Furthermore, incorporating evolutionary community discovery algorithms increases the usefulness of gene expression data in classification.
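The dynamic-network step described in the abstract can be illustrated with a small sketch. The abstract does not give the exact thresholding formula or the event format consumed by the community discovery algorithm, so everything below is an assumption made for illustration: a protein is treated as active at a time point when its expression exceeds the mean plus k standard deviations of its own time course (k = 3), an edge of the static PPI network exists at a time point only when both endpoints are active, and births and deaths are emitted as hypothetical `("+", u, v, t)` and `("-", u, v, t)` tuples. The function names and the toy data are likewise hypothetical and are not taken from ECDEP.

```python
from statistics import mean, pstdev


def active_proteins(expression, k=3.0):
    """Return, for each time index, the set of proteins considered active.

    Assumed rule (not taken from the paper): a protein is active at time t
    when its expression exceeds mean + k * population std of its own series.
    """
    if not expression:
        return []
    n_times = len(next(iter(expression.values())))
    active = [set() for _ in range(n_times)]
    for protein, series in expression.items():
        threshold = mean(series) + k * pstdev(series)
        for t, value in enumerate(series):
            if value > threshold:
                active[t].add(protein)
    return active


def edge_events(ppi_edges, active):
    """Turn per-time active sets into a stream of edge birth/death events.

    An edge is present at time t only if both endpoints are active at t.
    '+' marks an edge that appears at t, '-' an edge that was present at
    t - 1 and is gone at t.
    """
    events = []
    previous = set()
    for t, proteins in enumerate(active):
        current = {
            tuple(sorted((u, v)))
            for u, v in ppi_edges
            if u in proteins and v in proteins
        }
        events += [("+", u, v, t) for u, v in sorted(current - previous)]
        events += [("-", u, v, t) for u, v in sorted(previous - current)]
        previous = current
    return events


# Toy data: 12 time points per protein (hypothetical values).
expression = {
    "A": [0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0],
    "B": [1, 1, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1],
    "C": [2, 2, 2, 2, 2, 2, 2, 9, 2, 2, 2, 2],
    "D": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
}
ppi = [("A", "B"), ("A", "C"), ("C", "D")]

active = active_proteins(expression)
events = edge_events(ppi, active)
print(events)
```

On this toy input, A and B are both active only at time index 3, so the edge A–B is born at index 3 and dies at index 4; C is active alone at index 7 and D is never active, so no other edge appears. A stream of this shape is the kind of input an incremental community discovery method could consume, although the algorithm ECDEP uses for that step is not reproduced here.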
first_indexed 2024-03-08T10:01:13Z
format Article
id doaj.art-d83859d6356b4607a6736b67fbcc3a72
institution Directory Open Access Journal
issn 1471-2164
language English
last_indexed 2024-03-08T10:01:13Z
publishDate 2024-01-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-d83859d6356b4607a6736b67fbcc3a72 2024-01-29T10:59:48Z eng BMC BMC Genomics 1471-2164 2024-01-01 251123 10.1186/s12864-024-10019-5
Affiliation (all eight authors): School of Information and Artificial Intelligence, Anhui Agricultural University
title ECDEP: identifying essential proteins based on evolutionary community discovery and subcellular localization
topic Essential protein
Evolutionary community discovery
Protein–protein interaction network
Subcellular localization
Gene expression
url https://doi.org/10.1186/s12864-024-10019-5