Neglog: Homology-Based Negative Data Sampling Method for Genome-Scale Reconstruction of Human Protein–Protein Interaction Networks
Rapid reconstruction of genome-scale protein–protein interaction (PPI) networks is instrumental in understanding cellular processes, disease pathogenesis, and drug reactions. However, the lack of experimentally verified negative data (i.e., pairs of proteins that do not interact) is still a major...
Main Authors: | Suyu Mei, Kun Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2019-10-01 |
Series: | International Journal of Molecular Sciences |
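Subjects: | protein–protein interaction; paralog/ortholog; negative data sampling; machine learning; L<sub>2</sub>-regularized logistic regression |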
Online Access: | https://www.mdpi.com/1422-0067/20/20/5075 |
_version_ | 1811296860650864640 |
---|---|
author | Suyu Mei Kun Zhang |
author_facet | Suyu Mei Kun Zhang |
author_sort | Suyu Mei |
collection | DOAJ |
description | Rapid reconstruction of genome-scale protein–protein interaction (PPI) networks is instrumental in understanding cellular processes, disease pathogenesis, and drug reactions. However, the lack of experimentally verified negative data (i.e., pairs of proteins that do not interact) is still a major issue that needs to be properly addressed in computational modeling. In this study, we take advantage of the very limited experimentally verified negative data from Negatome to infer more negative data for computational modeling. We assume that the paralogs or orthologs of two non-interacting proteins, with high probability, also do not interact. We term this assumption “Neglog”; it is to some extent supported by paralogous/orthologous structure conservation. To reduce the risk of bias toward the negative data from Negatome, we combine Neglog with less biased random sampling according to a certain ratio to construct the training data. L<sub>2</sub>-regularized logistic regression is used as the base classifier to counteract noise and to train on a large dataset. Computational results show that the proposed Neglog method outperforms the pure random sampling method, with sound biological interpretability. In addition, we find that an independent test on negative data is indispensable for bias control, a step that is usually neglected by existing studies. Lastly, we use the Neglog method to validate the PPIs in STRING, which are supported by gene ontology (GO) enrichment analyses. |
first_indexed | 2024-04-13T05:54:46Z |
format | Article |
id | doaj.art-3280c49ec1374afbabb77b87f8f67165 |
institution | Directory Open Access Journal |
issn | 1422-0067 |
language | English |
last_indexed | 2024-04-13T05:54:46Z |
publishDate | 2019-10-01 |
publisher | MDPI AG |
record_format | Article |
series | International Journal of Molecular Sciences |
spelling | doaj.art-3280c49ec1374afbabb77b87f8f67165; 2022-12-22T02:59:39Z; eng; MDPI AG; International Journal of Molecular Sciences; 1422-0067; 2019-10-01; 20; 20; 5075; 10.3390/ijms20205075; ijms20205075; Neglog: Homology-Based Negative Data Sampling Method for Genome-Scale Reconstruction of Human Protein–Protein Interaction Networks; Suyu Mei (Software College, Shenyang Normal University, Shenyang 110034, China); Kun Zhang (Bioinformatics Facility of Xavier NIH RCMI Cancer Research Center, Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA); https://www.mdpi.com/1422-0067/20/20/5075; protein–protein interaction; paralog/ortholog; negative data sampling; machine learning; L<sub>2</sub>-regularized logistic regression |
spellingShingle | Suyu Mei Kun Zhang Neglog: Homology-Based Negative Data Sampling Method for Genome-Scale Reconstruction of Human Protein–Protein Interaction Networks International Journal of Molecular Sciences protein–protein interaction paralog/ortholog negative data sampling machine learning l<sub>2</sub>-regularized logistic regression |
title | Neglog: Homology-Based Negative Data Sampling Method for Genome-Scale Reconstruction of Human Protein–Protein Interaction Networks |
title_sort | neglog homology based negative data sampling method for genome scale reconstruction of human protein protein interaction networks |
topic | protein–protein interaction; paralog/ortholog; negative data sampling; machine learning; L<sub>2</sub>-regularized logistic regression |
url | https://www.mdpi.com/1422-0067/20/20/5075 |
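
The sampling scheme summarized in the description field (inferring negatives from homologs of known non-interacting pairs, then mixing them with randomly sampled pairs at some ratio) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the `homologs` mapping, the placeholder protein identifiers, and the `random_ratio` parameter are all assumptions, and the record does not state the ratio the authors actually used.

```python
import itertools
import random


def neglog_expand(known_negatives, homologs):
    """Expand known non-interacting pairs to pairs of their homologs.

    `homologs` is a hypothetical mapping: protein -> set of paralogs/orthologs.
    Returns a set of unordered pairs (frozensets), including the known pairs.
    """
    inferred = set()
    for a, b in known_negatives:
        # Treat each protein as its own homolog so the known pair is kept.
        side_a = set(homologs.get(a, ())) | {a}
        side_b = set(homologs.get(b, ())) | {b}
        for ha, hb in itertools.product(side_a, side_b):
            if ha != hb:
                inferred.add(frozenset((ha, hb)))
    return inferred


def random_negatives(proteins, excluded, n, rng):
    """Draw up to n random unordered pairs that are not in `excluded`."""
    excluded = {frozenset(p) for p in excluded}
    proteins = sorted(proteins)
    # Cap n so the rejection loop below always terminates.
    max_pairs = len(proteins) * (len(proteins) - 1) // 2 - len(excluded)
    n = min(n, max_pairs)
    sampled = set()
    while len(sampled) < n:
        pair = frozenset(rng.sample(proteins, 2))
        if pair not in excluded:
            sampled.add(pair)
    return sampled


def build_negative_set(known_negatives, homologs, proteins, positives,
                       random_ratio, seed=0):
    """Combine homology-inferred negatives with random ones.

    `random_ratio` (an assumed parameter) is the number of random pairs
    drawn per homology-inferred pair.
    """
    rng = random.Random(seed)
    positive_set = {frozenset(p) for p in positives}
    neglog = neglog_expand(known_negatives, homologs) - positive_set
    n_random = int(round(random_ratio * len(neglog)))
    rand = random_negatives(proteins, positive_set | neglog, n_random, rng)
    return neglog, rand
```

In this sketch the two negative subsets are returned separately so that a held-out portion of each could be reserved for the independent negative test that the abstract describes as indispensable for bias control.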