Neglog: Homology-Based Negative Data Sampling Method for Genome-Scale Reconstruction of Human Protein–Protein Interaction Networks

Bibliographic Details
Main Authors: Suyu Mei, Kun Zhang
Format: Article
Language: English
Published: MDPI AG 2019-10-01
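Series: International Journal of Molecular Sciences
Subjects: protein–protein interaction; paralog/ortholog; negative data sampling; machine learning; L2-regularized logistic regression
Online Access: https://www.mdpi.com/1422-0067/20/20/5075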
author Suyu Mei
Kun Zhang
collection DOAJ
description Rapid reconstruction of genome-scale protein–protein interaction (PPI) networks is instrumental in understanding cellular processes, disease pathogenesis, and drug reactions. However, the lack of experimentally verified negative data (i.e., pairs of proteins that do not interact) is still a major issue that needs to be properly addressed in computational modeling. In this study, we take advantage of the very limited experimentally verified negative data from Negatome to infer more negative data for computational modeling. We assume that the paralogs or orthologs of two non-interacting proteins also do not interact with high probability. We term this assumption “Neglog”; it is to some extent supported by paralogous/orthologous structure conservation. To reduce the risk of bias toward the negative data from Negatome, we combine Neglog with less biased random sampling according to a certain ratio to construct the training data. L2-regularized logistic regression is used as the base classifier to counteract noise and to train on a large dataset. Computational results show that the proposed Neglog method outperforms the pure random sampling method and offers sound biological interpretability. In addition, we find that an independent test on negative data is indispensable for bias control, a step that is usually neglected by existing studies. Lastly, we use the Neglog method to validate the PPIs in STRING, and the validated PPIs are supported by gene ontology (GO) enrichment analyses.
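The sampling idea in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `homologs` mapping, the `random_ratio` parameter, and the choice to exclude known positive pairs are all assumptions made for illustration, and the record does not state the actual ratio used.

```python
import itertools
import random


def neglog_negatives(seed_negatives, homologs, known_positives):
    """Infer negative pairs from seed non-interacting pairs (hypothetical helper).

    For each seed pair (a, b), every pair (a', b') of their homologs
    (paralogs or orthologs) is assumed not to interact.
    """
    positives = {frozenset(p) for p in known_positives}
    inferred = set()
    for a, b in seed_negatives:
        for a2, b2 in itertools.product(homologs.get(a, []), homologs.get(b, [])):
            pair = frozenset((a2, b2))
            # Skip self-pairs and pairs already known to interact (assumption).
            if len(pair) == 2 and pair not in positives:
                inferred.add(pair)
    return inferred


def random_negatives(proteins, known_positives, n, rng, exclude=frozenset()):
    """Draw up to n random protein pairs that are not known positives."""
    positives = {frozenset(p) for p in known_positives}
    # Enumerating all pairs is only practical for small illustrative inputs.
    candidates = [
        frozenset(p)
        for p in itertools.combinations(sorted(proteins), 2)
        if frozenset(p) not in positives and frozenset(p) not in exclude
    ]
    return set(rng.sample(candidates, min(n, len(candidates))))


def build_negative_set(seed_negatives, homologs, proteins, known_positives,
                       random_ratio=1.0, seed=0):
    """Combine homology-inferred and random negatives at an assumed ratio."""
    rng = random.Random(seed)
    inferred = neglog_negatives(seed_negatives, homologs, known_positives)
    n_random = round(random_ratio * len(inferred))
    sampled = random_negatives(proteins, known_positives, n_random, rng,
                               exclude=inferred)
    return inferred, sampled
```

Calling `build_negative_set` with a list of seed negative pairs, a dictionary mapping each protein to its homologs, and a list of known interacting pairs returns the homology-inferred negatives and the randomly sampled negatives as two separate sets, so the mixing ratio can be varied without recomputing the inferred pairs.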
first_indexed 2024-04-13T05:54:46Z
format Article
id doaj.art-3280c49ec1374afbabb77b87f8f67165
institution Directory Open Access Journal
issn 1422-0067
language English
last_indexed 2024-04-13T05:54:46Z
publishDate 2019-10-01
publisher MDPI AG
record_format Article
series International Journal of Molecular Sciences
spelling doaj.art-3280c49ec1374afbabb77b87f8f67165
2022-12-22T02:59:39Z
eng
MDPI AG
International Journal of Molecular Sciences
1422-0067
2019-10-01
20
20
5075
10.3390/ijms20205075
ijms20205075
Suyu Mei (Software College, Shenyang Normal University, Shenyang 110034, China)
Kun Zhang (Bioinformatics Facility of Xavier NIH RCMI Cancer Research Center, Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA)
title Neglog: Homology-Based Negative Data Sampling Method for Genome-Scale Reconstruction of Human Protein–Protein Interaction Networks
topic protein–protein interaction
paralog/ortholog
negative data sampling
machine learning
L2-regularized logistic regression
url https://www.mdpi.com/1422-0067/20/20/5075