Benchmark Evaluation of Protein–Protein Interaction Prediction Algorithms

Protein–protein interactions (PPIs) perform various functions and regulate processes throughout cells. Knowledge of the full network of PPIs is vital to biomedical research, but most of the PPIs are still unknown. As it is infeasible to discover all of them experimentally due to technical and resour...

Bibliographic Details
Main Authors: Brandan Dunham, Madhavi K. Ganapathiraju
Format: Article
Language: English
Published: MDPI AG 2021-12-01
Series: Molecules
Subjects: protein–protein interactions, computational prediction, evaluation, interactome
Online Access: https://www.mdpi.com/1420-3049/27/1/41
_version_ 1797498257356619776
author Brandan Dunham
Madhavi K. Ganapathiraju
collection DOAJ
description Protein–protein interactions (PPIs) perform various functions and regulate processes throughout cells. Knowledge of the full network of PPIs is vital to biomedical research, but most PPIs are still unknown. As it is infeasible to discover all of them experimentally due to technical and resource limitations, computational prediction of PPIs is essential, and accurately assessing the performance of algorithms is required before further application or translation. However, many published methods compose their evaluation datasets incorrectly, using a higher proportion of positive class data than occurs naturally, leading to exaggerated performance. We re-implemented various published algorithms and evaluated them on datasets with realistic data compositions, and found that their performance is overstated in the original publications, with several methods outperformed by our control models built on ‘illogical’ and random number features. We conclude that these methods are influenced by an over-characterization of some proteins in the literature and due to the scale-free nature of the PPI network, and that they fail when tested on all possible protein pairs. Additionally, we found that sequence-only-based algorithms performed worse than those that employ functional and expression features. We present a benchmark evaluation of many published algorithms for PPI prediction. The source code of our implementations and the benchmark datasets created here are made available as open source.
first_indexed 2024-03-10T03:30:53Z
format Article
id doaj.art-6e750bf14ced4365a03f0dc84a445578
institution Directory Open Access Journal
issn 1420-3049
language English
last_indexed 2024-03-10T03:30:53Z
publishDate 2021-12-01
publisher MDPI AG
record_format Article
series Molecules
spelling doaj.art-6e750bf14ced4365a03f0dc84a445578
2023-11-23T11:55:58Z
eng
MDPI AG
Molecules
1420-3049
2021-12-01
27(1), 41
10.3390/molecules27010041
Benchmark Evaluation of Protein–Protein Interaction Prediction Algorithms
Brandan Dunham (Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15232, USA)
Madhavi K. Ganapathiraju (Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15232, USA)
https://www.mdpi.com/1420-3049/27/1/41
protein–protein interactions
computational prediction
evaluation
interactome
title Benchmark Evaluation of Protein–Protein Interaction Prediction Algorithms
topic protein–protein interactions
computational prediction
evaluation
interactome
url https://www.mdpi.com/1420-3049/27/1/41