Benchmark Evaluation of Protein–Protein Interaction Prediction Algorithms
Protein–protein interactions (PPIs) perform various functions and regulate processes throughout cells. Knowledge of the full network of PPIs is vital to biomedical research, but most of the PPIs are still unknown. As it is infeasible to discover all of them experimentally due to technical and resour...
Main Authors: | Brandan Dunham, Madhavi K. Ganapathiraju |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2021-12-01 |
Series: | Molecules |
Subjects: | protein–protein interactions, computational prediction, evaluation, interactome |
Online Access: | https://www.mdpi.com/1420-3049/27/1/41 |
_version_ | 1827668135955660800 |
---|---|
author | Brandan Dunham, Madhavi K. Ganapathiraju |
author_facet | Brandan Dunham, Madhavi K. Ganapathiraju |
author_sort | Brandan Dunham |
collection | DOAJ |
description | Protein–protein interactions (PPIs) perform various functions and regulate processes throughout cells. Knowledge of the full network of PPIs is vital to biomedical research, but most PPIs are still unknown. As it is infeasible to discover all of them experimentally due to technical and resource limitations, computational prediction of PPIs is essential, and accurate assessment of algorithm performance is required before further application or translation. However, many published methods compose their evaluation datasets incorrectly, using a higher proportion of positive class data than occurs naturally, which leads to exaggerated performance. We re-implemented various published algorithms and evaluated them on datasets with realistic data compositions and found that their performance is overstated in the original publications, with several methods outperformed by our control models built on ‘illogical’ and random number features. We conclude that these methods are influenced by an over-characterization of some proteins in the literature and by the scale-free nature of the PPI network, and that they fail when tested on all possible protein pairs. Additionally, we found that sequence-only-based algorithms performed worse than those that employ functional and expression features. We present a benchmark evaluation of many published algorithms for PPI prediction. The source code of our implementations and the benchmark datasets created here are available as open source. |
first_indexed | 2024-03-10T03:30:53Z |
format | Article |
id | doaj.art-6e750bf14ced4365a03f0dc84a445578 |
institution | Directory Open Access Journal |
issn | 1420-3049 |
language | English |
last_indexed | 2024-03-10T03:30:53Z |
publishDate | 2021-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Molecules |
spelling | doaj.art-6e750bf14ced4365a03f0dc84a445578; 2023-11-23T11:55:58Z; eng; MDPI AG; Molecules; 1420-3049; 2021-12-01; 27; 1; 41; 10.3390/molecules27010041; Benchmark Evaluation of Protein–Protein Interaction Prediction Algorithms; Brandan Dunham (0), Madhavi K. Ganapathiraju (1); Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15232, USA; Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15232, USA; https://www.mdpi.com/1420-3049/27/1/41; protein–protein interactions; computational prediction; evaluation; interactome |
spellingShingle | Brandan Dunham Madhavi K. Ganapathiraju Benchmark Evaluation of Protein–Protein Interaction Prediction Algorithms Molecules protein–protein interactions computational prediction evaluation interactome |
title | Benchmark Evaluation of Protein–Protein Interaction Prediction Algorithms |
title_sort | benchmark evaluation of protein protein interaction prediction algorithms |
topic | protein–protein interactions; computational prediction; evaluation; interactome |
url | https://www.mdpi.com/1420-3049/27/1/41 |
work_keys_str_mv | AT brandandunham benchmarkevaluationofproteinproteininteractionpredictionalgorithms AT madhavikganapathiraju benchmarkevaluationofproteinproteininteractionpredictionalgorithms |
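
The description above argues that evaluating on datasets with more positive pairs than occur naturally exaggerates performance. A minimal sketch of why, assuming a classifier whose true-positive and false-positive rates stay fixed while the share of interacting pairs in the test set changes. The rates and prevalence values below are hypothetical illustrations, not figures from the article, and `precision_at_prevalence` is a name introduced here.

```python
def precision_at_prevalence(tpr: float, fpr: float, prevalence: float) -> float:
    """Precision of a classifier with fixed TPR and FPR when a fraction
    `prevalence` of the evaluated protein pairs are true interactions."""
    true_pos = tpr * prevalence            # expected share of pairs that are true positives
    false_pos = fpr * (1.0 - prevalence)   # expected share of pairs that are false positives
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: 90% of real interactions found, 5% of non-interactions mislabeled.
TPR, FPR = 0.90, 0.05

# Hypothetical test-set compositions, from balanced (1:1) down to 1 positive per 1000 pairs.
for prevalence in (0.5, 0.1, 0.01, 0.001):
    precision = precision_at_prevalence(TPR, FPR, prevalence)
    print(f"positive fraction {prevalence:>5}: precision {precision:.3f}")
```

Under these assumed rates the same classifier looks precise on a balanced test set and much less so as positives become rare, which is the effect the description attributes to unrealistic dataset composition.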