Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest
Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental for its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in suppor...
Main Authors: | Hongjian Li, Kwong-Sak Leung, Man-Hon Wong, Pedro J. Ballester |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2015-06-01 |
Series: | Molecules |
Subjects: | docking; binding affinity prediction; machine-learning scoring functions |
Online Access: | http://www.mdpi.com/1420-3049/20/6/10947 |
_version_ | 1819134312369356800 |
---|---|
author | Hongjian Li Kwong-Sak Leung Man-Hon Wong Pedro J. Ballester |
author_facet | Hongjian Li Kwong-Sak Leung Man-Hon Wong Pedro J. Ballester |
author_sort | Hongjian Li |
collection | DOAJ |
description | Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental for its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in support of this hypothesis. In this study, we investigated to which extent training a scoring function with data containing low-quality structural and binding data is detrimental for predictive performance. We actually found that low-quality data is not only non-detrimental, but beneficial for the predictive performance of machine-learning scoring functions, though the improvement is less important than that coming from high-quality data. Furthermore, we observed that classical scoring functions are not able to effectively exploit data beyond an early threshold, regardless of its quality. This demonstrates that exploiting a larger data volume is more important for the performance of machine-learning scoring functions than restricting to a smaller set of higher data quality. |
first_indexed | 2024-12-22T10:01:11Z |
format | Article |
id | doaj.art-4bd53821904b45d4a55b8d0e9a1dbe25 |
institution | Directory Open Access Journal |
issn | 1420-3049 |
language | English |
last_indexed | 2024-12-22T10:01:11Z |
publishDate | 2015-06-01 |
publisher | MDPI AG |
record_format | Article |
series | Molecules |
spelling | doaj.art-4bd53821904b45d4a55b8d0e9a1dbe25; 2022-12-21T18:30:06Z; eng; MDPI AG; Molecules; 1420-3049; 2015-06-01; 20; 6; 10947; 10962; 10.3390/molecules200610947; molecules200610947; Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest; Hongjian Li (0); Kwong-Sak Leung (1); Man-Hon Wong (2); Pedro J. Ballester (3); (0) Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories 999077, Hong Kong; (1) Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories 999077, Hong Kong; (2) Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories 999077, Hong Kong; (3) Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France; Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental for its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in support of this hypothesis. In this study, we investigated to which extent training a scoring function with data containing low-quality structural and binding data is detrimental for predictive performance. We actually found that low-quality data is not only non-detrimental, but beneficial for the predictive performance of machine-learning scoring functions, though the improvement is less important than that coming from high-quality data. Furthermore, we observed that classical scoring functions are not able to effectively exploit data beyond an early threshold, regardless of its quality. This demonstrates that exploiting a larger data volume is more important for the performance of machine-learning scoring functions than restricting to a smaller set of higher data quality.; http://www.mdpi.com/1420-3049/20/6/10947; docking; binding affinity prediction; machine-learning scoring functions |
spellingShingle | Hongjian Li Kwong-Sak Leung Man-Hon Wong Pedro J. Ballester Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest Molecules docking binding affinity prediction machine-learning scoring functions |
title | Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest |
title_full | Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest |
title_fullStr | Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest |
title_full_unstemmed | Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest |
title_short | Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest |
title_sort | low quality structural and interaction data improves binding affinity prediction via random forest |
topic | docking binding affinity prediction machine-learning scoring functions |
url | http://www.mdpi.com/1420-3049/20/6/10947 |
work_keys_str_mv | AT hongjianli lowqualitystructuralandinteractiondataimprovesbindingaffinitypredictionviarandomforest AT kwongsakleung lowqualitystructuralandinteractiondataimprovesbindingaffinitypredictionviarandomforest AT manhonwong lowqualitystructuralandinteractiondataimprovesbindingaffinitypredictionviarandomforest AT pedrojballester lowqualitystructuralandinteractiondataimprovesbindingaffinitypredictionviarandomforest |