Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest

Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental for its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in suppor...

Bibliographic Details
Main Authors: Hongjian Li, Kwong-Sak Leung, Man-Hon Wong, Pedro J. Ballester
Format: Article
Language: English
Published: MDPI AG 2015-06-01
Series: Molecules
Subjects: docking; binding affinity prediction; machine-learning scoring functions
Online Access: http://www.mdpi.com/1420-3049/20/6/10947
_version_ 1819134312369356800
author Hongjian Li
Kwong-Sak Leung
Man-Hon Wong
Pedro J. Ballester
author_facet Hongjian Li
Kwong-Sak Leung
Man-Hon Wong
Pedro J. Ballester
author_sort Hongjian Li
collection DOAJ
description Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental for its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in support of this hypothesis. In this study, we investigated to which extent training a scoring function with data containing low-quality structural and binding data is detrimental for predictive performance. We actually found that low-quality data is not only non-detrimental, but beneficial for the predictive performance of machine-learning scoring functions, though the improvement is less important than that coming from high-quality data. Furthermore, we observed that classical scoring functions are not able to effectively exploit data beyond an early threshold, regardless of its quality. This demonstrates that exploiting a larger data volume is more important for the performance of machine-learning scoring functions than restricting to a smaller set of higher data quality.
first_indexed 2024-12-22T10:01:11Z
format Article
id doaj.art-4bd53821904b45d4a55b8d0e9a1dbe25
institution Directory Open Access Journal
issn 1420-3049
language English
last_indexed 2024-12-22T10:01:11Z
publishDate 2015-06-01
publisher MDPI AG
record_format Article
series Molecules
spelling doaj.art-4bd53821904b45d4a55b8d0e9a1dbe25
2022-12-21T18:30:06Z
eng
MDPI AG
Molecules
1420-3049
2015-06-01
20 6 10947 10962
10.3390/molecules200610947
molecules200610947
Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest
Hongjian Li 0
Kwong-Sak Leung 1
Man-Hon Wong 2
Pedro J. Ballester 3
Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories 999077, Hong Kong
Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories 999077, Hong Kong
Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories 999077, Hong Kong
Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental for its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in support of this hypothesis. In this study, we investigated to which extent training a scoring function with data containing low-quality structural and binding data is detrimental for predictive performance. We actually found that low-quality data is not only non-detrimental, but beneficial for the predictive performance of machine-learning scoring functions, though the improvement is less important than that coming from high-quality data. Furthermore, we observed that classical scoring functions are not able to effectively exploit data beyond an early threshold, regardless of its quality. This demonstrates that exploiting a larger data volume is more important for the performance of machine-learning scoring functions than restricting to a smaller set of higher data quality.
http://www.mdpi.com/1420-3049/20/6/10947
docking
binding affinity prediction
machine-learning scoring functions
spellingShingle Hongjian Li
Kwong-Sak Leung
Man-Hon Wong
Pedro J. Ballester
Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest
Molecules
docking
binding affinity prediction
machine-learning scoring functions
title Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest
title_full Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest
title_fullStr Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest
title_full_unstemmed Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest
title_short Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest
title_sort low quality structural and interaction data improves binding affinity prediction via random forest
topic docking
binding affinity prediction
machine-learning scoring functions
url http://www.mdpi.com/1420-3049/20/6/10947