Ensemble learning from ensemble docking: revisiting the optimum ensemble size problem

Abstract Despite considerable advances obtained by applying machine learning approaches in protein–ligand affinity predictions, the incorporation of receptor flexibility has remained an important bottleneck. While ensemble docking has been used widely as a solution to this problem, the optimum choic...

Bibliographic Details
Main Authors:	Sara Mohammadi, Zahra Narimani, Mitra Ashouri, Rohoullah Firouzi, Mohammad Hossein Karimi‐Jafari
Format:	Article
Language:	English
Published:	Nature Portfolio 2022-01-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-021-04448-5

collection	DOAJ
description	Abstract Despite considerable advances obtained by applying machine learning approaches in protein–ligand affinity predictions, the incorporation of receptor flexibility has remained an important bottleneck. While ensemble docking has been used widely as a solution to this problem, the optimum choice of receptor conformations is still an open question considering the issues related to the computational cost and false positive pose predictions. Here, a combination of ensemble learning and ensemble docking is suggested to rank different conformations of the target protein in light of their importance for the final accuracy of the model. Available X-ray structures of cyclin-dependent kinase 2 (CDK2) in complex with different ligands are used as an initial receptor ensemble, and its redundancy is removed through a graph-based redundancy removal, which is shown to be more efficient and less subjective than clustering-based representative selection methods. A set of ligands with available experimental affinity are docked to this nonredundant receptor ensemble, and the energetic features of the best scored poses are used in an ensemble learning procedure based on the random forest method. The importance of receptors is obtained through feature selection measures, and it is shown that a few of the most important conformations are sufficient to reach 1 kcal/mol accuracy in affinity prediction with considerable improvement of the early enrichment power of the models compared to the different ensemble docking without learning strategies. A clear strategy has been provided in which machine learning selects the most important experimental conformers of the receptor among a large set of protein–ligand complexes while simultaneously maintaining the final accuracy of affinity predictions at the highest level possible for available data. Our results could be informative for future attempts to design receptor-specific docking-rescoring strategies.
id	doaj.art-630ce00154a046bc854b23d2164224d3
institution	Directory Open Access Journal
issn	2045-2322
spelling	doaj.art-630ce00154a046bc854b23d2164224d3; 2022-12-22T04:09:13Z; eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2022-01-01; vol. 12, iss. 1, pp. 1–15; doi:10.1038/s41598-021-04448-5
author_affiliation	Sara Mohammadi (Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran); Zahra Narimani (Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS)); Mitra Ashouri (Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran); Rohoullah Firouzi (Department of Physical Chemistry, Chemistry and Chemical Engineering Research Center of Iran); Mohammad Hossein Karimi‐Jafari (Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran)

Similar Items