Exploring QSAR models for activity-cliff prediction

Abstract. Introduction and methodology: Pairs of similar compounds that differ only by a small structural modification but exhibit a large difference in their binding affinity for a given target are known as activity cliffs (ACs). It has been hypothesised that QSAR models struggle to predict ACs and t...


Bibliographic Details
Main Authors:	Markus Dablander, Thierry Hanser, Renaud Lambiotte, Garrett M. Morris
Format:	Article
Language:	English
Published:	BMC 2023-04-01
Series:	Journal of Cheminformatics
Subjects:	QSAR modelling; Activity cliffs; Activity cliff prediction; Machine learning; Deep learning; Molecular representation
Online Access:	https://doi.org/10.1186/s13321-023-00708-w

author	Markus Dablander, Thierry Hanser, Renaud Lambiotte, Garrett M. Morris
collection	DOAJ
description	Abstract. Introduction and methodology: Pairs of similar compounds that differ only by a small structural modification but exhibit a large difference in their binding affinity for a given target are known as activity cliffs (ACs). It has been hypothesised that QSAR models struggle to predict ACs and that ACs thus form a major source of prediction error. However, the AC-prediction power of modern QSAR methods and its quantitative relationship to general QSAR-prediction performance are still underexplored. We systematically construct nine distinct QSAR models by combining three molecular representation methods (extended-connectivity fingerprints, physicochemical-descriptor vectors and graph isomorphism networks) with three regression techniques (random forests, k-nearest neighbours and multilayer perceptrons); we then use each resulting model to classify pairs of similar compounds as ACs or non-ACs and to predict the activities of individual molecules in three case studies: dopamine receptor D2, factor Xa, and SARS-CoV-2 main protease. Results and conclusions: Our results provide strong support for the hypothesis that QSAR models do indeed frequently fail to predict ACs. We observe low AC-sensitivity amongst the evaluated models when the activities of both compounds are unknown, but a substantial increase in AC-sensitivity when the actual activity of one of the compounds is given. Graph isomorphism features are found to be competitive with or superior to classical molecular representations for AC-classification and can thus be employed as baseline AC-prediction models or simple compound-optimisation tools. For general QSAR-prediction, however, extended-connectivity fingerprints still consistently deliver the best performance amongst the tested input representations. A potential future pathway to improve QSAR-modelling performance might be the development of techniques to increase AC-sensitivity.
first_indexed	2024-04-09T16:21:37Z
format	Article
id	doaj.art-76f3ade697194633a3b30a6242ae3645
institution	Directory of Open Access Journals
issn	1758-2946
language	English
last_indexed	2024-04-09T16:21:37Z
publishDate	2023-04-01
publisher	BMC
record_format	Article
series	Journal of Cheminformatics
spelling	doaj.art-76f3ade697194633a3b30a6242ae3645; 2023-04-23T11:26:37Z; eng; BMC; Journal of Cheminformatics; ISSN 1758-2946; 2023-04-01; vol. 15, no. 1, pp. 1-16; 10.1186/s13321-023-00708-w; Exploring QSAR models for activity-cliff prediction; Markus Dablander (Mathematical Institute, University of Oxford), Thierry Hanser (Lhasa Limited), Renaud Lambiotte (Mathematical Institute, University of Oxford), Garrett M. Morris (Department of Statistics, University of Oxford); https://doi.org/10.1186/s13321-023-00708-w
title	Exploring QSAR models for activity-cliff prediction
topic	QSAR modelling; Activity cliffs; Activity cliff prediction; Machine learning; Deep learning; Molecular representation
url	https://doi.org/10.1186/s13321-023-00708-w

