Limits of Prediction for Machine Learning in Drug Discovery

Bibliographic Details
Main Authors: Modest von Korff, Thomas Sander
Format: Article
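Language: English
Published: Frontiers Media S.A. 2022-03-01
Series: Frontiers in Pharmacology
Subjects: machine learning, drug discovery, extrapolation, data set, PLS (partial least squares), Gaussian regression
Online Access: https://www.frontiersin.org/articles/10.3389/fphar.2022.832120/full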
author Modest von Korff
Thomas Sander
collection DOAJ
description In drug discovery, molecules are optimized towards desired properties. In this context, machine learning is used for extrapolation in drug discovery projects. The limits of extrapolation for regression models are known. However, a systematic analysis of the effectiveness of extrapolation in drug discovery has not yet been performed. In response, this study examined the capabilities of six machine learning algorithms to extrapolate from 243 datasets. The response values calculated from the molecules in the datasets were molecular weight, cLogP, and the number of sp3-atoms. Three experimental setups were chosen for the response values. Shuffled data were used for interpolation, whereas data for extrapolation were sorted from high to low values, and the reverse. Extrapolation with sorted data resulted in much larger prediction errors than interpolation with shuffled data. Additionally, this study demonstrated that linear machine learning methods are preferable for extrapolation.
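The contrast between shuffled (interpolation) and sorted (extrapolation) splits described in the abstract can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: the synthetic one-descriptor data, the 150/50 split, and the two learners (a least-squares line and a k-nearest-neighbour regressor standing in for a nonlinear method). It does not reproduce the study's 243 datasets or its six algorithms.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for one descriptor: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def knn_predict(train_x, train_y, x, k=3):
    """Mean response of the k nearest training points (hypothetical nonlinear stand-in)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def rmse(pred, true):
    """Root mean squared error between predictions and observed responses."""
    return (sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)) ** 0.5

def evaluate(rows, n_train):
    """Train on the first n_train rows, test on the rest; return (linear RMSE, kNN RMSE)."""
    train, test = rows[:n_train], rows[n_train:]
    tx, ty = [r[0] for r in train], [r[1] for r in train]
    sx, sy = [r[0] for r in test], [r[1] for r in test]
    slope, intercept = fit_line(tx, ty)
    linear_error = rmse([slope * x + intercept for x in sx], sy)
    knn_error = rmse([knn_predict(tx, ty, x) for x in sx], sy)
    return linear_error, knn_error

rng = random.Random(0)

# Synthetic stand-in data: one descriptor x, response y = 2x + 1 plus small noise.
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 2 * x + 1 + rng.gauss(0, 0.1)))

# Interpolation setup: shuffled rows, so test responses lie inside the training range.
shuffled = data[:]
rng.shuffle(shuffled)
interp = evaluate(shuffled, 150)

# Extrapolation setup: rows sorted by response, so the test set holds the highest values.
sorted_low_to_high = sorted(data, key=lambda row: row[1])
extrap = evaluate(sorted_low_to_high, 150)

print("interpolation  (linear, kNN) RMSE:", interp)
print("extrapolation  (linear, kNN) RMSE:", extrap)
```

In this toy case the neighbour-based learner cannot predict responses above the largest value it was trained on, so its error grows sharply on the sorted split, while the line continues the trend. The response here is linear by construction, so the sketch only illustrates the mechanism and says nothing about real molecular data.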
first_indexed 2024-12-22T02:07:37Z
format Article
id doaj.art-44bd4713f73f48beafaa5ea79fbbba05
institution Directory Open Access Journal
issn 1663-9812
language English
last_indexed 2024-12-22T02:07:37Z
publishDate 2022-03-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Pharmacology
spelling doaj.art-44bd4713f73f48beafaa5ea79fbbba05 | 2022-12-21T18:42:29Z | eng | Frontiers Media S.A. | Frontiers in Pharmacology | 1663-9812 | 2022-03-01 | 13 | 10.3389/fphar.2022.832120 | 832120 | Limits of Prediction for Machine Learning in Drug Discovery | Modest von Korff | Thomas Sander | https://www.frontiersin.org/articles/10.3389/fphar.2022.832120/full | machine learning | drug discovery | extrapolation | data set | PLS (partial least square) | Gaussian regression
title Limits of Prediction for Machine Learning in Drug Discovery
topic machine learning
drug discovery
extrapolation
data set
PLS (partial least squares)
Gaussian regression
url https://www.frontiersin.org/articles/10.3389/fphar.2022.832120/full