Performance and robustness of small molecule retention time prediction with molecular graph neural networks in industrial drug discovery campaigns


Bibliographic Details
Main Authors: Daniel Vik, David Pii, Chirag Mudaliar, Mads Nørregaard-Madsen, Aleksejs Kontijevskis
Format: Article
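Language: English
Published: Nature Portfolio 2024-04-01
Series: Scientific Reports
Subjects: Chromatography; Machine-learning; Retention time; Small molecule; Applied artificial intelligence; Pharmaceuticals
Online Access: https://doi.org/10.1038/s41598-024-59620-4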
Abstract: This study explores how machine learning can be used to predict chromatographic retention times (RT) for the analysis of small molecules, with the objective of identifying a machine-learning framework with the robustness required to support a chemical synthesis production platform. We used internally generated data from high-throughput parallel synthesis in the context of pharmaceutical drug discovery projects. We tested machine-learning models from the following frameworks: XGBoost, ChemProp, and DeepChem, using a dataset of 7552 small molecules. Our findings show that two specific models, AttentiveFP and ChemProp, predicted RT more accurately than XGBoost and a regular neural network. We also assessed how well these models performed over time and found that molecular graph neural networks consistently gave accurate predictions for new chemical series. In addition, when we applied ChemProp to the publicly available METLIN SMRT dataset, it achieved an average error of 38.70 s. These results highlight the efficacy of molecular graph neural networks, especially ChemProp, in diverse RT prediction scenarios, thereby enhancing the efficiency of chromatographic analysis.
ISSN: 2045-2322
Author affiliations: Amgen Research Copenhagen, Amgen Inc. (all authors)