Rapid prediction of conformationally-dependent DFT-level descriptors using graph neural networks for carboxylic acids and alkyl amines

Bibliographic Details
Main Authors: Haas, Brittany C, Hardy, Melissa A, Sowndarya S. V., Shree, Adams, Keir, Coley, Connor W, Paton, Robert S, Sigman, Matthew S
Other Authors: Massachusetts Institute of Technology. Department of Chemical Engineering
Format: Article
Language: English
Published: Royal Society of Chemistry 2025
Online Access: https://hdl.handle.net/1721.1/158096
Description: Data-driven reaction discovery and development is a growing field that relies on the use of molecular descriptors to capture key information about substrates, ligands, and targets. Broad adaptation of this strategy is hindered by the associated computational cost of descriptor calculation, especially when considering conformational flexibility. Descriptor libraries can be precomputed agnostic of application to reduce the computational burden of data-driven reaction development. However, as one often applies these models to evaluate novel hypothetical structures, it would be ideal to predict the descriptors of compounds on-the-fly. Herein, we report DFT-level descriptor libraries for conformational ensembles of 8528 carboxylic acids and 8172 alkyl amines towards this goal. Employing 2D and 3D graph neural network architectures trained on these libraries culminated in the development of predictive models for molecule-level descriptors, as well as the bond- and atom-level descriptors for the conserved reactive site (carboxylic acid or amine). The predictions were confirmed to be robust for an external validation set of medicinally-relevant carboxylic acids and alkyl amines. Additionally, a retrospective study correlating the rate of amide coupling reactions demonstrated the suitability of the predicted DFT-level descriptors for downstream applications. Ultimately, these models enable high-fidelity predictions for a vast number of potential substrates, greatly increasing accessibility to the field of data-driven reaction development.
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Dates: 2025-01-15; 2025-01-28T16:49:06Z; 2025-01-28T16:55:47Z
Type: Article (http://purl.org/eprint/type/JournalArticle)
Citation: Haas, Brittany C, Hardy, Melissa A, Sowndarya S. V., Shree, Adams, Keir, Coley, Connor W et al. 2025. "Rapid prediction of conformationally-dependent DFT-level descriptors using graph neural networks for carboxylic acids and alkyl amines." Digital Discovery, 4 (1).
DOI: https://doi.org/10.1039/D4DD00284A
Journal: Digital Discovery
License: Creative Commons Attribution (https://creativecommons.org/licenses/by/4.0/)
File Format: application/pdf