Artificial intelligence driven design of catalysts and materials for ring opening polymerization using a domain-specific language

Abstract Advances in machine learning (ML) and automated experimentation are poised to vastly accelerate research in polymer science. Data representation is a critical aspect for enabling ML integration in research workflows, yet many data models impose significant rigidity making it difficult to ac...

Bibliographic Details
Main Authors: Nathaniel H. Park, Matteo Manica, Jannis Born, James L. Hedrick, Tim Erdmann, Dmitry Yu. Zubarev, Nil Adell-Mill, Pedro L. Arrechea
Format: Article
Language: English
Published: Nature Portfolio 2023-06-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-023-39396-3
author Nathaniel H. Park
Matteo Manica
Jannis Born
James L. Hedrick
Tim Erdmann
Dmitry Yu. Zubarev
Nil Adell-Mill
Pedro L. Arrechea
collection DOAJ
description Advances in machine learning (ML) and automated experimentation are poised to vastly accelerate research in polymer science. Data representation is critical for integrating ML into research workflows, yet many data models impose significant rigidity, making it difficult to accommodate the broad array of experiment and data types found in polymer science. This inflexibility presents a significant barrier for researchers seeking to leverage their historical data in ML development. Here we show that a domain-specific language, termed Chemical Markdown Language (CMDL), provides flexible, extensible, and consistent representation of disparate experiment types and polymer structures. CMDL enables seamless use of historical experimental data to fine-tune regression transformer (RT) models for generative molecular design tasks. We demonstrate the utility of this approach through the generation and experimental validation of catalysts and polymers in the context of ring-opening polymerization, although we also provide examples of how CMDL can be applied more broadly to other polymer classes. Critically, we show how the CMDL-tuned model preserves key functional groups within the polymer structure, allowing for experimental validation. These results reveal the versatility of CMDL and how it facilitates translation of historical data into meaningful predictive and generative models that produce experimentally actionable output.
first_indexed 2024-03-12T21:08:02Z
format Article
id doaj.art-701a0247664540b78e39c2b39912603c
institution Directory Open Access Journal
issn 2041-1723
language English
last_indexed 2024-03-12T21:08:02Z
publishDate 2023-06-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj.art-701a0247664540b78e39c2b39912603c 2023-07-30T11:19:57Z eng
Nature Portfolio, Nature Communications, ISSN 2041-1723, 2023-06-01, volume 14, issue 1, pages 1-15
10.1038/s41467-023-39396-3
Artificial intelligence driven design of catalysts and materials for ring opening polymerization using a domain-specific language
Nathaniel H. Park (IBM Research–Almaden)
Matteo Manica (IBM Research–Zurich)
Jannis Born (IBM Research–Zurich)
James L. Hedrick (IBM Research–Almaden)
Tim Erdmann (IBM Research–Almaden)
Dmitry Yu. Zubarev (IBM Research–Almaden)
Nil Adell-Mill (IBM Research–Zurich)
Pedro L. Arrechea (IBM Research–Almaden)
https://doi.org/10.1038/s41467-023-39396-3
title Artificial intelligence driven design of catalysts and materials for ring opening polymerization using a domain-specific language
url https://doi.org/10.1038/s41467-023-39396-3