Polygrammar: Grammar for Digital Polymer Representation and Generation

Abstract Polymers are widely studied materials with diverse properties and applications determined by molecular structures. It is essential to represent these structures clearly and explore the full space of achievable chemical designs. However, existing approaches cannot offer comprehensive design...

Bibliographic Details
Main Authors: Minghao Guo, Wan Shou, Liane Makatura, Timothy Erps, Michael Foshey, Wojciech Matusik
Format: Article
Language: English
Published: Wiley 2022-08-01
Series: Advanced Science
Subjects: context‐sensitive grammar, generative model, polymer representation
Online Access: https://doi.org/10.1002/advs.202101864
author Minghao Guo
Wan Shou
Liane Makatura
Timothy Erps
Michael Foshey
Wojciech Matusik
collection DOAJ
description Abstract Polymers are widely studied materials with diverse properties and applications determined by molecular structures. It is essential to represent these structures clearly and explore the full space of achievable chemical designs. However, existing approaches cannot offer comprehensive design models for polymers because of their inherent scale and structural complexity. Here, a parametric, context‐sensitive grammar designed specifically for polymers (PolyGrammar) is proposed. Using the symbolic hypergraph representation and 14 simple production rules, PolyGrammar can represent and generate all valid polyurethane structures. An algorithm is presented to translate any polyurethane structure from the popular Simplified Molecular‐Input Line‐entry System (SMILES) string format into the PolyGrammar representation. The representative power of PolyGrammar is tested by translating a dataset of over 600 polyurethane samples collected from the literature. Furthermore, it is shown that PolyGrammar can be easily extended to other copolymers and homopolymers. By offering a complete, explicit representation scheme and an explainable generative model with validity guarantees, PolyGrammar takes an essential step toward a more comprehensive and practical system for polymer discovery and exploration. As the first bridge between formal languages and chemistry, PolyGrammar also serves as a critical blueprint to inform the design of similar grammars for other chemistries, including organic and inorganic molecules.
first_indexed 2024-03-13T09:27:36Z
format Article
id doaj.art-d3852cf8a0404b469829c2146e2045c5
institution Directory Open Access Journal
issn 2198-3844
language English
last_indexed 2024-03-13T09:27:36Z
publishDate 2022-08-01
publisher Wiley
record_format Article
series Advanced Science
spelling doaj.art-d3852cf8a0404b469829c2146e2045c5; 2023-05-26T08:56:00Z; eng; Wiley; Advanced Science; 2198-3844; 2022-08-01; 9; 23; n/a; n/a; 10.1002/advs.202101864
Polygrammar: Grammar for Digital Polymer Representation and Generation
Minghao Guo (0), Wan Shou (1), Liane Makatura (2), Timothy Erps (3), Michael Foshey (4), Wojciech Matusik (5)
(0-5) Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
https://doi.org/10.1002/advs.202101864
context‐sensitive grammar; generative model; polymer representation
title Polygrammar: Grammar for Digital Polymer Representation and Generation
topic context‐sensitive grammar
generative model
polymer representation
url https://doi.org/10.1002/advs.202101864