polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics

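Abstract: Polymers are a vital part of everyday life. Their chemical universe is so large that it presents unprecedented opportunities as well as significant challenges to identify suitable application-specific candidates. We present a complete end-to-end machine-driven polymer informatics pipeline that can search this space for suitable candidates at unprecedented speed and accuracy. This pipeline includes a polymer chemical fingerprinting capability called polyBERT (inspired by Natural Language Processing concepts), and a multitask learning approach that maps the polyBERT fingerprints to a host of properties. polyBERT is a chemical linguist that treats the chemical structure of polymers as a chemical language. The present approach outstrips the best presently available concepts for polymer property prediction based on handcrafted fingerprint schemes in speed by two orders of magnitude while preserving accuracy, thus making it a strong candidate for deployment in scalable architectures including cloud infrastructures.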

Bibliographic Details
Main Authors: Christopher Kuenneth, Rampi Ramprasad
Format: Article
Language: English
Published: Nature Portfolio 2023-07-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-023-39868-6
author Christopher Kuenneth
Rampi Ramprasad
collection DOAJ
description Abstract Polymers are a vital part of everyday life. Their chemical universe is so large that it presents unprecedented opportunities as well as significant challenges to identify suitable application-specific candidates. We present a complete end-to-end machine-driven polymer informatics pipeline that can search this space for suitable candidates at unprecedented speed and accuracy. This pipeline includes a polymer chemical fingerprinting capability called polyBERT (inspired by Natural Language Processing concepts), and a multitask learning approach that maps the polyBERT fingerprints to a host of properties. polyBERT is a chemical linguist that treats the chemical structure of polymers as a chemical language. The present approach outstrips the best presently available concepts for polymer property prediction based on handcrafted fingerprint schemes in speed by two orders of magnitude while preserving accuracy, thus making it a strong candidate for deployment in scalable architectures including cloud infrastructures.
first_indexed 2024-03-12T23:22:40Z
format Article
id doaj.art-cf08bcaaa38b496ca1dd9f3c137c723a
institution Directory Open Access Journal
issn 2041-1723
language English
last_indexed 2024-03-12T23:22:40Z
publishDate 2023-07-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj.art-cf08bcaaa38b496ca1dd9f3c137c723a; 2023-07-16T11:22:01Z; eng; Nature Portfolio; Nature Communications; 2041-1723; 2023-07-01; 14; 1; 1; 11; 10.1038/s41467-023-39868-6; polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics; Christopher Kuenneth (0); Rampi Ramprasad (1); (0) School of Materials Science and Engineering, Georgia Institute of Technology; (1) School of Materials Science and Engineering, Georgia Institute of Technology; https://doi.org/10.1038/s41467-023-39868-6
title polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics
url https://doi.org/10.1038/s41467-023-39868-6