MOFSimplify, machine learning models with extracted stability data of three thousand metal–organic frameworks

Abstract: We report a workflow and the output of a natural language processing (NLP)-based procedure to mine the extant metal–organic framework (MOF) literature describing structurally characterized MOFs and their solvent removal and thermal stabiliti...

Bibliographic Details
Main Authors: Nandy, Aditya, Terrones, Gianmarco, Arunachalam, Naveen, Duan, Chenru, Kastner, David W, Kulik, Heather J
Format: Article
Language: English
Published: Springer Science and Business Media LLC 2022
Online Access: https://hdl.handle.net/1721.1/141730
author Nandy, Aditya
Terrones, Gianmarco
Arunachalam, Naveen
Duan, Chenru
Kastner, David W
Kulik, Heather J
collection MIT
description Abstract: We report a workflow and the output of a natural language processing (NLP)-based procedure to mine the extant metal–organic framework (MOF) literature describing structurally characterized MOFs and their solvent removal and thermal stabilities. We obtain over 2,000 solvent removal stability measures from text mining and 3,000 thermal decomposition temperatures from thermogravimetric analysis data. We assess the validity of our NLP methods and the accuracy of our extracted data by comparing to a hand-labeled subset. Machine learning (ML, i.e. artificial neural network) models trained on this data using graph- and pore-geometry-based representations enable prediction of stability on new MOFs with quantified uncertainty. Our web interface, MOFSimplify, provides users access to our curated data and enables them to harness that data for predictions on new MOFs. MOFSimplify also encourages community feedback on existing data and on ML model predictions for community-based active learning for improved MOF stability models.
first_indexed 2024-09-23T09:29:16Z
format Article
id mit-1721.1/141730
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T09:29:16Z
publishDate 2022
publisher Springer Science and Business Media LLC
record_format dspace
spelling mit-1721.1/141730
2022-04-08T03:35:27Z
MOFSimplify, machine learning models with extracted stability data of three thousand metal–organic frameworks
Nandy, Aditya
Terrones, Gianmarco
Arunachalam, Naveen
Duan, Chenru
Kastner, David W
Kulik, Heather J
2022-04-07T13:00:55Z
2022-04-07T13:00:55Z
2022-12
2022-04-07T12:53:38Z
Article
http://purl.org/eprint/type/JournalArticle
https://hdl.handle.net/1721.1/141730
Nandy, Aditya, Terrones, Gianmarco, Arunachalam, Naveen, Duan, Chenru, Kastner, David W et al. 2022. "MOFSimplify, machine learning models with extracted stability data of three thousand metal–organic frameworks." Scientific Data, 9 (1).
en
10.1038/s41597-022-01181-0
Scientific Data
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/
application/pdf
Springer Science and Business Media LLC
Scientific Data
title MOFSimplify, machine learning models with extracted stability data of three thousand metal–organic frameworks
url https://hdl.handle.net/1721.1/141730