Harnessing GPT-3.5 for text parsing in solid-state synthesis - case study of ternary chalcogenides

Optimally doped single-phase compounds are necessary to advance state-of-the-art thermoelectric devices, which convert heat into electricity and vice versa, and they require solid-state synthesis of bulk materials. Data-driven approaches to learning these recipes require careful data curation from lar...

Bibliographic Details
Main Authors: Thway, Maung, Low, Andre Kai Yuan, Khetan, Samyak, Dai, Haiwen, Recatala-Gomez, Jose, Chen, Andy Paul, Hippalgaonkar, Kedar
Other Authors: School of Materials Science and Engineering
Format: Journal Article
Language: English
Published: 2024
Subjects: Engineering; GPT-3.5; Text parsing
Online Access: https://hdl.handle.net/10356/174885
author Thway, Maung
Low, Andre Kai Yuan
Khetan, Samyak
Dai, Haiwen
Recatala-Gomez, Jose
Chen, Andy Paul
Hippalgaonkar, Kedar
author2 School of Materials Science and Engineering
author_sort Thway, Maung
collection NTU
description Optimally doped single-phase compounds are necessary to advance state-of-the-art thermoelectric devices, which convert heat into electricity and vice versa, and they require solid-state synthesis of bulk materials. Data-driven approaches to learning these recipes require careful data curation from large bodies of text, which may not be available for some materials, as well as a refined language processing algorithm, which presents a high barrier to entry. We propose applying Large Language Models (LLMs) to parse solid-state synthesis recipes, encapsulating all essential synthesis information intuitively in terms of primary and secondary heating peaks. Using a dataset curated by domain experts for a specific material (Gold Standard), we engineered a prompt set for GPT-3.5 to replicate the same dataset (Silver Standard), which it did with 73% overall accuracy. We then extracted and inferred synthesis conditions for other ternary chalcogenides with the same prompt set. From a database of 168 research papers, we successfully parsed 61 papers, which we then used to develop a classifier to predict phase purity. Our methodology demonstrates the generalizability of LLMs for text parsing, specifically for materials with sparse literature and unbalanced reporting (since usually only positive results are reported). Our work provides a roadmap for future efforts to combine LLMs with materials science research, heralding a potentially transformative paradigm in the synthesis and characterization of novel materials.
first_indexed 2024-10-01T03:16:41Z
format Journal Article
id ntu-10356/174885
institution Nanyang Technological University
language English
last_indexed 2024-10-01T03:16:41Z
publishDate 2024
record_format dspace
spelling ntu-10356/174885 2024-04-19T15:59:52Z
Affiliations: School of Materials Science and Engineering; Institute of Materials Research and Engineering, A*STAR
Funders: Agency for Science, Technology and Research (A*STAR); National Research Foundation (NRF)
Version: Published version
Acknowledgement: The authors acknowledge funding from AME Programmatic Funds by the Agency for Science, Technology and Research under Grant No. A1898b0043. KH also acknowledges funding from the NRF Fellowship (NRF-NRFF13-2021-0011).
Dates: 2024-04-15T06:04:07Z; 2024-04-15T06:04:07Z; 2024
Citation: Thway, M., Low, A. K. Y., Khetan, S., Dai, H., Recatala-Gomez, J., Chen, A. P. & Hippalgaonkar, K. (2024). Harnessing GPT-3.5 for text parsing in solid-state synthesis - case study of ternary chalcogenides. Digital Discovery, 3(2), 328-336. https://dx.doi.org/10.1039/d3dd00202k
ISSN: 2635-098X
Handle: https://hdl.handle.net/10356/174885
DOI: 10.1039/d3dd00202k
Other identifier: 2-s2.0-85182443481
Grant numbers: A1898b0043; NRF-NRFF13-2021-0011
Rights: © 2024 The Author(s). Published by the Royal Society of Chemistry. This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence.
File format: application/pdf
title Harnessing GPT-3.5 for text parsing in solid-state synthesis - case study of ternary chalcogenides
topic Engineering
GPT-3.5
Text parsing
url https://hdl.handle.net/10356/174885