High-Throughput Extraction of Phase–Property Relationships from Literature Using Natural Language Processing and Large Language Models


Bibliographic Details
Main Authors: Montanelli, Luca, Venugopal, Vineeth, Olivetti, Elsa A., Latypov, Marat I.
Format: Article
Language: English
Published: Springer International Publishing, 2024
Subjects: Industrial and Manufacturing Engineering; General Materials Science
Online Access: https://hdl.handle.net/1721.1/153929
Description: Consolidating published research on aluminum alloys into insights about microstructure–property relationships can simplify alloy design and reduce its cost. For many heat-treatable alloys that derive superior properties from precipitation, one critical design consideration is their phases: these key microstructural constituents can have a decisive impact on the alloys' engineering properties. Here, we present a computational framework for high-throughput extraction of phases and their impact on properties from scientific papers. Our framework uses transformer-based and large language models to identify sentences containing phase–property information, recognize phase and property entities, and extract phase–property relationships and their “sentiment.” We demonstrate the framework on aluminum alloys, building a database of 7,675 phase–property relationships extracted from a corpus of almost 5,000 full-text papers. We discuss the extracted relationships in light of common metallurgical knowledge.
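The description names three stages: finding sentences that carry phase–property information, recognizing phase and property entities, and extracting relationships with a sentiment label. The sketch below only illustrates how those stages feed into one another and what a stored relationship record might look like. It is not the authors' method: it deliberately swaps their transformer-based and large language models for keyword matching, and the vocabularies, cue words, class, and function names are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword lists standing in for trained phase/property recognizers.
PHASES = ("Al3Sc", "S phase", "theta'", "beta''")
PROPERTIES = ("strength", "hardness", "ductility", "corrosion resistance")
# Hypothetical cue words standing in for a learned sentiment model.
NEGATIVE_CUES = ("decrease", "reduce", "degrade", "embrittle")
POSITIVE_CUES = ("increase", "improve", "enhance")


@dataclass(frozen=True)
class PhasePropertyRelation:
    phase: str
    prop: str
    sentiment: str  # "positive", "negative", or "neutral"


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_terms(sentence: str, vocabulary: tuple[str, ...]) -> list[str]:
    """Case-insensitive substring lookup of vocabulary terms in one sentence."""
    lowered = sentence.lower()
    return [term for term in vocabulary if term.lower() in lowered]


def classify_sentiment(sentence: str) -> str:
    """Label a sentence by cue words; negative cues are checked first."""
    lowered = sentence.lower()
    if any(cue in lowered for cue in NEGATIVE_CUES):
        return "negative"
    if any(cue in lowered for cue in POSITIVE_CUES):
        return "positive"
    return "neutral"


def extract_relations(text: str) -> list[PhasePropertyRelation]:
    relations = []
    for sentence in split_sentences(text):
        # Entity recognition, then sentence filtering: keep only sentences
        # that mention at least one phase and at least one property.
        phases = find_terms(sentence, PHASES)
        properties = find_terms(sentence, PROPERTIES)
        if not phases or not properties:
            continue
        # Relation extraction: pair every phase with every property and
        # attach a single sentence-level sentiment label to each pair.
        sentiment = classify_sentiment(sentence)
        for phase in phases:
            for prop in properties:
                relations.append(PhasePropertyRelation(phase, prop, sentiment))
    return relations


if __name__ == "__main__":
    sample = (
        "Fine Al3Sc precipitates increase the strength of the alloy. "
        "Coarse S phase particles reduce ductility. "
        "The samples were polished before testing."
    )
    for relation in extract_relations(sample):
        print(relation)
```

On the sample text this yields a positive Al3Sc–strength relation and a negative S phase–ductility relation, while the third sentence is filtered out. Pairing every phase with every property in a sentence and reusing one sentence-level label is the crudest possible choice; it is exactly the step where learned relation and sentiment models are needed.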
Institution: Massachusetts Institute of Technology
Journal: Integrating Materials and Manufacturing Innovation
ISSN: 2193-9764; 2193-9772
DOI: 10.1007/s40192-024-00344-8
Date Issued: 2024-03-19
Date Available: 2024-03-25
Citation: Montanelli, L., Venugopal, V., Olivetti, E.A. et al. High-Throughput Extraction of Phase–Property Relationships from Literature Using Natural Language Processing and Large Language Models. Integr Mater Manuf Innov (2024). https://doi.org/10.1007/s40192-024-00344-8
License: Creative Commons Attribution 4.0 (https://creativecommons.org/licenses/by/4.0/)
Copyright: The Author(s)
File Format: PDF