Automatic extraction of materials and properties from superconductors scientific literature
Main Authors: | Luca Foppiano, Pedro Baptista Castro, Pedro Ortiz Suarez, Kensei Terashima, Yoshihiko Takano, Masashi Ishii |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2023-12-01 |
Series: | Science and Technology of Advanced Materials: Methods |
Subjects: | materials informatics, superconductors, machine learning, nlp, tdm |
Online Access: | http://dx.doi.org/10.1080/27660400.2022.2153633 |
author | Luca Foppiano, Pedro Baptista Castro, Pedro Ortiz Suarez, Kensei Terashima, Yoshihiko Takano, Masashi Ishii |
collection | DOAJ |
description | The automatic extraction of materials and related properties from the scientific literature is gaining attention in data-driven materials science (Materials Informatics). In this paper, we discuss Grobid-superconductors, our solution for automatically extracting superconductor material names and their properties from text. Built as a Grobid module, it combines machine learning and heuristic approaches in a multi-step architecture that accepts either raw text or PDF documents as input. Using Grobid-superconductors, we built SuperCon2, a database of 40,324 material and property records extracted from 37,700 papers. Each material (or sample) is represented by its name, chemical formula, and material class, and is further characterized by shape, doping, substitution variables for its components, and substrate. The properties include the superconducting critical temperature (Tc) and, when available, the applied pressure and the Tc measurement method. |
institution | Directory of Open Access Journals |
issn | 2766-0400 |
affiliations | Luca Foppiano (MaDIS, NIMS); Pedro Baptista Castro (MANA, NIMS); Pedro Ortiz Suarez (University of Mannheim); Kensei Terashima (MANA, NIMS); Yoshihiko Takano (MANA, NIMS); Masashi Ishii (MaDIS, NIMS) |
volume | 3 |
issue | 1 |
article number | 2153633 |
topic | materials informatics, superconductors, machine learning, nlp, tdm |
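The description lists which fields a SuperCon2 record carries: material information (name, chemical formula, material class, plus shape, doping, substitution variables, and substrate) and properties (Tc, with applied pressure and measurement method when available). The following is a minimal sketch of that record shape in Python, assuming every value is kept as plain extracted text. The class, field, and function names (`Material`, `SuperconRecord`, `has_full_tc_context`) are hypothetical and do not reproduce the actual SuperCon2 or Grobid-superconductors data format.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class Material:
    """Material (or sample) information. Hypothetical field names, not the SuperCon2 schema."""
    name: Optional[str] = None
    chemical_formula: Optional[str] = None
    material_class: Optional[str] = None
    # Adjoined information that further characterizes the material
    shape: Optional[str] = None
    doping: Optional[str] = None
    # Substitution variables for components, e.g. {"x": "0.1"} (assumed representation)
    substitution_variables: dict[str, str] = field(default_factory=dict)
    substrate: Optional[str] = None


@dataclass
class SuperconRecord:
    """One material/property record: Tc is required, pressure and method are optional."""
    material: Material
    tc: str  # critical temperature as extracted text, e.g. "39 K"
    applied_pressure: Optional[str] = None
    tc_measurement_method: Optional[str] = None


def has_full_tc_context(record: SuperconRecord) -> bool:
    """Return True only when both optional Tc details (pressure, method) were extracted."""
    return record.applied_pressure is not None and record.tc_measurement_method is not None


# Illustrative values only, not taken from SuperCon2
record = SuperconRecord(
    material=Material(name="magnesium diboride", chemical_formula="MgB2"),
    tc="39 K",
)
print(asdict(record)["material"]["chemical_formula"])  # MgB2
print(has_full_tc_context(record))  # False
```

Making pressure and measurement method `Optional` mirrors the "when available" wording in the description; a real store would likely also normalize Tc to a numeric value with units, which this sketch leaves out.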