Automatic extraction of materials and properties from superconductors scientific literature

Bibliographic Details
Main Authors: Luca Foppiano, Pedro Baptista Castro, Pedro Ortiz Suarez, Kensei Terashima, Yoshihiko Takano, Masashi Ishii
Format: Article
Language: English
Published: Taylor & Francis Group 2023-12-01
Series: Science and Technology of Advanced Materials: Methods
Online Access: http://dx.doi.org/10.1080/27660400.2022.2153633
collection DOAJ
description The automatic extraction of materials and related properties from the scientific literature is gaining attention in data-driven materials science (Materials Informatics). In this paper, we discuss Grobid-superconductors, our solution for automatically extracting superconductor material names and respective properties from text. Built as a Grobid module, it combines machine learning and heuristic approaches in a multi-step architecture that supports input data as raw text or PDF documents. Using Grobid-superconductors, we built SuperCon2, a database of 40,324 materials and properties records from 37,700 papers. The material (or sample) information is represented by name, chemical formula, and material class, and is characterized by shape, doping, substitution variables for components, and substrate as adjoined information. The properties include the Tc superconducting critical temperature and, when available, applied pressure with the Tc measurement method.
id doaj.art-9278b040c9c94ee8b0b20933e033ec63
institution Directory Open Access Journal
issn 2766-0400
affiliation Luca Foppiano (MaDIS, NIMS); Pedro Baptista Castro (MANA, NIMS); Pedro Ortiz Suarez (University of Mannheim); Kensei Terashima (MANA, NIMS); Yoshihiko Takano (MANA, NIMS); Masashi Ishii (MaDIS, NIMS)
citation Science and Technology of Advanced Materials: Methods, volume 3, issue 1, article 2153633 (2023)
topic materials informatics
superconductors
machine learning
nlp
tdm