Automated Construction of a Photocatalysis Dataset for Water-Splitting Applications

Bibliographic Details
Main Authors: Taketomo Isazawa, Jacqueline M. Cole
Format: Article
Language: English
Published: Nature Portfolio 2023-09-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-023-02511-6
author Taketomo Isazawa
Jacqueline M. Cole
collection DOAJ
description Abstract We present an automatically generated dataset of 15,755 records that were extracted from 47,357 papers. These records contain water-splitting activity in the presence of certain photocatalysts, along with additional information about the chemical reaction conditions under which this activity was recorded. These conditions include any co-catalysts and additives that were present during water splitting, the length of time for which the photocatalytic experiment was conducted, and the type of light source used, including its wavelength. Despite the text extraction of such a wide range of chemical reaction attributes, the dataset afforded good precision (71.2%) and recall (36.3%). These figures-of-merit were calculated based on a random sample of open-access papers from the corpus. Mining such a complex set of attributes required the development of novel techniques in knowledge extraction and interdependency resolution, leveraging inter- and intra-sentence relations, which are also described in this paper. We present a new version (version 2.2) of the chemistry-aware text-mining toolkit ChemDataExtractor, in which these new techniques are included.
first_indexed 2024-03-10T22:19:32Z
format Article
id doaj.art-9b542ee98bfa4f82a1eee274e6e89f36
institution Directory Open Access Journal
issn 2052-4463
language English
last_indexed 2024-03-10T22:19:32Z
publishDate 2023-09-01
publisher Nature Portfolio
record_format Article
series Scientific Data
spelling doaj.art-9b542ee98bfa4f82a1eee274e6e89f36 | 2023-11-19T12:20:24Z | eng | Nature Portfolio | Scientific Data | 2052-4463 | 2023-09-01 | vol. 10, no. 1, pp. 1-11 | 10.1038/s41597-023-02511-6 | Automated Construction of a Photocatalysis Dataset for Water-Splitting Applications | Taketomo Isazawa (Cavendish Laboratory, Department of Physics, University of Cambridge); Jacqueline M. Cole (Cavendish Laboratory, Department of Physics, University of Cambridge) | https://doi.org/10.1038/s41597-023-02511-6
title Automated Construction of a Photocatalysis Dataset for Water-Splitting Applications
url https://doi.org/10.1038/s41597-023-02511-6