A database of thermally activated delayed fluorescent molecules auto-generated from scientific literature with ChemDataExtractor

Bibliographic Details
Main Authors: Dingyun Huang, Jacqueline M. Cole
Format: Article
Language: English
Published: Nature Portfolio 2024-01-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-023-02897-3
Description
Summary: A database of thermally activated delayed fluorescent (TADF) molecules was automatically generated from the scientific literature. It consists of 25,482 data records with an overall precision of 82%. Among these, 5,349 records have chemical names in the form of SMILES strings, which are represented with 91% accuracy; these are grouped in a subsidiary database. Each data record contains one of the following four properties: maximum emission wavelength (λ_EM), photoluminescence quantum yield (PLQY), singlet-triplet energy splitting (ΔE_ST), and delayed lifetime (τ_D). The databases were created through text mining using ChemDataExtractor, a chemistry-aware natural-language-processing toolkit that has been adapted for TADF research. The text-mined corpus consisted of 2,733 papers from the Royal Society of Chemistry and Elsevier. To the best of our knowledge, these are the first databases to have been auto-generated for TADF molecules from existing publications. The databases have been publicly released for experimental and computational applications in the TADF research field.
ISSN: 2052-4463
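
The summary describes records that each hold one of four properties, with a subset that also carries a SMILES string. The sketch below shows one way such records might be modelled and filtered. It is only an illustration: the class name, field names, property keys, and sample values are all assumptions made here and are not taken from the released databases, whose actual schema should be checked against the article.

```python
from dataclasses import dataclass
from typing import List, Optional

# The four properties named in the summary. The short keys are hypothetical.
PROPERTIES = {
    "lambda_em": "maximum emission wavelength",
    "plqy": "photoluminescence quantum yield",
    "delta_e_st": "singlet-triplet energy splitting",
    "tau_d": "delayed lifetime",
}


@dataclass
class TadfRecord:
    """One record: a compound plus exactly one property value (assumed layout)."""
    compound: str
    prop: str
    value: float
    unit: str
    smiles: Optional[str] = None  # present only for part of the records

    def __post_init__(self) -> None:
        # Reject anything that is not one of the four named properties.
        if self.prop not in PROPERTIES:
            raise ValueError(f"unknown property: {self.prop}")


def with_smiles(records: List[TadfRecord]) -> List[TadfRecord]:
    """Return the records that carry a SMILES string (the subsidiary subset)."""
    return [r for r in records if r.smiles]


# Placeholder records with invented values, NOT data from the database.
records = [
    TadfRecord("compound A", "lambda_em", 500.0, "nm", smiles="c1ccccc1"),
    TadfRecord("compound B", "delta_e_st", 0.1, "eV"),
]

subset = with_smiles(records)
print(len(subset), "of", len(records), "records have a SMILES string")
```

Keeping one property per record mirrors the description in the summary; a compound with several reported properties would then appear as several records.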