COMPAS-2: a dataset of cata-condensed hetero-polycyclic aromatic systems

Bibliographic Details
Main Authors: Eduardo Mayo Yanes, Sabyasachi Chakraborty, Renana Gershoni-Poranne
Format: Article
Language: English
Published: Nature Portfolio 2024-01-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-024-02927-8
ISSN: 2052-4463
Affiliation: Schulich Faculty of Chemistry, Technion - Israel Institute of Technology (all authors)
Collection: DOAJ (Directory of Open Access Journals)

Abstract
Polycyclic aromatic systems are highly important to numerous applications, in particular to organic electronics and optoelectronics. High-throughput screening and generative models that can help to identify new molecules to advance these technologies require large amounts of high-quality data, which is expensive to generate. In this report, we present the largest freely available dataset of geometries and properties of cata-condensed poly(hetero)cyclic aromatic molecules calculated to date. Our dataset contains ~500k molecules comprising 11 types of aromatic and antiaromatic building blocks calculated at the GFN1-xTB level and is representative of a highly diverse chemical space. We detail the structure enumeration process and the methods used to provide various electronic properties (including HOMO-LUMO gap, adiabatic ionization potential, and adiabatic electron affinity). Additionally, we benchmark against a ~50k dataset calculated at the CAM-B3LYP-D3BJ/def2-SVP level and develop a fitting scheme to correct the xTB values to higher accuracy. These new datasets represent the second installment in the COMputational database of Polycyclic Aromatic Systems (COMPAS) Project.
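The abstract mentions a fitting scheme that corrects GFN1-xTB properties toward the CAM-B3LYP-D3BJ/def2-SVP benchmark, but the record does not state its form. As a minimal sketch, assuming a simple per-property linear correction fitted by ordinary least squares on molecules computed at both levels of theory, the idea could look like the code below. The function names (`fit_linear_correction`, `apply_correction`, `mean_absolute_error`) and the toy numbers are hypothetical and are not taken from the COMPAS-2 paper or data.

```python
from __future__ import annotations

from typing import Sequence


def fit_linear_correction(xtb: Sequence[float], dft: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of dft ~ slope * xtb + intercept.

    Hypothetical helper: assumes one linear correction per property, which is
    not necessarily the scheme used for COMPAS-2.
    """
    if len(xtb) != len(dft) or len(xtb) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xtb)
    mean_x = sum(xtb) / n
    mean_y = sum(dft) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xtb)
    if sxx == 0:
        raise ValueError("xTB values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xtb, dft))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_correction(values: Sequence[float], slope: float, intercept: float) -> list[float]:
    """Map raw xTB values onto the DFT scale with the fitted coefficients."""
    return [slope * v + intercept for v in values]


def mean_absolute_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Average absolute deviation between predicted and reference values."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("need two non-empty, equal-length sequences")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


if __name__ == "__main__":
    # Synthetic toy values (not COMPAS-2 data) obeying dft = 2 * xtb + 0.5 exactly
    xtb_benchmark = [1.0, 2.0, 3.0, 4.0]
    dft_benchmark = [2.5, 4.5, 6.5, 8.5]
    slope, intercept = fit_linear_correction(xtb_benchmark, dft_benchmark)
    print(f"slope={slope:.3f}, intercept={intercept:.3f}")  # slope=2.000, intercept=0.500
    before = mean_absolute_error(xtb_benchmark, dft_benchmark)
    after = mean_absolute_error(apply_correction(xtb_benchmark, slope, intercept), dft_benchmark)
    print(f"MAE before: {before:.2f}, after: {after:.2f}")  # MAE before: 3.00, after: 0.00
    # Apply the fitted coefficients to molecules that only have xTB values
    print(apply_correction([1.5, 5.0], slope, intercept))  # [3.5, 10.5]
```

The general pattern is to fit on the subset that has both xTB and DFT values and then apply the coefficients to the remaining xTB-only molecules. Whether the published scheme uses a linear form, additional descriptors, or separate fits per property should be checked in the article itself.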