COMPAS-2: a dataset of cata-condensed hetero-polycyclic aromatic systems
Abstract: Polycyclic aromatic systems are highly important to numerous applications, in particular to organic electronics and optoelectronics. High-throughput screening and generative models that can help to identify new molecules to advance these technologies require large amounts of high-quality da...
Main Authors: | Eduardo Mayo Yanes, Sabyasachi Chakraborty, Renana Gershoni-Poranne |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2024-01-01 |
Series: | Scientific Data |
Online Access: | https://doi.org/10.1038/s41597-024-02927-8 |
_version_ | 1797350145028784128 |
---|---|
author | Eduardo Mayo Yanes, Sabyasachi Chakraborty, Renana Gershoni-Poranne |
collection | DOAJ |
description | Polycyclic aromatic systems are highly important to numerous applications, in particular to organic electronics and optoelectronics. High-throughput screening and generative models that can help to identify new molecules to advance these technologies require large amounts of high-quality data, which is expensive to generate. In this report, we present the largest freely available dataset of geometries and properties of cata-condensed poly(hetero)cyclic aromatic molecules calculated to date. Our dataset contains ~500k molecules comprising 11 types of aromatic and antiaromatic building blocks calculated at the GFN1-xTB level and is representative of a highly diverse chemical space. We detail the structure enumeration process and the methods used to provide various electronic properties (including HOMO-LUMO gap, adiabatic ionization potential, and adiabatic electron affinity). Additionally, we benchmark against a ~50k dataset calculated at the CAM-B3LYP-D3BJ/def2-SVP level and develop a fitting scheme to correct the xTB values to higher accuracy. These new datasets represent the second installment in the COMputational database of Polycyclic Aromatic Systems (COMPAS) Project. |
first_indexed | 2024-03-08T12:40:33Z |
format | Article |
id | doaj.art-96b24bb647814755b7aa18e7d4d7a061 |
institution | Directory of Open Access Journals |
issn | 2052-4463 |
language | English |
last_indexed | 2024-03-08T12:40:33Z |
publishDate | 2024-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Data |
spelling | doaj.art-96b24bb647814755b7aa18e7d4d7a061; 2024-01-21T12:10:16Z; eng; Nature Portfolio; Scientific Data; 2052-4463; 2024-01-01; 111111; 10.1038/s41597-024-02927-8; COMPAS-2: a dataset of cata-condensed hetero-polycyclic aromatic systems; Eduardo Mayo Yanes (0), Sabyasachi Chakraborty (1), Renana Gershoni-Poranne (2); affiliation (0, 1, 2): Schulich Faculty of Chemistry, Technion - Israel Institute of Technology; https://doi.org/10.1038/s41597-024-02927-8 |
title | COMPAS-2: a dataset of cata-condensed hetero-polycyclic aromatic systems |
url | https://doi.org/10.1038/s41597-024-02927-8 |