Machine learning descriptors in materials chemistry used in multiple experimentally validated studies: Oliynyk elemental property dataset
Materials informatics employs data-driven approaches for the analysis and discovery of materials. Features, also referred to as descriptors, are essential in generating reliable and accurate machine-learning models. While general data can be obtained through public and commercial sources, features must be...
Main Authors: | Sangjoon Lee, Clio Chen, Griheydi Garcia, Anton Oliynyk |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2024-04-01 |
Series: | Data in Brief |
Subjects: | Materials informatics; Machine learning; Feature engineering; Materials chemistry |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340924001495 |
_version_ | 1797256378142687232 |
---|---|
author | Sangjoon Lee Clio Chen Griheydi Garcia Anton Oliynyk |
author_sort | Sangjoon Lee |
collection | DOAJ |
description | Materials informatics employs data-driven approaches for the analysis and discovery of materials. Features, also referred to as descriptors, are essential in generating reliable and accurate machine-learning models. While general data can be obtained through public and commercial sources, features must be tailored to specific applications. Common featurizers suitable for generic chemical problems may not be effective in feature-property mapping in solid-state materials with ML models. Here, we have assembled the Oliynyk property list for compositional feature generation, which performs well on limited datasets (50 to 1000 training data points) in the solid-state materials domain. The dataset contains 98 elemental features for atomic numbers from 1 to 92, including thermodynamic properties, electronic structure data, size, electronegativity, and bulk properties such as melting point, density, and conductivity. The dataset has been used in peer-reviewed publications for predicting material hardness, classification, discovery of novel Heusler compounds, band gap prediction, and determining the site preference of atoms, using machine learning models including support vector machines and random forests for classification and support vector regression for regression problems. We compiled the dataset by parsing data from publicly available databases and literature and supplemented it by interpolating values with Gaussian process regression. |
first_indexed | 2024-03-08T00:13:33Z |
format | Article |
id | doaj.art-e47a31200e6046a983f0ba322d69713e |
institution | Directory of Open Access Journals |
issn | 2352-3409 |
language | English |
last_indexed | 2024-04-24T22:20:47Z |
publishDate | 2024-04-01 |
publisher | Elsevier |
record_format | Article |
series | Data in Brief |
spelling | doaj.art-e47a31200e6046a983f0ba322d69713e; 2024-03-20T06:10:01Z; eng; Elsevier; Data in Brief; 2352-3409; 2024-04-01; 53; 110178; Machine learning descriptors in materials chemistry used in multiple experimentally validated studies: Oliynyk elemental property dataset; Sangjoon Lee (Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, United States; corresponding author); Clio Chen (Department of Chemistry and Biochemistry, Manhattan College, Riverdale, NY 10471, United States); Griheydi Garcia (Department of Chemistry and Biochemistry, Manhattan College, Riverdale, NY 10471, United States); Anton Oliynyk (Department of Chemistry, Hunter College, City University of New York, NY 10065, United States; corresponding author); abstract as given in the description field; http://www.sciencedirect.com/science/article/pii/S2352340924001495; Materials informatics; Machine learning; Feature engineering; Materials chemistry |
title | Machine learning descriptors in materials chemistry used in multiple experimentally validated studies: Oliynyk elemental property dataset |
topic | Materials informatics Machine learning Feature engineering Materials chemistry |
url | http://www.sciencedirect.com/science/article/pii/S2352340924001495 |
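The description above mentions compositional feature generation from an elemental property list. As a rough illustration of what that can look like, the following minimal sketch turns a chemical formula into composition-level features by taking the fraction-weighted mean, maximum, and minimum of each elemental property. It is not the authors' code: the table `ELEMENT_PROPERTIES`, its column names, and the functions `parse_formula` and `featurize` are hypothetical, and the two-element table holds only well-known atomic numbers and Pauling electronegativities as a stand-in for the 98-feature dataset. The choice of mean/max/min statistics is likewise an assumption.

```python
import re
from typing import Dict

# Stand-in property table with two well-known elemental values.
# Hypothetical layout: this is NOT the Oliynyk dataset file or its column names.
ELEMENT_PROPERTIES: Dict[str, Dict[str, float]] = {
    "Ti": {"atomic_number": 22.0, "pauling_en": 1.54},
    "O": {"atomic_number": 8.0, "pauling_en": 3.44},
}

# One element symbol followed by an optional (possibly decimal) amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula: str) -> Dict[str, float]:
    """Parse a flat formula such as 'TiO2' into element amounts.

    Parentheses and hydrates are not handled in this sketch.
    """
    counts: Dict[str, float] = {}
    for symbol, amount in _TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def featurize(
    formula: str,
    table: Dict[str, Dict[str, float]] = ELEMENT_PROPERTIES,
) -> Dict[str, float]:
    """Build composition-level features from elemental properties.

    For each property, returns the atomic-fraction-weighted mean plus the
    maximum and minimum over the elements present. Raises KeyError if an
    element is missing from the table.
    """
    counts = parse_formula(formula)
    total = sum(counts.values())
    fractions = {el: n / total for el, n in counts.items()}

    property_names = sorted(next(iter(table.values())))
    features: Dict[str, float] = {}
    for name in property_names:
        values = {el: table[el][name] for el in fractions}
        features[f"{name}_mean"] = sum(fractions[el] * v for el, v in values.items())
        features[f"{name}_max"] = max(values.values())
        features[f"{name}_min"] = min(values.values())
    return features


if __name__ == "__main__":
    for key, value in featurize("TiO2").items():
        print(f"{key}: {value:.4f}")
```

Feature vectors built this way could then be passed to a classifier or regressor of the kind named in the description (for example a support vector machine or random forest); the sketch stops at feature generation.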