Machine learning descriptors in materials chemistry used in multiple experimentally validated studies: Oliynyk elemental property dataset

Materials informatics employs data-driven approaches for the analysis and discovery of materials. Features, also referred to as descriptors, are essential for generating reliable and accurate machine-learning models. While general data can be obtained from public and commercial sources, features must be...

Bibliographic Details
Main Authors: Sangjoon Lee, Clio Chen, Griheydi Garcia, Anton Oliynyk
Format: Article
Language: English
Published: Elsevier 2024-04-01
Series: Data in Brief
Subjects: Materials informatics, Machine learning, Feature engineering, Materials chemistry
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340924001495
collection DOAJ
description Materials informatics employs data-driven approaches for the analysis and discovery of materials. Features, also referred to as descriptors, are essential for generating reliable and accurate machine-learning models. While general data can be obtained from public and commercial sources, features must be tailored to specific applications. Common featurizers suitable for generic chemical problems may not be effective for feature-property mapping in solid-state materials with ML models. Here, we have assembled the Oliynyk property list for compositional feature generation, which performs well on limited datasets (50 to 1000 training data points) in the solid-state materials domain. The dataset contains 98 elemental features for atomic numbers 1 to 92, including thermodynamic properties, electronic structure data, size, electronegativity, and bulk properties such as melting point, density, and conductivity. The dataset has been used in peer-reviewed publications for predicting material hardness, classification, discovery of novel Heusler compounds, band gap prediction, and determining the site preference of atoms, using machine learning models including support vector machines and random forests for classification and support vector regression for regression problems. We compiled the dataset by parsing data from publicly available databases and literature, and supplemented it by interpolating values with Gaussian process regression.
first_indexed 2024-03-08T00:13:33Z
id doaj.art-e47a31200e6046a983f0ba322d69713e
institution Directory Open Access Journal
issn 2352-3409
last_indexed 2024-04-24T22:20:47Z
spelling doaj.art-e47a31200e6046a983f0ba322d69713e | 2024-03-20T06:10:01Z | eng | Elsevier | Data in Brief | 2352-3409 | 2024-04-01 | vol. 53, article 110178
Machine learning descriptors in materials chemistry used in multiple experimentally validated studies: Oliynyk elemental property dataset
Sangjoon Lee (0), Clio Chen (1), Griheydi Garcia (2), Anton Oliynyk (3)
(0) Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, United States; corresponding author
(1) Department of Chemistry and Biochemistry, Manhattan College, Riverdale, NY 10471, United States
(2) Department of Chemistry and Biochemistry, Manhattan College, Riverdale, NY 10471, United States
(3) Department of Chemistry, Hunter College, City University of New York, NY 10065, United States; corresponding author
http://www.sciencedirect.com/science/article/pii/S2352340924001495
Materials informatics; Machine learning; Feature engineering; Materials chemistry
topic Materials informatics
Machine learning
Feature engineering
Materials chemistry