Exploiting redundancy in large materials datasets for efficient machine learning with less data
Abstract Extensive efforts to gather materials data have largely overlooked potential data redundancy. In this study, we present evidence of a significant degree of redundancy across multiple large datasets for various material properties, by revealing that up to 95% of data can be safely removed fr...
Main Authors: | Kangming Li, Daniel Persaud, Kamal Choudhary, Brian DeCost, Michael Greenwood, Jason Hattrick-Simpers |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2023-11-01 |
Series: | Nature Communications |
Online Access: | https://doi.org/10.1038/s41467-023-42992-y |
Similar Items
- Publisher Correction: Exploiting redundancy in large materials datasets for efficient machine learning with less data
  by: Kangming Li, et al.
  Published: (2024-01-01)
- A critical examination of robustness and generalizability of machine learning prediction of materials properties
  by: Kangming Li, et al.
  Published: (2023-04-01)
- Probing out-of-distribution generalization in machine learning for materials
  by: Kangming Li, et al.
  Published: (2025-01-01)
- Why big data and compute are not necessarily the path to big materials science
  by: Naohiro Fujinuma, et al.
  Published: (2022-08-01)
- Author Correction: Atomistic Line Graph Neural Network for improved materials property predictions
  by: Kamal Choudhary, et al.
  Published: (2022-10-01)