Exploiting redundancy in large materials datasets for efficient machine learning with less data

Abstract: Extensive efforts to gather materials data have largely overlooked potential data redundancy. In this study, we present evidence of a significant degree of redundancy across multiple large datasets for various material properties, by revealing that up to 95% of data can be safely removed fr...

Bibliographic Details
Main Authors: Kangming Li, Daniel Persaud, Kamal Choudhary, Brian DeCost, Michael Greenwood, Jason Hattrick-Simpers
Format: Article
Language: English
Published: Nature Portfolio 2023-11-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-023-42992-y